1

Topic: 2D-Linq and optimization of numeral filters

This post, in an amicable way, deserves a place in flame.comp since incentive motive initially was the question alex_public, set in flame bowels in that forum: But I can throw for a change one more  for understanding. The elementary, but very necessary problem in many areas: we have a two-dimensional array of numbers and it is necessary to receive from it a new array in which each point is an average from the neighbors (4-yoh or 8, not an essence). What about it tells linq?) )) (https://rsdn.org/forum/flame.comp/7137817.1 the Author: alex_public Date: 07.05 20:18) But as it is a question of limits  the code on C#, I nevertheless  it here - in those interests who is excited too with a question on productivity of the "mathematical" code on , and to re-read topics depth for one hundred posts of patience is not present. So, what about it tells Linq? The question, clear business, with  - type, yes Linq is a poor superstructure over IEnumerable which then and snivels fastened to RDBMS. And here at us - in essence other material,  and, unlike, say, graphs, to listings irreducible. Personally my lips the following tells about it Linq: int [] FourNeighborAverage (int [] data) => from d in data select (d [-1, 0] + d [1, 0] + d [0, 1] + d [0,-1]) / 4; How it works? Well, to begin with at us is public static class Array2d where it is defined here such here extension a method: public static T [] Select <T> (this T [] source, Kernel <T> kernel) That is expression d => (d [-1, 0] + d [1, 0] + d [0, 1] + d [0,-1]) / 4 is treated as Kernel <int>: public delegate T Kernel <T> (IRelativeAccess2d <T> point); In turn, interface IRelativeAccess2d <T> is defined rather simply: public interface IRelativeAccess2d <T> {T this [int dx, int dy] {get;} } Now all becomes clear: in ours Kernel the interface for access to "adjacent" concerning leaking is told to array cells that allows us to describe type expressions "an average from the four neighbors". To implement our method Select it is not so difficult. Naive implementation looks so: public static T [] Select <T> (this T [] source, Kernel <T> kernel) {var h = source. GetLength (0); var w = source. GetLength (1); var result = new T [h, w]; for (int i = 0; i <h; i ++) for (int j = 0; j <w; j ++) result [i, j] = kernel (new RelativeArrayAccess2d <T> (source, i, j)); return result;} Structure RelativeArrayAccess2d <T> is too trivial, that it to result - it stores the link to an array source and coordinates of a current element; at reversal on this [dx, dy] we add the relative coordinates to leaking and we return an element. This code it is already enough to implement array copying in style linq: from d in data select d [0, 0]; Unfortunately, the first start with an averaging kernel takes off with ArrayIndexOutOfBoundsException. Logically - we try to address for limits of an initial array. Severe programmers on pluses on such trifles of attention do not turn - well, or demand from the programmer of manual instructions of the sizes of "border" in which the filter does not work. The correct decision of this task - is difficult enough: we should to learn file the filter on the move so that it correctly worked on array boundaries (for an element near to rectangle "side" not 4, but 3 neighbors, and at an angular element - only 2. Strictly speaking, they also need to be averaged). But meanwhile we manage the decision in style With ++ - restriction of output area of the filter. To make it it is not so difficult - we simply once cause kernel with special implementation IRelativeAccess which remembers where there is a reversal: private sealed class KernelMeasure <T>: IRelativeAccess2d <T> {... T IRelativeAccess2d <T>.this [int x, int y] {get {xmin = Math. Min (xmin, x); xmax = Math. Max (xmax, x); ymin = Math. Min (ymin, y); ymax = Math. Max (ymax, y); return default (T);} }} Generally speaking, it will work not for all types Kernel - for example, we could use Math. Random () for an offset choice. But for almost interesting cases when offsets are always constant, this approach works. Total, at us more or less efficient code which allows to write down filtration formulas conveniently turns out: public static T [] Select <T> (this T [] source, Kernel <T> kernel) {var h = source. GetLength (0); var w = source. GetLength (1); var km = new KernelMeasure <T> (); kernel (km);//led measurements var t = 0 - km.xmin; var b = h - km.xmax; var l = 0 - km.ymin; var r = w - km.ymax; var result = new T [h, w]; for (int i = t; i <b; i ++) for (int j = l; j <r; j ++) result [i, j] = kernel (new RelativeArrayAccess2d <T> (source, i, j)); return result;} On it it would be possible and to stop, if not the malicious comment in the same flame on 30 levels above the Author: alex_public Date: 28.03 17:37: Well as, if we tell about operation with collections (instead of about a DBMS), the normal imperative code (type for ()...) Anywhere did not get to. And thus it on former is as the most generalized, and the most productive. CHALLENGE ACCEPTED! Apropos "" to argue sense is not present - obviously that resulted from d in data select... It is possible to force to work without changes not only with a 2d-file, but also with the arbitrary structure which is a two-dimensional grid, indexed in integer coordinates. For example, we can work with "image" from double with the sizes one million on one million, that is having 10^12 points or nearby 7 terabyte. Search by their cycle for rests at all against time, and in the size of storage and a disk of the local machine. So we saw all on , we scatter on several thousand servers, we send the code for execution, we wait, . Well, that is it is received again  monstrous "canvas" for the further operations - for example, edge-detect, and then still convolution, and then dimensionality reduction, and already in the end we receive something comprehensible to storage of the local machine. And all - in compact and convenient for reading (and a spelling) a type. A question at us to that point, where about "most productive". Let's write that ideal imperative code which would be written on C# by the programmer who is not trusting to all  to modern links, from them "dynamics and a reflection": static int [] PerfectFourNeighborAverage (this int [] source) {var h = source. GetLength (0); var w = source. GetLength (1); var result = new int [h, w]; for (int i = 1; i <h-1; i ++) for (int j = 1; j <w-1; j ++) result [i, j] = (source [i, j-1] + source [i, j+1] + source [i-1, j] + source [i+1, j])/4; return result;} We take a casual array in the size 1000*1000, and we try to compare productivity of our "generalized" code to "specific": BenchmarkDotNet=v0.10.14, OS=Windows 10.0.15063.1088 (1703/CreatorsUpdate/Redstone2) Intel Core i7-6600U CPU 2.60GHz (Skylake), 1 CPU, 4 logical and 2 physical cores Frequency=2742190 Hz, Resolution=364.6720 ns, Timer=TSC [Host]: a. net Framework 4.6.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2650.0 DefaultJob: the. net Framework 4.6.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2650.0 Method | Mean | Error | StdDev | Scaled | ScaledSD |----------- |---------- neutral---------- neutral---------- neutral------- neutral--------- neutral Perfect | 15.008 ms | 0.1706 ms | 0.1512 ms | 1.00 | 0.00 | Linq | 33.423 ms | 0.6599 ms | 0.8345 ms | 2.23 | 0.06 | Well, at first sight, linq consulted not too badly - only at 2.23 time more slowly. Nevertheless, this insignificant price for convenient abstraction can easily frighten off fans . Therefore in the following series I will try to show that it is possible to make with it, changing nothing in the code . That is those advantages which are free for the application programmer - from it it is required unless to recompile the project. Any  in the code - neither imperative development of cycles, nor a manual vectorization, even skillful arrangement "declarative" . Only a hardcore, only bare from d in data select (d [-1, 0] + d [1, 0] + d [0, 1] + d [0,-1]) / 4.

2

Re: 2D-Linq and optimization of numeral filters

Hello, Sinclair, you wrote: If you start to be engaged , it is necessary to chase here with this piece http://halide-lang.org/ All remaining not seriously. Basically this piece can and on C# be stretched with the almost same syntax. And the computational model too allows to scatter calculations on a cluster.... <<RSDN@Home 1.0.0 alpha 5 rev. 0>>

3

Re: 2D-Linq and optimization of numeral filters

Hello, Sinclair, you wrote: S> it is necessary to receive from it a new array in which each point is an average from the neighbors (4-yoh or 8, not an essence). What about it tells linq?))) What for to dress a sock through a head? S> therefore in the following series I will try to show that it is possible to make with it I Hope that then it becomes clear what for. Right now I in doubts. Probably speaking linq you understand operation with IEnumerable <T> which is not favourable for loading entirely in the form of an array? It is possible to use the data type supporting virtualization, c public indexer and any  inside, in this case storage will not be overflowed and PerfectFourNeighborAverage () becomes the normal decision.

4

Re: 2D-Linq and optimization of numeral filters

Hello, Sinatr, you wrote: S> What for to dress a sock through a head? Apprx. your sentences for the task decision (on C#)? S>> Therefore in the following series I will try to show that it is possible to make with it S> I Hope that then it becomes clear what for. Right now I in doubts. Meanwhile - the task to leave from code PerfectFourNeighborAverage (), in which too many superfluous details which was not in a statement of the initial task. It does its (obviously badly read (for example, whether easily to find out one-off errors?) And (disputably badly optimized. S> probably speaking linq you understand operation with IEnumerable <T> which is not favourable for loading entirely in the form of an array? int [] does not implement IEnumerable <T>. S> It is possible to use the data type supporting virtualization, c public indexer and any  inside, in this case storage will not be overflowed and PerfectFourNeighborAverage () becomes the normal decision. Very much hardly PerfectFourNeighborAverage () becomes the normal decision for  an array even if to use in any .

5

Re: 2D-Linq and optimization of numeral filters

Hello, Sinclair, you wrote: S>> What for to dress a sock through a head? S> apprx. your sentences for the task decision (on C#)? PerfectFourNeighborAverage () or the similar decision which does not use linq from a word generally. S> It does its (obviously badly read (for example, whether easily to find out one-off errors?) And (disputably badly optimized. Any difficult enough algorithms  and it is normal. To use linq that it is easier to understand algorithm or not to use, that it was optimal? S> very much hardly PerfectFourNeighborAverage () becomes the normal decision for  an array even if to use in any . You meant the size of a database or "array"? Probably you did not understand me, I suggested to use other data type + your method with usage indexer (indexer! = array). And if speech about the size not very well what algorithm, all the same is necessary all given to process, all of 7 terabyte.

6

Re: 2D-Linq and optimization of numeral filters

Hello, Sinatr, you wrote: S> PerfectFourNeighborAverage () or the similar decision which does not use linq from a word generally. It is the worst possible variant. It it is expensive both to write, and to execute. S> any difficult enough algorithms  and it is normal. To use linq that it is easier to understand algorithm or not to use, that it was optimal? First of all the algorithm should be clear. Because the clear algorithm can be made correct. Us algorithms which quickly produce incorrect result essentially do not interest. S>> very much hardly PerfectFourNeighborAverage () becomes the normal decision for  an array even if to use in any . S> you meant the size of a database or "array"? I meant "data array". S> it is possible you me did not understand, I suggested to use other data type + your method with usage indexer (indexer! = array). And if speech about the size not very well what algorithm, all the same is necessary all given to process, all of 7 terabyte. Well, here it is just important, what algorithm. For example, whether it is possible to process the data in some flows. Or, better, by several machines. Whether it is possible to vectorize algorithm. The algorithm with explicit  for for extremely difficultly gives in to any transformations - very clever compiler is necessary to us to recover an intention of the programmer, and to implement it in a new fashion instead of how it is written.

7

Re: 2D-Linq and optimization of numeral filters

Hello, Sinclair, you wrote: <skipped> I do not understand one - what for? You spent (tell itself, how many) time, for proving that this elementary filter to implement by means of LinQ  it is possible, writing as a result of 2 pages of the text with the delegate, the interface, kernel etc. there where in imperative style the elementary double cycle with one operator inside is written. It is written "on the automatic machine", without thinking almost as similar any skilled programmer wrote many times. And result - a procorf of speed in 2 with superfluous time. Also it is all thus that it is necessary to change seriously the task, and you will invent again the decision, and it is good, if invent. What for?

8

Re: 2D-Linq and optimization of numeral filters

Hello, Pavel Dvorkin, you wrote: PD> Also it is all thus that it is necessary to change seriously the task, and you will invent again the decision, and it is good, if invent. All very much the other way. It is necessary to change the task - we tell, to increase gauss-bljura radius in the filter - and the code written by the programmer "without thinking almost", it appears out of work. And if it suddenly wastes time on optimization of it "" the code all becomes even worse: in the same place the code amount already time in 10 grew - these all here loop unrolling, SIMD intrincics, same all is not simple so. Now to understand, where to thrust additional coefficients, extremely heavily. And still it is necessary not to forget to change limits in cycles, etc., etc. PD> What for? To disaccustom people to be afraid performance penalty.