1

Topic: Proposals for classical and parallel programming

I invite everyone who is interested to consider some of the main ideas of classical and parallel programming based on procedures/functions with planned repeated entry (PPPV/FPPV). In its minimal form, this is a procedure or function that has a static or dynamic execution plan. Generally speaking, the plan is a deque of structure elements, and each element has fields that match the parameters of the corresponding function/procedure in type and name. The plan can be filled both from the outside (when the procedure/function is called, by some simple means) and from the inside (this is the main approach), using the special calls plan_first(...), which places an element at the beginning of the plan, and plan_last(...), which places it at the end. A PPPV/FPPV is executed sequentially or in parallel, according to its plan. In sequential mode, it is called anew for each element of the plan (by default, the element is taken from the beginning of the plan) and executes that element completely (until the natural end of the PPPV/FPPV or until a return). As already mentioned, while any stage is being executed, new stages can be inserted into the plan, and the PPPV/FPPV will execute them later. To support parallel mode (in the simplest case), special barrier markers that divide the plan into groups can be inserted into the plan in addition to stages. The plan_group_parallelize directive enables parallel execution of the current group (the one at the beginning of the plan); the group is then treated as a task pool from which processors/cores take stages (tasks) for execution. The plan_group_vectorize directive sends a group of tasks to a vector processor, such as a multicore graphics card (admittedly, this may introduce some peculiarities into program design; for example, it may be necessary to mark explicitly which program units are intended only for the vector processor, which only for the central processor, and which for both).
Even this basic approach gives at least:
1) one more method of programming many tasks that use a stack, a deque, a queue and sometimes even an array;
2) one more approach to programming parallel processing for SMP systems and vector processors (multicore graphics cards).

To keep things easy to follow, here are a couple of examples right away.

Parallel summation of the elements of a tree:

    reenterable [ARR_SIZE] TreeSum(TreeNode * Cur, reduction(+) DataItem & SumResult) {
      if (Cur == Root) plan_group_parallelize;
      if (Cur->Left) plan_last(Cur->Left, SumResult);
      if (Cur->Right) plan_last(Cur->Right, SumResult);
      SumResult = Cur->Data;
    }

Lee's wave algorithm: finding a path in a maze.

    const int NL = 10; /* Size of the maze */
    const unsigned char W = 0xFF; /* Wall */
    /* The maze */
    unsigned char Labirint[NL][NL] = {
      {W, W, W, W, W, W, W, W, W, W},
      {W, 0, 0, 0, 0, 0, 0, 0, 0, W},
      {W, 0, W, W, 0, 0, W, 0, 0, W},
      {W, 0, 0, W, 0, 0, W, 0, 0, W},
      {W, 0, 0, W, W, 0, W, 0, 0, W},
      {W, 0, 0, 0, W, W, W, 0, 0, W},
      {W, 0, 0, 0, 0, 0, 0, 0, 0, W},
      {W, 0, 0, 0, 0, 0, 0, 0, 0, W},
      {W, 0, 0, 0, 0, 0, 0, 0, 0, W},
      {W, W, W, W, W, W, W, W, W, W},
    };
    /* Offsets from the current cell: left, up, down, right */
    signed char OffsX[4] = {-1, 0, 0, +1};
    signed char OffsY[4] = {0, -1, +1, 0};
    const char FirstX = 8; /* Starting point */
    const char FirstY = 8;
    const char LastX = 5; /* Target point */
    const char LastY = ...;
    chain [NL*NL] FindLi(unsigned char X, unsigned char Y, int Num) throw(unsigned char X, unsigned char Y) {
      char Found = 0;
      for (int i = 0; !Found && i < 4; i++) { /* Look at the neighboring cells */
        unsigned char X1 = X + OffsX[i];
        unsigned char Y1 = Y + OffsY[i];
        if (X1 >= 0 && X1 < NL && Y1 >= 0 && Y1 < NL && Labirint[Y1][X1] == 0) { /* If the cell is not numbered yet */
          Labirint[Y1][X1] = Num; /* Number it */
          if (X1 == LastX && Y1 == LastY) /* If it is the target */
            Found = 1; /* Signal termination */
          else /* Otherwise */
            plan_last(X1, Y1, Num + 1); /* Put it into the queue */
        }
      }
      if (Found) {
        clear_plan; /* Clear the plan of cells to examine */
        X = LastX; Y = LastY;
        throw_last(X, Y); /* Push the target (last) point onto the "stack" */
        while (X != FirstX || Y != FirstY) { /* Until we are back at the starting point */
          char PointFound = 0; /* Search for the next cell of the path */
          for (int i = 0; !PointFound && i < 4; i++) {
            unsigned char X1 = X + OffsX[i];
            unsigned char Y1 = Y + OffsY[i]; /* Candidate for the next cell of the path */
            if (X1 >= 0 && X1 < NL && Y1 >= 0 && Y1 < NL &&
                Labirint[Y1][X1] && Labirint[Y1][X1] < Labirint[Y][X]) {
              /* If the cell is not empty and has a smaller number */
              /* (wall cells have the largest number, so they are never chosen) */
              PointFound = 1;
              throw_first(X1, Y1); /* Add the found cell to the path (a stack) */
              X = X1; Y = Y1; /* Continue from this cell at the next step */
            }
          }
        }
      } else if (plan_empty)
        cout << "NO PATH\n"; /* The target was not reached */
    }
    chain [NL*2] OutLi(unsigned char X, unsigned char Y) {
      cout << "(" << (int)Y << "," << (int)X << ")";
    }
    int main() {
      cout << "Find the path in the simple labirint (Li algorithm):\n";
      Labirint[FirstY][FirstX] = 1;
      plan_chain(0, FindLi(FirstX, FirstY, 2), OutLi(0, 0)); /* A pipeline of two procedures */
      cout << "\n";
      return 0;
    }

And one more abstract example, of working with a graphics card. I'll risk giving it without explanation for now; perhaps someone will find it interesting to figure out how it works.
    #pragma plan vectorized
    #include <iostream>
    using namespace std;
    #pragma plan common begin
    #define N 5
    #define threads 100
    #pragma plan common end
    #pragma plan gpu begin
    #define addition 0.01
    #pragma plan gpu end
    float MUL = 3.14;
    float * _OUT = NULL;
    reenterable void proc(bool init, int k, _global(1) float * mul, _global(threads) int * sizes, int n, _local(__planned__.n) float * out) {
      int i;
      if (init) {
        for (i = 0; i < threads; i++) {
          plan_last(false, i, mul, sizes, sizes[i], out);
          out += sizes[i];
        }
        plan_group_vectorize(NULL);
      } else
        for (i = 0; i < n; i++) {
          *out = k * (*mul);
    #ifdef __GPU__
          *out += addition;
    #endif
          out++;
        }
    }
    int main() {
      int * sizes = new int[threads];
      int NN = 0;
      for (int i = 0; i < threads; i++) {
        sizes[i] = 1 + i % 2;
        NN += sizes[i];
      }
      _OUT = new float[NN];
      cout << "MAX group size = " << vector_max_size(NULL) << endl;
      proc(true, 0, &MUL, sizes, 0, _OUT);
      for (int i = 0; i < NN; i++)
        cout << _OUT[i] << " ";
      cout << endl;
      delete[] _OUT;
      delete[] sizes;
      return 0;
    }

Next, note that if we regard a PPPV/FPPV as an elementary node of a computing topology (a graph) and introduce constructs that allow one PPPV/FPPV to add to the plan of another PPPV/FPPV (its neighbor in the graph), then it becomes possible to work with fairly complex computing topologies, with both shared and distributed memory (for example, on a cluster, where plan elements are passed along the graph by simple network send operations). The operations that add to the plan of another PPPV/FPPV are called throw_first(...) and throw_last(...). Their parameters are determined by the call arguments of the corresponding receiving PPPV/FPPV. If a PPPV/FPPV has only one neighbor in the topology (for example, in a pipeline), the parameters are ordinary ones. If it has several neighbors, one of the parameters becomes special: an address.
All nodes of the topology graph that have the same name (that is, correspond to the same PPPV/FPPV) are numbered in sequence, so the address parameter consists of the name of the receiving PPPV/FPPV followed by an index in square brackets. For now, the topology is proposed to be described either statically (special constructs for a vector/pipeline, or arc lists for an arbitrary graph) or through generated code, when the arc list is produced by special code-generating insertions, macro modules (these can be written, for example, in the spirit of PHP: an insertion generates a fragment of the program text, and any language can be used depending on the task, be it PHP or GNU Prolog). Technically, an ordinary dynamic topology (built by function calls) is not ruled out either. For full-fledged work, channels/lazy variables, transactional memory, barriers, semaphores, etc. can always be used in addition. Here are some examples with different topologies.

Computing the minimum and maximum elements of a tree on a pipeline:

    chain [ARR_SIZE] TreeMax(TreeNode * Cur, reduction(max) DataItem & MaxResult) {
      static DataItem DummyMin;
      throw_last(Cur, DummyMin);
      if (Cur == Root) plan_group_parallelize;
      if (Cur->Left) plan_last(Cur->Left, MaxResult);
      if (Cur->Right) plan_last(Cur->Right, MaxResult);
      MaxResult = Cur->Data;
    }
    chain [ARR_SIZE] TreeMin(TreeNode * Cur, reduction(min) DataItem & MinResult) {
      if (Cur == Root) plan_group_parallelize;
      MinResult = Cur->Data;
    }
    void TreeMinMax(DataItem & Min, DataItem & Max) {
      Max.fill(0.0);
      Min.fill(1000.0);
      plan_parallel_chain(0, TreeMax(Root, Max), TreeMin(Root, Min));
    }

An example with a "needle with an eye" topology, which can be used to test the performance of a specific implementation:

    #include <iostream>
    using namespace std;
    bool stop;
    chain A(bool init) throw(bool init, int N) {
      stop = false;
      throw_first(false, 1);
      Sleep(2000);
      stop = true;
    }
    chain B(bool init, int N) throw(bool init, bool _stop, int N) {
      if (!init) {
        if (stop) throw_first(false, true, N);
        else throw_first(false, false, N + 1);
      }
    }
    chain C(bool init, bool _stop, int N) throw(bool init, int N) {
      if (!init) {
        if (_stop) {
          cout << N;
          plan_topology_quit();
        } else throw_first(false, N + 1);
      }
    }
    int main() {
      plan_topology {
        plan_parallel_chain(A(true)->B(true, 0)->C(true, false, 0));
        plan_parallel_reverse(C(true, false, 0)->B(true, 0));
      }/3;
      return 0;
    }

Everything described above has been implemented (there is a simple translator that, besides the above, implements some other interesting things as well).

2

Re: Proposals for classical and parallel programming

Hello, VP__, you wrote:
VP_> 1) one more method of programming many tasks that use a stack, a deque, a queue and sometimes even an array.
What is a "deque"?
VP_> 2) one more approach to programming parallel processing for SMP systems and vector processors (multicore
VP_> graphics cards).
It is not quite clear why the "execution plan" method makes writing code easier.
VP_> To keep things easy to follow, here are a couple of examples right away.
What's wrong with Erlang?
All your examples describe how to do things. For parallelizing code, it is better to know what needs to be obtained (rather than how), what data types and which representations of them can be used, what transformations on them are admissible, and what is parallelizable. And to let the compiler puzzle over combining algorithms, techniques and parallelization methods to synthesize code that takes into account the target architecture and the possible ways of parallelizing the computation.

3

Re: Proposals for classical and parallel programming

Hello, kov_serg.
VP_>> 1) one more method of programming many tasks that use a stack, a deque, a queue and sometimes even an array.
_> What is a "deque"?
It is a linear structure in which elements can be inserted and removed both at the beginning and at the end.
VP_>> 2) one more approach to programming parallel processing for SMP systems and vector processors (multicore
VP_>> graphics cards).
_> It is not quite clear why the "execution plan" method makes writing code easier.
It lets you get rid of explicitly introducing a variable (for example, a queue), so the code becomes more compact and cleaner, in the same way that ordinary procedures/functions relieve us of the need to declare a stack explicitly. Also, if an algorithm is written using PPPV/FPPV, in certain cases it is a bit easier to parallelize, for example when programming work with a tree or a network.
VP_>> To keep things easy to follow, here are a couple of examples right away.
_> What's wrong with Erlang?
There are also Ada, Cilk, Haskell and many other things. All of them are quite good in their own way. What I am offering here are additional/alternative means of parallelization.
_> All your examples describe how to do things. For parallelizing code, it is better to know what needs to be obtained (rather than how),
_> what data types and which representations of them can be used, what transformations on them are admissible, and what is parallelizable.
_> And to let the compiler puzzle over combining algorithms, techniques and parallelization methods to synthesize code that takes into account
_> the target architecture and the possible ways of parallelizing the computation.
A modern compiler is not able to parallelize arbitrary code completely and efficiently: that is an intellectual task. Compilers cope well with simple cases, and special parallelization systems based on partial tracing can help in some other cases, but is the result always optimal? Therefore, sometimes one has to work manually.
One or two parallelization directives are not the worst option.

4

Re: Proposals for classical and parallel programming

Hello, VP__, you wrote:
VP_> Hello, kov_serg.
VP_> It is a linear structure in which elements can be inserted and removed both at the beginning and at the end.
Do you mean a deque?
VP_> It lets you get rid of explicitly introducing a variable (for example, a queue), so the code becomes more compact and cleaner, in the same way that ordinary procedures/functions relieve us of the need to declare a stack explicitly.
VP_> Also, if an algorithm is written using PPPV/FPPV, in certain cases it is a bit easier to parallelize, for example when programming work with a tree or a network.
It is not quite clear why one should not use an ordinary compiler and already parallelized algorithms for working with networks and trees.
VP_>>> To keep things easy to follow, here are a couple of examples right away.
_>> What's wrong with Erlang?
VP_> There are also Ada, Cilk, Haskell and many other things. All of them are quite good in their own way. What I am offering here are additional/alternative means of parallelization.
There is also SQL, in which you write what you want to obtain, and the server itself builds the execution plan and parallelizes your query.
VP_> A modern compiler is not able to parallelize arbitrary code completely and efficiently: that is an intellectual task. Compilers cope well with simple cases, and special parallelization systems based on partial tracing can help in some other cases, but is the result always optimal? Therefore, sometimes one has to work manually. One or two parallelization directives are not the worst option.
Nobody can efficiently parallelize arbitrary code. Some code cannot be parallelized at all. The compiler should be able to transform algorithms: you should give it an algorithm for parallelizing other algorithms, something like Program Query Language. If you apply such transformations manually, the complexity of the parallelized code multiplies, and at some point it can no longer be written by hand. But if you make the compiler perform such transformations, the complexity can probably be reduced.
If you have an efficiency criterion, you can make the machine search for transformations of your algorithm that help to use computing resources such as GPUs and clusters efficiently. For example, when iterating over a matrix you can traverse it left to right and top to bottom, or you can split it into blocks of, say, 32 by 32 and process them in different threads. The matrix can be stored by rows, by columns, or in blocks sized to fit into the processors' caches, and the blocks can in turn be stored as groups of blocks. Likewise, a tree can have various representations: one kind of parallelization requires one data representation, another kind prefers a different one, and changing representations should be done by the compiler. Moreover, different kinds of parallelization also carry overhead for moving the job to the processor and retrieving the result, so depending on the size of the task one has to estimate the overhead of "warming up" the GPU or of network transfer and choose the optimal way of computing at runtime. Then there is handling of hardware failures and control over the result. I would very much like all of this to happen under the hood, with the machine taking care of it. One must also be prepared for inexact results, because the result of the computations can depend on the parallelization method.

5

Re: Proposals for classical and parallel programming

VP_>> It is a linear structure in which elements can be inserted and removed both at the beginning and at the end.
_> Do you mean a deque?
Yes.
VP_>> It lets you get rid of explicitly introducing a variable (for example, a queue), so the code becomes more compact and cleaner, in the same way that ordinary procedures/functions relieve us of the need to declare a stack explicitly.
VP_>> Also, if an algorithm is written using PPPV/FPPV, in certain cases it is a bit easier to parallelize, for example when programming work with a tree or a network.
_> It is not quite clear why one should not use an ordinary compiler and already parallelized algorithms for working with networks and trees.
You can, if you work with a set of standard algorithms. But what if you are writing a nontrivial algorithm? Then the proposed approach may be useful to you. Besides, PPPV/FPPV can basically be regarded as "syntactic sugar": you can program without them (just as you can without coroutines or lambda functions), but with them it can be more convenient.
VP_>> There are also Ada, Cilk, Haskell and many other things. All of them are quite good in their own way. What I am offering here are additional/alternative means of parallelization.
_> There is also SQL, in which you write what you want to obtain, and the server itself builds the execution plan and parallelizes your query.
Yes, of course, but with SQL we are dealing with tables and relational algebra, where many things parallelize better from the outset. In general, the possibilities for parallelization are determined first of all by the properties of the task, and only then by the capabilities of the language and the computer.
VP_>> A modern compiler is not able to parallelize arbitrary code completely and efficiently: that is an intellectual task. Compilers cope well with simple cases, and special parallelization systems based on partial tracing can help in some other cases, but is the result always optimal? Therefore, sometimes one has to work manually.
VP_>> One or two parallelization directives are not the worst option.
_> Nobody can efficiently parallelize arbitrary code.
I disagree with that. For example, exhaustive-search tasks parallelize perfectly, with close to 100% efficiency (and if, after parallelization, separate slices of the data fit into the processors' caches, even with efficiency above 100%, that is, superlinear speedup).
_> Some code cannot be parallelized at all.
Undoubtedly.
_> The compiler should be able to transform algorithms: you should give it an algorithm for parallelizing other algorithms, something like Program Query Language. If you apply such transformations manually, the complexity of the parallelized code multiplies, and at some point it can no longer be written by hand. But if you make the compiler perform such transformations, the complexity can probably be reduced. If you have an efficiency criterion, you can make the machine search for transformations of your algorithm that help to use computing resources such as GPUs and clusters efficiently.
Yes, well. But as far as I know, this problem has not been solved yet. I once worked on similar things: from a block description of the plan for solving a task, a suitable algorithm was generated from fragments in the form of program text, either sequential or, theoretically, parallel. Things have remained at roughly that level to this day. The code synthesis system becomes very complex, and, unfortunately, the technologies for configuring the generated code into a ready-to-run form are sometimes very nontrivial and require additional programming.
_> For example, when iterating over a matrix you can traverse it left to right and top to bottom, or you can split it into blocks of, say, 32 by 32 and process them in different threads.
Yes, and to parallelize such a task automatically, the system must know its structure and properties, and must know the possible approaches to parallelization.
Therefore, I will repeat once again: in the general case, finding a parallel solution to an arbitrary task is beyond the power of an ordinary compiler (that takes artificial or natural intelligence). I no longer work on such problems, and I admit that for at least another 15-20 years optimal parallelization will be done by programmers, who will need diverse and adequate syntactic and semantic programming tools.
_> The matrix can be stored by rows, by columns, or in blocks sized to fit into the processors' caches, and the blocks can in turn be stored as groups of blocks. Likewise, a tree can have various representations: one kind of parallelization requires one data representation, another kind prefers a different one, and changing representations should be done by the compiler. Moreover, different kinds of parallelization also carry overhead for moving the job to the processor and retrieving the result, so depending on the size of the task one has to estimate the overhead of "warming up" the GPU or of network transfer and choose the optimal way of computing at runtime. Then there is handling of hardware failures and control over the result. I would very much like all of this to happen under the hood, with the machine taking care of it.
_> One must also be prepared for inexact results, because the result of the computations can depend on the parallelization method.
I completely agree with you.