
4.14 Complex, Dynamically Created and Modified Software Systems 

The examples implementing the circuit from FIG. 12 constitute extremely efficient code that runs a predefined, hard-coded circuit. Changing the circuit would require recompiling the code. To build a dynamically reconfigurable system, all components used would have to be converted into separate, universal components, each having one or more inputs and a connector construct for its outputs. Connecting an output to the next input would then not be hard-coded, but accomplished by linking/adding the input to the appropriate connector. Adding a new element to a circuit under construction would involve allocating that element and issuing instructions on its output connectors that would link those outputs to any number of required inputs. The complete set of elements described above allows constructing universal building blocks that on their own describe the best parallelism they can allow, without knowing anything about the elements connecting to them, without placing unnatural constraints on the idea being expressed, without needing any supervisory software to manage the system, and while being able to run naturally on a large number of processors. All these elements contrast with prior-art systems that sometimes had similar conceptual goals, but were unable to achieve true object-oriented programming and true parallelism.
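The connector idea described above can be illustrated in ordinary C++. This is only an emulation sketch, not stress-flow itself; the `Connector` and `Doubler` names are illustrative and do not appear in the specification. An output becomes a runtime list of callbacks, and wiring means appending an input to that list:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Illustrative emulation only: a "connector" as a runtime list of
// callbacks. Wiring an input to an output happens at runtime, so the
// circuit can be rebuilt without recompiling anything.
template <typename... Args>
struct Connector {
    std::vector<std::function<void(Args...)>> inputs;
    void link(std::function<void(Args...)> f) { inputs.push_back(std::move(f)); }
    void fire(Args... a) { for (auto& f : inputs) f(a...); }
};

// A trivial element with one input (Go) and one output connector (Changed).
struct Doubler {
    Connector<int> Changed;
    void Go(int x) { Changed.fire(x * 2); }   // the output target is not hard-coded
};
```

Linking `Doubler::Changed` to any number of downstream inputs then replaces the hard-coded calls of the FIG. 12 examples.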

Most of the new elements defined on top of the stress-flow atom concept were introduced by showing how stress-flow elements can be used to implement a simple electrical circuit. This was only an example used to introduce the elements of stress-flow. In fact, all stress-flow elements were intended as the lowest-level blocks necessary to solve any commonly occurring parallelism problem. Emulation of an electrical circuit just happens to invoke the right intuition, which leads to a better explanation of the concepts of stress-flow. Stress-flow was developed by defining a set of conceptually different parallel programming and synchronization problems and then trying to solve them with stress-flow at different stages of its development. This is how each element presented here was developed. One such application was a brute-force simulation system that had to generate all possible combinations of a set of input values, recalculate elements of an extensive set of auxiliary data only when necessary, and then run a final, computationally intensive simulation corresponding to a given combination of inputs. In particular, the system has been used to evaluate various market day-trading strategies. An example of such a system will be described, as it shows the use of the introduced elements of stress-flow in an application different from the emulation of an electrical system.

 

FIG. 20: Diagram of a trading system simulation to be implemented as an example

A diagram of a simplified trading simulation system to be implemented is shown on FIG. 20. The purpose is to step through all possible combinations of Margin (N different values), Depth1 (M different values), and Depth2 (P different values), for a total of N*M*P steps. The simulation itself takes the Margin as a parameter, the moving average MA1 calculated at Depth1, the moving average MA2 calculated at Depth2, and Delta calculated as the element-by-element subtraction of the MA1 and MA2 tables. The object is to create universal components that can be used to implement this example as well as much more complex systems, based on more variables to combine and more intermediate data. Due to the computational intensity of the problem, the goal is an optimal implementation in which the intermediate data is recalculated a minimal number of times. For example, MA2 recalculates itself only when Depth2 has changed. This optimization must be automatic, meaning that each element of the library, including the simulation node, is written in such a way that all recalculations happen automatically, only when needed, and nothing extra has to be done besides connecting the intermediate data recalculation nodes into the system. Since the subject of stress-flow is parallelism, we also want the solution to scale itself onto (i.e. be able to take advantage of) the highest number of processors possible.

 

Margin Depth1 Depth2 MA1 MA2 Delta  Simulation

  1      1      1     Y   Y    Y        Y  

  2      1      1     N   N    N        Y

  1      2      1     Y   N    Y        Y  

  2      2      1     N   N    N        Y

  1      3      1     Y   N    Y        Y  

  2      3      1     N   N    N        Y

  1      1      2     Y   Y    Y        Y  

  2      1      2     N   N    N        Y

  1      2      2     Y   N    Y        Y  

  2      2      2     N   N    N        Y

  1      3      2     Y   N    Y        Y  

  2      3      2     N   N    N        Y

  1      1      3     Y   Y    Y        Y  

  2      1      3     N   N    N        Y

  1      2      3     Y   N    Y        Y  

  2      2      3     N   N    N        Y

  1      3      3     Y   N    Y        Y  

  2      3      3     N   N    N        Y

FIG. 21: Required sequence of operation of code implementing diagram from FIG. 20

FIG. 21 shows the required execution schedule for N=2, M=3, and P=3. The Margin, Depth1, and Depth2 columns show the state of their respective counters. The other columns show whether or not (Y=YES, recalculated) a given node is to be recalculated for the given set of counters. The simulation node obviously needs to be recalculated after every change of the counters. To meet the parallel scalability requirement, the simulation step for one set of counters must be able to start while simulation steps for many previous sets of counters are possibly still running. The same applies to the MA1, MA2, and Delta calculations. In a case with an unlimited number of processors, where the Simulation step is more intensive than the sum of the intensities of all intermediate data recalculations, a total of N*M*P Simulation instances must be able to run at the same time.
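The schedule of FIG. 21 can be reproduced with a short plain-C++ sketch (a helper of my own for illustration, not part of the stress-flow code): an odometer loop in which Margin steps fastest, marking a node for recalculation exactly when a counter it depends on has just changed.

```cpp
// Count how many times each node of FIG. 20 must recalculate during the
// N*M*P sweep of FIG. 21. Margin steps fastest; its roll-over steps
// Depth1, whose roll-over steps Depth2.
struct RecalcCounts { int ma1 = 0, ma2 = 0, delta = 0, sim = 0; };

inline RecalcCounts count_recalcs(int N, int M, int P) {
    RecalcCounts c;
    for (int d2 = 1; d2 <= P; ++d2)
        for (int d1 = 1; d1 <= M; ++d1)
            for (int m = 1; m <= N; ++m) {
                bool depth1Changed = (m == 1);             // Margin just rolled
                bool depth2Changed = (m == 1 && d1 == 1);  // Depth1 just rolled
                if (depth1Changed) ++c.ma1;
                if (depth2Changed) ++c.ma2;
                if (depth1Changed || depth2Changed) ++c.delta;
                ++c.sim;                                   // every combination
            }
    return c;
}
```

For N=2, M=3, P=3 this yields 9 MA1, 3 MA2, 9 Delta, and 18 Simulation recalculations, matching the Y entries of FIG. 21.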

Accomplishing these goals in prior-art object-oriented languages, even without requiring any parallelism, is already quite difficult, if not impossible, since a centralized update handler code/loop/queue would be needed. Because the intermediate data calculations are cascaded, with one relying on the results of another and with paths splitting and then merging back, a universal solution to the problem is not a trivial one. In our example, the simulation node cannot fire until MA1, MA2, and Delta are recalculated, if they need to be recalculated. Also, Delta cannot be recalculated simply as a result of MA1 being recalculated, because MA2 might need to change as well, only a little later. The common way to do something like this is to go through the graph invalidating and queuing all nodes that need to be recalculated, without actually recalculating them yet. After this is done, the queue is processed in a way that only recalculates a node once all of its inputs are ready. In our example, the Simulation node would be queued first but processed last (after Delta) in a situation where Depth2 has changed. This solution brings us very close to the prior-art data-flow implementation methods, with their centralized execution model and queue of nodes to process. Additionally, the algorithms needed to make sure the queued nodes are processed in the right order are complex and time consuming. This solution and its limitations are exactly what stress-flow sought to avoid, replacing them with a model without any centralized supervisor or centralized queue of any sort.
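For comparison, the prior-art centralized scheme just described can be sketched as follows. This is an illustrative reconstruction of the two-pass invalidate-then-process idea, with nodes assumed to be stored in topological order; none of this code belongs to stress-flow itself.

```cpp
#include <functional>
#include <vector>

// Centralized invalidate-then-process scheme: pass 1 marks everything
// downstream of the changed node dirty; pass 2 walks the nodes in
// dependency (topological) order and recalculates the dirty ones, so a
// node fires only after all of its dirty inputs have been recalculated.
struct Node {
    std::vector<int> deps;              // indices of input nodes
    bool dirty = false;
    std::function<void()> recalc;
};

inline void centralized_update(std::vector<Node>& g, int changed) {
    for (auto& n : g) n.dirty = false;
    g[changed].dirty = true;
    for (auto& n : g)                   // pass 1: invalidate downstream
        for (int d : n.deps)
            if (g[d].dirty) n.dirty = true;
    for (auto& n : g)                   // pass 2: recalc, inputs first
        if (n.dirty) n.recalc();
}
```

With nodes 0=MA1, 1=MA2, 2=Delta, 3=Simulation, a change of MA2 recalculates MA2, then Delta, then Simulation, and leaves MA1 untouched. Note the centralized loop this requires, which is precisely what stress-flow does away with.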

For all these reasons, this particular problem is ideal ground for comparing stress-flow to previous systems, languages, and methods. In spite of being relatively simple, the given problem offers all the challenges of ideal object-oriented and parallel design. The ease of implementation, universality, and parallel scalability of code solving this problem in prior-art languages and systems make it a good benchmark against stress-flow. In fact, it appears impossible to solve this simple problem in any prior-art language or programming system without sacrificing one or more of the specified goals.

 

const  int K=...;            // Size of tick data

float  Tick[K];              // Tick data to process

//--------------------------------------------------

struct Counter

{  int          Cur,Max;

   connector    Changed(int i);

   connector    Roll   (); 

   Counter      (int iMax)       {  Cur=Max=iMax;           };

   detach  void Step   ()         {  if (++Cur>Max)

                                        {  Cur=1; !!Roll();};

                                     !!Changed(Cur);

                                  };

};

//--------------------------------------------------

struct MA

{  connector    Changed (float* pMA);

   detach  void Go      (int iD)  { return;

                                    float* p=new float[K];

                                    float Sum=Tick[0]*iD;

                                    for ( int i=1; i<K; i++)

                                    {  if ( i-iD<0 )

                                          Sum-=Tick[0];

                                       else

                                          Sum-=Tick[i-iD];

                                       Sum+=Tick[i];

                                       p[i]=Sum/iD;

                                    };

                                    !!Changed(p);

                                  };

};                                                   

//--------------------------------------------------

struct Delta

{  connector     Changed (float* pMA);

   collect  void Go1     (float* pMA1)

                 Go2     (float* pMA2){ return;

                                        float* p=new float[K];

                                        for ( int i=0; i<K; i++)

                                          p[i]=pMA2[i]-pMA1[i];

                                        !!Changed(p);

                                      };

};                                                

//--------------------------------------------------

struct Simulation

{  static int    ResCount;

   static float  Results[MaxRes];

                 Simulation  ( )       { ResCount=0; };

   collect  void Margin  (int    iMargin)

                 MA1     (float* pMA1)

                 Delta   (float* pDelta)

                 MA2     (float* pMA2){ int iRes=ResCount++;

                                        return;

                                        ... // Run simulation

                                        Results[iRes]=...

                                      };

};

FIG. 22: Object declarations of code implementing diagram from FIG. 20

FIG. 22 shows the object declarations for the implementation of the diagram from FIG. 20. All components are designed as elements that can be dynamically interconnected. This means that all objects are fairly universal library-type components that provide a “connector” interface for their outputs, rather than calling specific stress-flow atoms by name. The source data to work on is stored in a global table named Tick. In a fully universal library, this source data would be part of some base object that all the other library components were built on, or it would be passed as a parameter. Doing that here would only clutter the code, making it harder to explain, without changing the parallel functionality in any way.

The “Counter” object has one “ordered” input to “Step” it and two “connector” outputs. The “Changed” output is triggered after every change of the counter. The “Roll” output is triggered whenever the counter rolls over. On initialization, the counter is set to its highest setting so that the first “Step” rolls it over into the starting position and triggers the recalculations required at the start. Another way to deal with this problem would be to special-case the initial update out of the constructor. The intention is to cascade the Counters so that the “Roll” output of one is routed to the “Step” input of another. Since the “Changed” call inside “Step” is ordered, a counter that was stepped but did not roll will send a “default” Step call to the next counter. This call, in turn, will be propagated to all ordered inputs connected to both outputs (“Changed” and “Roll”) of that next counter.
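The roll-over cascade can be shown in plain C++ (the ordered/default-call machinery is omitted; `CounterSketch` is my name for this stripped-down emulation, with connectors modeled as callback lists):

```cpp
#include <functional>
#include <vector>

// Stripped-down Counter emulation: Changed fires on every step, Roll
// fires on roll-over, and Roll of one counter can be wired to Step of
// the next. Starting at Max makes the very first Step roll everything
// over into the initial position.
struct CounterSketch {
    int Cur, Max;
    std::vector<std::function<void(int)>> Changed;
    std::vector<std::function<void()>>    Roll;
    explicit CounterSketch(int iMax) : Cur(iMax), Max(iMax) {}
    void Step() {
        if (++Cur > Max) { Cur = 1; for (auto& f : Roll) f(); }
        for (auto& f : Changed) f(Cur);
    }
};
```

Wiring `margin.Roll` to `depth1.Step` and stepping the first counter N*M times then walks the pair through every combination, exactly as in the first two columns of FIG. 21.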

The “MA” object calculates the moving average of the “Tick” source data at the depth given by its parameter. The object has one input point, “Go.” The simple moving average calculated here is the average value of the last depth (iD parameter) elements. The result is a table of the same size as the source data, passed as a parameter to the output connector “Changed.” This result table is allocated globally on the heap, and it is assumed that a garbage collector is operational, which means the table is automatically de-allocated the moment the last pointer to it is destroyed or overwritten. As the pointer is passed on to the output connector, the stress-flow atoms that need the table will keep it in memory only as long as they are using it. This also allows moving all the code of the “Go” stress-flow atom into the relaxed section, which means that each instance of “MA” can run simultaneously with many potentially still-running previous instances. The standard (compiler-generated) default path of the MA object will call the default paths of all stress-flow atoms connected to the “Changed” connector.
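The moving-average arithmetic of MA::Go, extracted into a plain C++ function (the connector and default-path machinery stripped out), looks as follows. Positions with fewer than iD predecessors repeat Tick[0], exactly as in the figure; p[0] is set to Tick[0] here, which the figure leaves unassigned.

```cpp
#include <vector>

// Simple moving average at depth iD over the tick table. The running
// window starts as iD copies of Tick[0]; each step drops the element
// leaving the window and adds the new one.
inline std::vector<float> moving_average(const std::vector<float>& tick, int iD) {
    const int K = (int)tick.size();
    std::vector<float> p(K);
    float sum = tick[0] * iD;
    p[0] = tick[0];
    for (int i = 1; i < K; ++i) {
        sum -= (i - iD < 0) ? tick[0] : tick[i - iD];
        sum += tick[i];
        p[i] = sum / iD;
    }
    return p;
}
```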

The frequently occurring reliance on automatic, garbage-collected pointers and memory allocations (sometimes called managed or smart) leads to the question of whether some special method of “automatic references” seen in some object-oriented languages (like C#) should be used in a stress-flow language. This might be a good idea, but it would make the explanation more difficult. On the other hand, declaring all pointers as managed and garbage-collector operated (an option in many existing C++ compilers) is an even better idea in a real multi-processor environment, where each processor would manage its own heap.
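The lifetime rule the text assumes can be demonstrated with std::shared_ptr standing in for the managed pointer (an illustrative substitute, not what the stress-flow compiler would necessarily emit): the result table stays alive exactly as long as some consumer still holds it, with no explicit de-allocation anywhere.

```cpp
#include <memory>
#include <vector>

// "Managed pointer" stand-in: a result table whose storage is released
// automatically when the last pointer to it is destroyed or overwritten.
using Table = std::shared_ptr<std::vector<float>>;

inline Table make_result(int K) {
    return std::make_shared<std::vector<float>>(K, 0.0f);
}
```

A producer can therefore drop its pointer the moment it fires the connector; downstream atoms keep the table alive only while they still use it.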

The “Delta” object has one “collector” input with two entry points, each passing a table of data to process. As is the case with any collector, the body of the routine does not execute until both input paths/headers have been called. The calls can be either regular calls or default-path calls. The collector’s compiler-generated default path is followed only when both header calls come from default-path calls. In that case, the compiler-generated default path simply fires the “Changed” connector with the same parameter as before, in this case the pointer to the previously calculated result. If any of the header calls came from regular (non-default-path) calls, the body is executed and a new result table is created. Both parameters are then valid: in case one of them comes from a default-path call, that parameter is a copy of the parameter from the previous regular call, which is exactly what is needed to calculate the result correctly. Here, the result is simply a table that stores the difference of every pair of elements from the parameter tables. This is a very simple operation, one that could often be done on the fly when needed. The routine is presented here as an example of a universal template for implementing any two-table-parameter stress-flow atom, performing arbitrarily complex operations on the two input tables and producing an output table as a result.
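The two-header “collect” behavior can be emulated sequentially in plain C++ (default-path handling omitted; `DeltaSketch` is my own illustrative name): the body fires only once both Go1 and Go2 have delivered their tables.

```cpp
#include <functional>
#include <optional>
#include <vector>

// Sequential emulation of a two-header collector: each header stores its
// parameter; the body runs only when both have arrived, then the pair is
// cleared so the next collection can begin.
struct DeltaSketch {
    std::optional<std::vector<float>> a, b;
    std::function<void(const std::vector<float>&)> Changed;
    void Go1(std::vector<float> v) { a = std::move(v); tryFire(); }
    void Go2(std::vector<float> v) { b = std::move(v); tryFire(); }
    void tryFire() {
        if (!a || !b) return;                 // still collecting headers
        std::vector<float> p(a->size());
        for (size_t i = 0; i < p.size(); ++i) p[i] = (*b)[i] - (*a)[i];
        a.reset(); b.reset();
        if (Changed) Changed(p);
    }
};
```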

The “Simulation” object has one four-path “ordered collector” entry point. Like any other collector, it does not execute until all four entry paths have been collected, each by obtaining either a regular call or a default-path call. Multiple instances of the tables given as parameters persist in memory as long as there are “Simulation” instances using them and as long as they are needed as possible default-call copies. In its “stressed” section, the “Simulation” object stores an incrementing index to the place in the output table where it will store its result. Doing it this way declares that we want the results stored in the order the “Simulation” instances started executing, not in the order they finish.
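The reserve-the-slot-in-the-stressed-section idea can be sketched with a std::atomic counter (an emulation of the effect, not of the stressed/relaxed mechanism itself; the names are mine): each instance takes the next result index before any long-running work, so results land in start order even if instances finish out of order.

```cpp
#include <atomic>
#include <vector>

// Each Simulation instance atomically reserves the next output slot up
// front (the "stressed" part), then stores its result there whenever it
// finishes (the "relaxed" part).
struct SimSlots {
    std::atomic<int> ResCount{0};
    std::vector<float> Results;
    explicit SimSlots(int maxRes) : Results(maxRes, 0.0f) {}
    int reserve() { return ResCount.fetch_add(1); }
    void store(int iRes, float v) { Results[iRes] = v; }
};
```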

Counter         rMargin(N);

Counter         rDepth1(M);

Counter         rDepth2(P);

MA              rMA1;

MA              rMA2;

Delta           rDelta;

Simulation      rSim;

 

rMargin.Roll    => rDepth1.Step;

rDepth1.Roll    => rDepth2.Step;

 

rDepth1.Changed => rMA1.Go;

rDepth2.Changed => rMA2.Go;

 

rMA1.Changed    => rDelta.Go1;

rMA2.Changed    => rDelta.Go2;

 

rMargin.Changed => rSim.Margin;

rMA1.Changed    => rSim.MA1;

rMA2.Changed    => rSim.MA2;

rDelta.Changed  => rSim.Delta;

FIG. 22A: Data definitions of code implementing diagram from FIG. 20

 

for ( int k=0; k<N*M*P; k++ )

    rMargin.Step();

FIG. 22B: Loop initiating code implementing diagram from FIG. 20

The variable/node definitions and the instructions interconnecting them are shown on FIG. 22A. Each node is represented by a variable, and each wire by a connect instruction. This completes the implementation of the problem at hand. All that has to be done is to call rMargin.Step() from a loop, as shown on FIG. 22B. To actually know when all started simulation instances have finished, the methods shown on FIG. 2 through FIG. 5 can be used; the “all_done” function implemented there applies directly to the current simulation problem.

 

FIG. 23: Variation of problem shown on diagram on FIG. 20

 

struct Counter2

{  int          Cur,Max;

   connector    Changed(int i);

   connector    Roll   (); 

   connector    Next   (); 

   Counter2     (int iMax)

                      {  Cur=Max=iMax;           };

   detach 

   {  void Step   ()  {  if (++Cur>Max)

                            {  Cur=1; !!Roll();};

                         !!Changed(Cur);

                      };

      void Reset  ()  {  Cur=1;

                         Changed(Cur); Next();      

                      };                                 

   };

};

//--------------------------------------------------

struct Flip

{  connector    Link();

   detach nodef Go()  {  if ( !Go )

                            Link();

                      };

};

FIG. 24: Modified counter definition for diagram FIG. 23

 

Counter2        rMargin(N);

Counter2        rDepth1(M);

Counter2        rDepth2(P);

Flip            rFlip;

MA              rMA1;

MA              rMA2;

Delta           rDelta;

Simulation      rSim;

 

rMargin.Next    => rDepth1.Reset;

rDepth1.Next    => rDepth2.Reset;

rDepth2.Next    => rMargin.Step;

 

rMargin.Roll    => rDepth1.Step;

rDepth1.Roll    => rDepth2.Step;

rDepth2.Roll    => rFlip.Go;

 

rDepth1.Changed => rMA1.Go;

rDepth2.Changed => rMA2.Go;

 

rFlip.Link      => rMargin.Step;

 

rMA1.Changed    => rDelta.Go1;

rMA2.Changed    => rDelta.Go2;

 

rMargin.Changed => rSim.Margin;

rMA1.Changed    => rSim.MA1;

rMA2.Changed    => rSim.MA2;

rDelta.Changed  => rSim.Delta;

FIG. 24A: Modified data definition for diagram FIG. 23

The code shown on FIGS. 22, 22A, and 22B is simple and concise, but the method of starting it is not. The spread of each simulation variable (N, M, P) had to be entered twice: once in the constructors for the counters, and again to calculate the number of times to step the whole code in a loop. In order to be able to trigger the whole simulation experiment with one statement/call, the diagram was modified as shown on FIG. 23, and the corresponding code is shown on FIG. 24 and FIG. 24A.

The new Counter2 object has a “Reset” input that allows forcing recalculation for the initial state and a “Next” connector to cascade the “Reset” to the next counter. Instead of using the “Next” connector, the “Reset” input of each counter could have been connected to a single control function/line. The chosen solution is better, however, because a given counter does not have to be aware of the whole system, only of its immediate neighbors. The “Reset” function of Counter2 therefore always forces the recalculation of the data dependent on it by calling the “Changed” connector, and then it calls the “Next” connector. The “Reset” function/line of the first counter is used to start the whole system.

The “Next” connector of the last counter is fed back to the “Step” input of the first counter. This triggers the first regular counter sequence. To trigger further consecutive updates, some form of the “Changed” connector of the first counter (but without parameters) could have been tied back to the “Step” input of the same counter. This would work, but the processing would never stop: after going through all the combinations, the counters would roll over and start again. To construct the code in a way that makes processing stop once all combinations have been processed, the retriggering of the first counter must not happen when the last counter rolls over.
Since “Roll” connectors are used only by “ordered” operations, the “Roll” connector of the last counter gets a default call with every step, and a regular call only when it rolls over after the last operation. It is thus easy to construct the object “Flip,” which can be used to retrigger the first counter every time except when the last counter rolls over.
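The Flip behavior can be emulated in plain C++ as follows. A bool flag stands in for stress-flow's built-in regular/default call distinction, which the `nodef` declaration exposes in the real language; the `FlipSketch` name and flag parameter are illustrative only.

```cpp
#include <functional>

// Flip emulation: Go receives either a "regular" call (the last counter
// rolled over, i.e. the sweep is complete) or a "default" call (it did
// not). Link, wired back to the first counter's Step, is fired only for
// default calls, so the retrigger loop stops exactly when the whole
// sweep completes.
struct FlipSketch {
    std::function<void()> Link;
    void Go(bool regularCall) { if (!regularCall && Link) Link(); }
};
```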

Please note that the apparent complexity of this particular example is mainly due to building it in a way that relies on universal library components and demands optimal performance. Without these requirements, the implementation is far simpler. An optimizing compiler can also remove most of the connectors and replace them with hard-coded calls when the connections do not have to be changeable at runtime. Regardless of whether the compiler can optimize or not, the example remains an excellent comparison vehicle for evaluating stress-flow against prior-art tools and techniques, where meeting the stated goals is impossible or results in far more complex code.

Copyright © 2005-2010. All Rights Reserved. Patents Pending in Multiple Jurisdictions.
First published 03/29/06. Last modified: 06/25/10