4.9 Merging Stress-Flow Atom Paths
The above chapter has shown how stress-flow is capable of roughly mimicking a data-flow system. It will now be discussed in detail how stress-flow naturally solves all problems of data-flow systems naturally through “structural-persistency” briefly mentioned in paragraphs above.
Suppose we need to perform some complex, time consuming operations on a matrix. Given input matrix A, we need to calculate:
FIG. 11: Bad code replicating data-flow and its key problems
Both Op1 and Op2 are time consuming operations producing matrix as a result and their results are multiplied to produce the final result. The operation of this system involves feeding it with large number of consecutive matrix inputs A1, A2, A3,… Listing of bad stress-flow code performing this task but replicating problems inherent to data-flow systems is shown on FIG. 11. To mimic data-flow better, matrix data is passed by value. Input data is fed by calling routine “enter” which fires atoms “do_Op1” and “do_Op2.” These two atoms, in turn, report their result by calling “send_O1” and “send_O2” respectively. As we now need both results of Op1 and Op2 to proceed, operation of the “send_O1” and “send_O2” routines depends on whether the other value has already arrived. If the other value has not arrived, the result is stored for when it does. If the other value already did arrive, the multiplication is performed and reported by calling “result.” As the many multiple instances of all stress-flow atoms here can run simultaneously, this replicates operation of “dynamic” data-flow machine. If multiple instances of enter(A1), enter(A2), enter(A3), etc are fired before the previous result is reported, we will have serious problems. We might be, for example, getting Op2(A2) and Op2(A3) fed into the multiplication. Even if we correct this problem to make sure only proper pair of Op1 and Op2 results are multiplied, this still does not prevent the problem of, for example, Op1(A3) from arriving before Op1(A2) and thus being multiplied by Op2(A2). This is exactly the problem that required various methods of “tagging” tokens in dynamic data-flow machines and was subject of extensive research.
FIG. 11A: Proper code replicating data-flow and solving its problems
The proper way to write this code is shown on FIG. 11A. Entire code was enclosed inside object “calculate” which should have been a good object-oriented idea all along, since all this code constitutes a logical set of interconnected operations. Firing of the whole calculation was moved to the object constructor, to make it even better object-oriented code.
FIG. 11B: Methods of invoking code from FIG. 11A
Methods to start calculations for different A1, A2, A3, are shown on FIG. 11B. Depending on need, the objects are allocated globally, locally on stack, or heap. The last option assumes that garbage collector is operational and that reporting function “result” is capable of reaching the outside world itself without need to store pointer to instances of “calculate” object. This way, they cannot ever interfere with the other simultaneously running instances of “calculate. Everything acts now exactly as it should – as different objects “calculate” come to life, they have totally separate “playgrounds” complete with all necessary internally generated synchronization mechanisms. They do their work, they report their results, and then they automatically disappear. This means that stress-flow is able to replicate advanced data-flow architectures without a need of the tag. Could the solution be seen as the object’s “this” pointer internally generated and maintained by the compiler that naturally acting as the tag here? Not really, being able to create the new objects on the heap results in making a copy of the whole submachine, which was not possible when the actual hardware nodes were doing the work of our stress-flow atoms.
FIG. 11C: Collector construct used to elegantly implement FIG. 11 code
The code shown on FIG. 11A solves the problem, but it is not that elegant. It also needs to create a copy of one of the matrices to have it when the other one arrives. To deal with this issue, a special “collected” construct was added to stress-flow. It is simply a stress atom with several headings, but one body as shown on FIG. 11C (functions send_O1 and send_O2). A construct like that would be impossible to comprehend in any prior-art language. In stress-flow, it is quite easy to both comprehend and implement. All what it says – both send_O1 and send_O2 have to be triggered before the body gets executed. The moment the fist one is called, the result is only to store the parameters and suspend until the other call is made, which makes the body of the atom run. This in turn will release all callers to individual collected callers upon reaching the relaxed section. If the body returns a value, it is returned to all collected callers.
FIG. 11D: Collector construct from FIG. 11C using its own keyword
The construct can be an option inside a detach block, or it can have its own keyword as shown on FIG. 11D. The separate keyword is easier to spot but it needs to reserve another word. It also eliminates some syntax flexibility of being able to mix regular stress atoms with “collectors” that could be useful at certain circumstances. Internal implementation of the collector in this case only needs to count incoming calls and compare it to the total needed.
Described example deals with common problem found in various prior art visual programming systems. Some such systems actually make a claim that simply due to multiple paths being allowed in visually created data-flow based graph/program, full possible parallelism is achieved. This claim is completely false, which will be discussed in later portions of this description. In case of example being discussed, please note that no matter what Op1 and Op2 do, they are always executed fully in parallel here. Furthermore, countless multiple instances can run together which means countless Op1 and Op2 will be executing simultaneously. If application was slightly different, with the same instance of calculate being fired multiple times, the additional parallelism will come from the fact that multiplication of Op1 and Op2 results and further reporting will be overlapped, which means that while multiplication of the results is taking place, Op1 and Op2 are already calculating new data.
Send mail to
email@example.com with questions or