5.2 Inner Workings of Implementation

The set of mechanisms described so far may seem fairly complex. Their most important characteristic, however, is that every one of them can be implemented on any existing microprocessor in just a few instructions, and a specially designed microprocessor could perform the key operations as single hardware instructions. This matters because otherwise the overhead of switching between stress-flow atoms (which are essentially micro-processes executing separately) could make stress-flow impractical. The operations used for interfacing between stress-flow atoms move the hardest elements of multi-processor design out of the user’s code and into the hardware or core interface layer. The result is a previously impossible level of simplification in parallel code design.

A big advantage of stress-flow is that it offers the same programming method regardless of whether the code is to run on one processor or on a hundred of them. Some key elements of the implementation must naturally still vary with the target platform. In particular, an implementation running on a system with one, two, or up to several processors will differ slightly from an implementation on massively multi-processor hardware. The former has to emulate an interface in which a minimum of, say, 50 processors are available as hosts for stress-flow atoms. An implementation on massively multi-processor hardware does not have to do that, but it is still desirable for a single processor to present itself as several stress-flow hosts. This helps because stress-flow atoms can be stalled waiting for other stress-flow atoms. If one hardware processor (even in a system with a very large number of processors) could host only one stress-flow atom at a time, resources would be used very inefficiently: a processor hosting a suspended stress-flow atom could not execute a more recently fired, ready-to-run stress-flow atom.

What this means is that, regardless of target hardware, the most important interface layer is one or more pools of potential handlers for stress-flow atoms. An implementation hosted on a shared-memory system with up to several processors will have one such pool, especially if it has to run on an existing operating system that already manages the assignment of threads to physical processors. For the sake of performance, however, a system with a large number of processors will have a separate pool of handlers for each processor or for each group of processors bound by common location. The purpose is to be able to assign newly fired atoms to processors closest to the ones running the stress-flow atoms with which the newly fired atom has to communicate. This layer of the implementation will be described together with the hardware architecture best suited for stress-flow.
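The locality goal described above can be illustrated with a small sketch. It is not the actual stress-flow layer: the names `HandlerPool`, `pick_pool`, and `fire`, the one-dimensional `location` value, and the absolute-difference distance measure are all assumptions made for illustration. The sketch only shows the idea of keeping one pool of handlers per processor group and giving a newly fired atom a handler from the nearest pool that still has one free.

```python
class HandlerPool:
    """Pool of potential handlers for one processor or processor group."""

    def __init__(self, location, handlers):
        self.location = location  # hypothetical 1-D position of the group
        self.free = handlers      # number of idle handlers in this pool


def pick_pool(pools, partner_location):
    """Return the pool with a free handler closest to the partner atom."""
    candidates = [p for p in pools if p.free > 0]
    if not candidates:
        return None  # every handler is busy; the atom has to wait
    return min(candidates, key=lambda p: abs(p.location - partner_location))


def fire(pools, partner_location):
    """Assign a newly fired atom to a handler near its partner."""
    pool = pick_pool(pools, partner_location)
    if pool is not None:
        pool.free -= 1
    return pool


# Three processor groups; the middle one has a single handler.
pools = [HandlerPool(0, 2), HandlerPool(4, 1), HandlerPool(8, 2)]
first = fire(pools, partner_location=5)   # nearest pool is at location 4
second = fire(pools, partner_location=5)  # pool 4 is now full; falls back
```

Because each pool holds several handlers, a processor whose current atom is suspended still has handlers left for atoms that are ready to run, which is the reason given above for presenting one processor as several hosts.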

The biggest obstacle in implementing and testing stress-flow was that pretty much all aspects of processor interfacing were done in a fairly unorthodox manner. In fact, no synchronization tools offered by an existing operating system were used. Instead, all interface mechanisms were written directly using machine-instruction-level operations, some of them interlocked memory operations. Some standard terminology (such as critical sections) had to be used in the description here, but all the critical sections used were custom-made and fairly non-standard as well.

The stress-flow interface consists of three elements: a mini-thread (or mini-task), a stress-flow atom base, and a lock mechanism. Greatly simplified, a mini-thread is a resource associated with its own stack, and it is implemented as two pointers: the “next thread pointer” and the address of a stress-flow atom to run or being run. The “next thread pointer” is used to link mini-threads into lists/FIFOs/queues of mini-threads that are idle, active, waiting for a free processor, or waiting for a lock. A stress-flow atom base consists of a pointer to the associated lock and the associated member or global routine. A lock is simply the head of a FIFO of threads waiting for it plus a pointer to the current owner thread, which can be NULL.
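The three elements can be sketched as plain records. This is a rough model under stated assumptions rather than the real implementation, which works at the machine-instruction level with interlocked memory operations: the class and function names (`MiniThread`, `AtomBase`, `Lock`, `fifo_push`, `fifo_pop`) and the `name` field are hypothetical, Python object references stand in for pointers, and `None` stands in for NULL. Because the text describes a lock as holding only the head of its FIFO, `fifo_push` walks the chain to find the tail.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(eq=False)
class MiniThread:
    """A mini-thread: conceptually its own stack plus two pointers."""
    name: str                             # label used only by this sketch
    next: Optional["MiniThread"] = None   # the "next thread pointer"
    atom: Optional["AtomBase"] = None     # atom to run or being run


@dataclass(eq=False)
class Lock:
    """Head of the FIFO of waiting mini-threads plus the current owner."""
    head: Optional[MiniThread] = None
    owner: Optional[MiniThread] = None    # None plays the role of NULL


@dataclass(eq=False)
class AtomBase:
    """Stress-flow atom base: associated lock and associated routine."""
    lock: Lock
    routine: Callable[..., object]


def fifo_push(head, thread):
    """Append thread at the tail; return the (possibly new) head."""
    thread.next = None
    if head is None:
        return thread
    tail = head
    while tail.next is not None:
        tail = tail.next
    tail.next = thread
    return head


def fifo_pop(head):
    """Detach the first mini-thread; return (thread, new_head)."""
    if head is None:
        return None, None
    new_head = head.next
    head.next = None
    return head, new_head


# Two mini-threads queue up on a lock; the first one in is the first out.
lock = Lock()
a, b = MiniThread("a"), MiniThread("b")
lock.head = fifo_push(lock.head, a)
lock.head = fifo_push(lock.head, b)
first, lock.head = fifo_pop(lock.head)
```

The same two helper functions serve every queue mentioned above (idle, active, waiting for a processor, waiting for a lock), since each of them is just a chain of “next thread pointers”.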

As previously explained, initiating (calling) a stress-flow atom is a multi-step process. The first step is an attempt to reserve the stress-flow atom’s associated lock. If the lock is free, the calling thread marks it as reserved and continues. If the lock isn’t free, the calling mini-thread adds itself to the lock’s FIFO and then suspends itself. Adding to the lock FIFO, or to any other FIFO, simply means manipulating the “next thread pointers”. One way or another, the next instructions of the calling mini-thread are executed only once the lock has been reserved (either without waiting or as a result of a lock release call). Once this happens, the calling thread can obtain a previously idle mini-thread, store parameters in it, and schedule it. Scheduling simply means storing the address of the stress-flow atom to run in the newly obtained mini-thread and inserting it into the active mini-thread FIFO. Somewhere inside its body, the newly scheduled stress-flow atom will release its associated lock, which means taking the head thread off the lock’s FIFO and activating it. The very low overhead of this system comes from the fact that each of these operations needs just a few pointer setting/resetting steps, while microprocessor hardware built with stress-flow in mind could do it all in hardware.
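The call sequence above can be traced with a single-threaded simulation. This is a sketch under several assumptions: the names `reserve`, `schedule`, and `release` are hypothetical, `collections.deque` stands in for the pointer-linked FIFOs, suspension is modeled only as "the caller sits in the lock's queue", and nothing here is atomic, whereas the text states that the real mechanisms rely on interlocked memory operations.

```python
from collections import deque


class MiniThread:
    def __init__(self, name):
        self.name = name
        self.atom = None     # atom to run or being run
        self.params = None   # parameters stored by the caller


class Lock:
    def __init__(self):
        self.owner = None        # None means the lock is free
        self.waiting = deque()   # FIFO of suspended mini-threads


class Atom:
    def __init__(self, lock, routine):
        self.lock = lock
        self.routine = routine


def reserve(lock, caller):
    """Try to reserve the lock; False means the caller is now suspended."""
    if lock.owner is None:
        lock.owner = caller
        return True
    lock.waiting.append(caller)
    return False


def schedule(atom, params, idle, active):
    """Take an idle mini-thread, store atom and parameters, activate it."""
    worker = idle.popleft()
    worker.atom, worker.params = atom, params
    active.append(worker)
    return worker


def release(lock, active):
    """Release the lock; the head waiter, if any, gets it and is activated."""
    if lock.waiting:
        waiter = lock.waiting.popleft()
        lock.owner = waiter
        active.append(waiter)
    else:
        lock.owner = None


idle = deque(MiniThread(f"worker{i}") for i in range(2))
active = deque()
lock = Lock()
atom = Atom(lock, lambda x: x * 2)

caller_a, caller_b = MiniThread("A"), MiniThread("B")
got_a = reserve(lock, caller_a)            # lock was free: A continues
got_b = reserve(lock, caller_b)            # lock taken: B joins the FIFO
worker = schedule(atom, 21, idle, active)  # A hands the atom to a worker
result = worker.atom.routine(worker.params)
release(lock, active)                      # done inside the atom's body
```

After the release, caller B owns the lock and sits in the active FIFO behind the worker, ready to carry on with its own call. Each step touches only a handful of references, which is the property the text credits for the low overhead.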

Send mail to info@stressflow.com with questions or comments.
Copyright © 2005-2010. All Rights Reserved. Patents Pending in Multiple Jurisdictions.
First published 03/29/06. Last modified: 06/25/10