6.7 Application in Operating Systems
The constant demand for higher performance of computer systems, the approaching physical limits of single-processor systems, and the lack of accepted, universal parallel programming methods have together had a very negative effect on the architectures of computer systems. Simplicity and universality have been sacrificed in order to increase performance in executing common tasks. To achieve some parallelism, great effort was made to move time-consuming operations into the firmware of dedicated-purpose processors and thus take load off the main processor. Such a solution is messy, with gains not proportional to the costs involved, which best demonstrates the urgent need for universal parallelism in mainstream computing. The issue is best illustrated by the evolution of video hardware. Faster display rendering, especially of 3D graphics, required ever-increasing processing power. Doing 3D rendering on the main processor quickly became impractical and forced the introduction of extra processors for the task. As a result, all present-day video cards are equipped with dedicated high-speed processors for rendering. This solved the problem only partially, because the video card cannot produce the graphics on its own. It still has to get the 3D mesh and textures, and their updates, from the main processor. Without fast means of communication, and without storing some of this information inside the video card hardware, the performance gains obtained from the extra processor would be quickly erased. This forced costly and complex caching of data on both ends, dedicated energy-consuming high-speed buses to send the data between the main processor and the graphics processors, etc. This solution works, but universal-purpose parallelism could accomplish the same goals in a far simpler and less expensive fashion.
FIG. 35: Mesh parallel architecture connected to video and other hardware
Consider the mesh of processors from FIG. 30 with one of its edges directly connected to video frame memory, as shown in FIG. 35. A mesh is used here only as an example; all interconnected processor architectures (two-, three-, and higher-dimensional) can use the described methods. From an electronic standpoint, such an architecture is quite simple. The problem was the lack of a good method of programming it. Stress-flow fills that void. As many mesh processors as necessary can now work on producing graphics. Graphics processing software now runs on the mesh like any other stress-flow atoms. No firmware or dedicated high-speed buses are needed. If the graphics job is complex, the graphics-related stress-flow atoms get assigned deeper and deeper into the mesh, away from the video frame memory. Node assignment procedures result in a situation where the stress-flow atoms doing the final filling of raster memory and Z-buffering are closest to the frame memory, while those doing higher-level 3D mesh, special effect, or animation work are further away from it. The load on the mesh processors configures itself to perform the current task most efficiently. The solution is decentralized, which allows redundancy and better use of resources. Damage in part of the mesh can be bypassed. The whole architecture is very simple and both cost and energy efficient. There is no need to dedicate resources to the worst-case graphics load scenario. All hardware is universal: if it is not needed for graphics at the moment, it can be doing something else.
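The placement described above, with raster-filling atoms nearest the frame memory and higher-level work pushed deeper as the load grows, can be illustrated with a minimal Python sketch. It is not the node assignment procedure of stress-flow itself: the function `assign_rows`, the integer `stage` ranking, and the fixed `row_capacity` are assumptions made only for illustration.

```python
from typing import Dict, List, Tuple


def assign_rows(tasks: List[Tuple[str, int]], row_capacity: int) -> Dict[str, int]:
    """Place tasks on mesh rows, row 0 being adjacent to the frame memory.

    Each task is (name, stage). Stage 0 is the final raster/Z-buffer work;
    larger stages are higher-level work (3D mesh, effects, animation).
    This greedy rule is a hypothetical stand-in for real node assignment.
    """
    if row_capacity < 1:
        raise ValueError("row_capacity must be at least 1")
    placement: Dict[str, int] = {}
    row, used = 0, 0
    # Lowest stage first, so raster-level work claims the nearest rows.
    for name, _stage in sorted(tasks, key=lambda t: t[1]):
        if used == row_capacity:      # nearest row is full: go one row deeper
            row, used = row + 1, 0
        placement[name] = row
        used += 1
    return placement
```

With a row capacity of two, two raster-fill tasks share the row next to the frame memory and the mesh-level and animation tasks land one row further away. Shrinking the capacity, or adding tasks, pushes the higher-level work deeper into the mesh, which is the behavior the paragraph describes.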
A similar thing might be done with other hardware, for example a hard disk, shown connected to the left edge of the processor mesh in FIG. 35. Because of the desire to off-load the main processor, substantial processing power has been placed inside hard-disk firmware. Hard disks were equipped with large memory for caching data read from and to be written to the disk, together with algorithms for optimizing accesses, pre-guessing operations, etc. A hard disk connected to a mesh of processors utilizing stress-flow does not need any of that. Because of the processing power available in the mesh, the hard disk can again be rudimentary, low-level hardware, with all the sophistication provided by software running on the mesh. Greater processing power can be assigned to hard-disk data routines when needed, without facing the bottleneck of a single hard-disk hardware interface.
Such a distributed solution for hardware-specific processes not only increases performance but has a number of other advantages. It offers redundancy. It eliminates the need for custom buses and interfaces, which act as bottlenecks and constantly need changing and upgrading. And it makes the universal-purpose computer truly universal, that is, based entirely on software rather than on a large amount of fixed firmware that cannot be changed or even troubleshot by the user. Redundancy in computing has seen a lot of effort lately. Sophisticated controllers have been produced which, for example, provide redundant hard-disk capabilities that make it possible to bypass one failed hard drive. Such redundancy is very limited: if the main processor or the redundant hard-drive controller fails, the whole system is still completely disabled. Stress-flow does not have this problem. A mesh of processors utilizing stress-flow can bypass faulty mesh elements, while the processing power of the mesh allows any redundant hardware scheme to be implemented as mesh software, without the need for dedicated-purpose hardware or firmware. Such universality can greatly increase the performance and reliability of a computer system, since performance can be improved simply by increasing the size of the mesh or other interconnected architecture. New functionality can be added simply by changing the mesh software to include, for example, encryption/decryption or user ownership and permission schemes as a low-level function of the hard-disk algorithms. Similar advantages could be obtained by implementing the network hardware, drivers, and interfaces in the same fashion.
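To make the idea of bypassing faulty mesh elements concrete, the following Python sketch finds a path across a two-dimensional mesh that avoids nodes marked as faulty. It is an illustration under stated assumptions, not the mechanism stress-flow prescribes: the function `route`, the `(x, y)` node coordinates, and the use of a breadth-first search with global knowledge of the faulty set are all hypothetical simplifications (a real mesh would more likely discover such detours locally).

```python
from collections import deque
from typing import FrozenSet, List, Optional, Tuple

Coord = Tuple[int, int]


def route(width: int, height: int, src: Coord, dst: Coord,
          faulty: FrozenSet[Coord] = frozenset()) -> Optional[List[Coord]]:
    """Return a shortest hop path from src to dst on a width x height mesh,
    stepping only between 4-connected neighbors and never through a faulty
    node. Returns None when no such path exists."""
    if src in faulty or dst in faulty:
        return None
    prev = {src: None}                 # visited set plus back-pointers
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:                # walk back-pointers to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in faulty and nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None
```

On a 3 x 3 mesh, a call such as `route(3, 3, (0, 0), (2, 0), frozenset({(1, 0)}))` returns a path that detours through the second row instead of crossing the failed node, while a full column of failed nodes leaves no path and yields `None`.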
A computer system designed from scratch and based on stress-flow would totally change and simplify the design of an operating system, if that term would still make sense in such a new architecture. All the software, including what used to be the operating system, could be distributed without the need for any centralized supervisor. The whole process would look as follows. A program, as a logical construct performing a specific new job, would be a collection of stress-flow atoms calling one another and some outside stress-flow atoms. An outside stress-flow atom is identified by an expanded name plus the digital signature/identifier of its maker. Starting with the entry point, the stress-flow atoms attempt to initialize themselves as recommended by the ‘location’ function described in previous paragraphs. For hardware drivers or other hardware-specific code, the location will always point to nodes neighboring the hardware. The code of previously run stress-flow atoms is cached at each node as memory allows. If the code is already there, it is reused; if not, various repositories of stress-flow atoms are contacted for a copy. The list of repositories should be provided as context, that is, static information stored when the entry-point stress-flow atom was executed. The first repositories would be local hard drives; the last ones could be the internet servers of the desired stress-flow atom’s creator. A node searching for a copy of a stress-flow atom would not even need any complex software for contacting the various repositories. All it needs to know is this: to obtain a copy from repository xxx, ask a node lying in the direction of the ‘location’ structure that represents the repository. The contacted node would either provide a copy or ask a node closer still to the repository, until the repository itself was reached. For the sake of performance, if its memory allowed, a node would cache not only the stress-flow atoms that it actually ran but also those that were copied through it.
This way, relatively unused nodes (through which the access path toward the repository will most likely travel) will serve as a natural program cache, saving the trouble of traversing the mesh and re-reading the code from the repository. The repository search function running on the consecutive nodes would naturally also be a stress-flow atom, one that would either pass the call on to a closer node or, if its node was close enough in the mesh to the repository hardware connection, call the repository (such as disk) reading routines.
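The lookup just described, in which each node answers from its own cache or passes the request one node closer to the repository and caches whatever is copied through it, can be sketched in Python as follows. This is a minimal model, not stress-flow code: the classes `Repository` and `Node`, the `toward_repo` link standing in for the ‘location’ direction, and the unbounded per-node cache are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Repository:
    """Hypothetical code store (for example a disk) wired to one mesh node."""
    atoms: Dict[str, bytes]
    reads: int = 0                      # counts actual repository accesses

    def read(self, name: str) -> bytes:
        self.reads += 1
        return self.atoms[name]


@dataclass
class Node:
    """One mesh node. `toward_repo` is the neighbor one hop closer to the
    repository; `repo` is set only on the node attached to the repository."""
    label: str
    toward_repo: Optional["Node"] = None
    repo: Optional[Repository] = None
    cache: Dict[str, bytes] = field(default_factory=dict)

    def fetch(self, name: str) -> bytes:
        """Return the code of atom `name`, caching it at every node it passes."""
        if name in self.cache:                    # already here: reuse it
            return self.cache[name]
        if self.repo is not None:                 # adjacent to the repository
            code = self.repo.read(name)
        elif self.toward_repo is not None:        # ask a node closer to it
            code = self.toward_repo.fetch(name)
        else:
            raise LookupError(f"node {self.label} has no path to a repository")
        self.cache[name] = code                   # cache what was copied through
        return code
```

If nodes `a`, `b`, and `c` form a chain with the repository attached to `c`, a first `a.fetch(...)` reads the repository once and leaves a copy on all three nodes. A later request from another node whose path also runs through `b` is then served from `b`'s cache without touching the repository again. A real node would bound its cache by available memory, which this sketch omits.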
In this way, a truly decentralized, distributed-processing, redundancy-supporting, self-expanding, self-updating universal computer system could be built. It is true that such features are already available in some present operating systems. The critical difference, however, is that these features are currently provided only as expensive custom extensions to existing systems that, by their origin, were built mostly as single-threaded, single-processor code. Such additions are a kind of dressing that makes the underlying code behave as if it had not been written as single-threaded, non-distributed code, which is what almost all programming tools still produce. The situation is totally different with stress-flow, where multi-tasking, multi-processing, decentralization, distributed processing, distribution of resources, redundancy, etc. were designed to be the underlying fundamentals of all software design, which could eventually allow revolutionary improvement in software design methodology and performance. As far as the operating system is concerned, there would be no centralized resource-managing software, but rather a collection of independent drivers managing their hardware.
An extra advantage is that this would not have to be done all at once. The stress-flow methodology could be introduced gradually, with great benefit, into existing computer systems by providing emulation of stress-flow hardware coexisting with conventional code. Such emulation has already been described and implemented.