The next great revolution in computer architecture is certain to be multiprocessing, just as it has always been – always right around the corner.
It seems that multiprocessing is the pigeon and the computer scientist is the child chasing it, only to have it take flight right before capture. But maybe it really is within grasp this time, because of a number of enabling technologies.
One technology is networking, which allows programs and threads to be distributed over large networks. Large chunks of data can be transferred quickly, with respect to the total processing time, over the network between processors.
Another technology that is enabling multiprocessing is the incredible shrinking transistor. This means that the functionality that previously required a printed circuit board or several boards in an entire system can now be placed on a single chip. Thus is born the system on a chip or “SOC.”
Throughout the history of computer design, there has always been a tradeoff between processors and fixed hardware. Even the first computers were really fixed program machines that we would recognize today as finite state machines. Processors, which use modifiable stored programs to control functionality, are much slower and more expensive, in terms of hardware costs, size, and power consumption, compared to state machines. The advantage of processors is their flexibility and ease of use.
The line that is drawn by engineers regarding when and where to use processors versus fixed hardware has been continually moving in the direction of more processors. Mainframes had one programmable central processing unit with much surrounding fixed hardware to control peripherals. The microprocessor has changed hardware design such that small, inexpensive processors are now used to control just about every computer peripheral and most complex electronic devices.
Nowadays transistors are very cheap and very fast. The speed and cost disadvantage of a processor relative to a finite state machine is usually outweighed by the flexibility advantage. And SOCs perform very complex, high-level functions. As John Hennessy, founder of MIPS Technologies and president of Stanford University, has pointed out, writing a program on a chip to run a word processor is a lot easier than creating a state machine to do it.
For this reason, many chip vendors are encouraging the use of many small processors to replace many small state machines. One processor may control a serial port while another controls a USB interface and yet another performs error detection and correction on an incoming Ethernet packet.
Software synthesis keeps things simple
Software synthesis is a process that hides implementation details from the programmer. It is the next step in the evolution chain of code development tools from the assembler to the compiler and now to the synthesizer.
When a programmer writes C code, for example, he does not need to think about how to implement variables or structures – byte aligned, word aligned, near memory or far memory. He does not need to be concerned with the question of where to place variables – in registers or memory. He does not need to consider how data is passed into and out of routines and objects – on the stack or on the heap. The compiler automates all this. Software synthesis allows the programmer to write code at an even higher level so that he doesn’t need to know how messages are passed between concurrent tasks in a multitasking system or how mutexes and semaphores are implemented.
To synthesize software, “primitives” are defined that look like simple function calls but are actually higher-level abstractions that describe complex operations. One example is shown in Figure 1 below.
|Figure 1. Synthesis input code and output code|
On the left side of the table is a primitive that spawns the task Ledtask. On the right side of the table is the C code that is generated from the primitive. The synthesized code loads a task handle and the task arguments into a task control block (TCB) queue, and allocates space for the return value, if any. The operating system schedules all task execution based on the task queues and configuration parameters that are supplied by the user at synthesis time, using a synthesis configuration file much like a make file.
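As a rough illustration of the kind of code a spawn primitive might expand into, the sketch below loads a task handle and argument into a small TCB queue and reserves space for the return value. It is not actual synthesizer output: the names `tcb_t`, `tcb_queue`, `spawn_task`, `run_pending_tasks`, and `led_task`, as well as the queue length, are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch only: this is NOT real synthesizer output. */
#define TCB_QUEUE_LEN 4

typedef int (*task_fn)(int arg);

typedef struct {
    task_fn handle;   /* task entry point                      */
    int     arg;      /* task argument                         */
    int     retval;   /* space reserved for the return value   */
    int     pending;  /* 1 while the task is waiting to run    */
} tcb_t;

static tcb_t tcb_queue[TCB_QUEUE_LEN];

/* What a "spawn" primitive might expand into: place the handle and
 * argument in the first free TCB slot. Returns the slot index, or -1
 * if the queue is full. */
int spawn_task(task_fn handle, int arg)
{
    assert(handle != NULL);
    for (int i = 0; i < TCB_QUEUE_LEN; i++) {
        if (!tcb_queue[i].pending) {
            tcb_queue[i].handle  = handle;
            tcb_queue[i].arg     = arg;
            tcb_queue[i].retval  = 0;
            tcb_queue[i].pending = 1;
            return i;
        }
    }
    return -1;
}

/* Scheduler side: run every pending task once, store its return value
 * in the TCB, and report how many tasks were run. */
int run_pending_tasks(void)
{
    int ran = 0;
    for (int i = 0; i < TCB_QUEUE_LEN; i++) {
        if (tcb_queue[i].pending) {
            tcb_queue[i].retval  = tcb_queue[i].handle(tcb_queue[i].arg);
            tcb_queue[i].pending = 0;
            ran++;
        }
    }
    return ran;
}

/* Stand-in for the article's Ledtask: pretends to toggle an LED and
 * returns the LED number as its status. */
int led_task(int led_number)
{
    return led_number;
}
```

With these definitions, `spawn_task(led_task, 3)` queues the task, and a later call to `run_pending_tasks()` executes it and stores its return value in the TCB.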
Macros can be thought of as degenerate primitives because they can also be used to generate lower-level code from higher-level statements. However, macros are only the very simplest form of synthesis primitives because they synthesize code locally and statically.
By this I mean that the code that is created from a macro is generated only where the macro has been placed (locally) and the generated code is always the same (static). For complex primitives like those involving multitasking systems, the code generation is both global and dynamic.
For example, a primitive for accessing a hardware resource, like a single serial port, will generate a mutex and code for checking, setting, and resetting the mutex if there are other primitives in the system for using the same resource. If there are no other primitives that access the resource, there is no need for a mutex and the mutex code will not be generated.
So the code generation is dynamic because the code generated at the location of the primitive depends not only on the primitive itself but also on code elsewhere in the system.
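The effect of this dynamic generation can be approximated by hand with the C preprocessor. In the sketch below, the constant `SERIAL_PORT_USERS` stands in for a count that a synthesizer would derive from the whole system; it and the functions `serial_acquire` and `serial_release` are hypothetical names chosen for this example. A real synthesizer would omit the unused code entirely, whereas this version falls back to empty stubs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: a synthesizer would count the primitives that use the
 * serial port across the whole system. Here the count is hand-written. */
#define SERIAL_PORT_USERS 2

static_assert(SERIAL_PORT_USERS >= 1, "the resource needs at least one user");

#if SERIAL_PORT_USERS > 1

/* More than one user: emit a mutex plus check/set/reset code.
 * The check-and-set below is not atomic, so this form is only safe
 * under a cooperative (non-preemptive) scheduler on one processor. */
static bool serial_mutex_taken = false;

bool serial_acquire(void)
{
    if (serial_mutex_taken)
        return false;            /* another task holds the port */
    serial_mutex_taken = true;   /* set the mutex               */
    return true;
}

void serial_release(void)
{
    serial_mutex_taken = false;  /* reset the mutex             */
}

#else

/* Single user: no mutex is needed, so no mutex code is emitted. */
bool serial_acquire(void) { return true; }
void serial_release(void) { }

#endif
```

Changing `SERIAL_PORT_USERS` to 1 removes the mutex variable and its bookkeeping from the compiled program without touching the code that calls `serial_acquire()`.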
Similarly, a primitive in one location can generate code in another location. The primitive that generates the TCB queue entries must also generate the corresponding code in the operating system that handles the queues. Thus software synthesis involves global code generation.
An added advantage of software synthesis is that the portability of the code can be greatly increased. For example, hardware dependencies can be incorporated into primitives so that the details of accessing the hardware are in the synthesized code. In this way primitives implement a portable hardware abstraction layer. Also, primitives can be used to implement a set of universal operating system APIs. Setting a configuration switch can cause the code to be synthesized to run on Windows or Linux, or even to synthesize the operating system itself, without any changes to the pre-synthesis code.
Applying software synthesis to RTOSes
As mentioned previously, software synthesis can automate the process of creating the operating system itself. Operating systems started out as simple mechanisms for supporting and scheduling the tasks that run on top of them. Over the years they have grown into huge behemoths that have gobbled up many of the common utilities and applications that used to run as separate tasks on top of them.
Modern operating systems are designed to schedule and support any conceivable combination of applications. While this strategy may, debatably, be ideal for desktop systems, workstations, and mainframes, it is often a poor strategy for embedded systems where memory size and response time can be critical.
For SOCs, the strategy is particularly bad, and for multiprocessor SOCs, this strategy simply does not work. For multiprocessor SOCs, an entirely new strategy is needed and software synthesis is that new strategy.
Some industry experts have said that an SOC architecture that comprises many small processors does not need an operating system because each processor is performing only one task. This is just not true. If the system is to function, each processor must perform at least two tasks – the task for which it is designed and a task that communicates with the rest of the system. Any such processor that performs more than one task needs a real-time operating system (RTOS), perhaps as simple as a polling loop scheduler, to schedule the tasks, allocate resources, and prevent deadlocks.
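To make the "polling loop scheduler" concrete, here is a minimal sketch of one that alternates between a processor's main job and its communication task. The task names and counters are placeholders invented for this example, and the loop runs a fixed number of passes only so that it terminates; real firmware would typically loop forever.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*task_fn)(void);

/* Counters that let the example show each task actually ran. */
static int work_done;
static int messages_handled;

/* The task the processor was designed for (placeholder body). */
static void work_task(void) { work_done++; }

/* The task that talks to the rest of the system (placeholder body). */
static void comm_task(void) { messages_handled++; }

/* Static task table: the scheduler visits the entries in order. */
static task_fn const task_table[] = { work_task, comm_task };
#define NUM_TASKS (sizeof task_table / sizeof task_table[0])

/* Run `passes` round-robin passes over the task table. Each task must
 * return quickly, because nothing preempts it. */
void poll_tasks(unsigned passes)
{
    for (unsigned p = 0; p < passes; p++) {
        for (size_t i = 0; i < NUM_TASKS; i++) {
            assert(task_table[i] != NULL);
            task_table[i]();
        }
    }
}
```

Even this trivial scheduler makes a design decision: tasks run to completion in a fixed order, so a slow task delays every other task in the table.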
Applying software synthesis to multiprocessing
One of the most important jobs of an operating system is to synchronize tasks, particularly to allow different tasks access to shared resources without interfering with each other.
For example, if two tasks both need to output text to a printer, it is important that one task completely outputs all of its text before the other task begins outputting its text. Otherwise, the text outputs will be interspersed as the operating system allocates time slices between the tasks, resulting in incomprehensible printouts.
The mechanism for coordinating a shared resource is called a mutex if the coordination can be done with a simple true-false value, or more generally a semaphore if the coordination requires more information. Essentially a task that wants to access a shared resource first checks the semaphore to determine if another task is already accessing it. If the resource is free, the task sets the semaphore, performs an uninterrupted access, then resets the semaphore so that other tasks can use the resource.
|Figure 2. Using a semaphore to access a shared resource.|
An example of this series of actions is shown by the code in Figure 2, above, where the main routine creates two threads, each executing the valPrint() routine, and a semaphore for the shared printf() routine. Each valPrint() thread must check, set, and reset the semaphore at appropriate points when accessing the shared printf() routine. Accessing the semaphore should be accomplished through the operating system, because the operating system is aware of task priorities and uses that information to decide which of multiple tasks requesting the shared resource should be given access.
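As a rough illustration of the check, set, and reset sequence (not the article's original listing), the sketch below uses POSIX threads and POSIX semaphores as a stand-in for the RTOS services. Only `valPrint()` and the shared `printf()` come from the text; the helper `run_two_printers()`, the counter `lines_printed`, and the choice of three lines per thread are assumptions for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t print_sem;     /* guards the shared printf() routine    */
static int lines_printed;   /* also protected by print_sem           */

/* Thread body: print three lines, each built from two printf() calls
 * that must not be interleaved with the other thread's output. */
static void *valPrint(void *arg)
{
    assert(arg != NULL);
    int value = *(int *)arg;

    for (int i = 0; i < 3; i++) {
        sem_wait(&print_sem);            /* check and set the semaphore */
        printf("value %d: ", value);
        printf("line %d\n", i);
        lines_printed++;
        sem_post(&print_sem);            /* reset the semaphore         */
    }
    return NULL;
}

/* Create the semaphore and two valPrint() threads, wait for both, and
 * return the number of lines printed (or -1 on a setup failure). */
int run_two_printers(void)
{
    pthread_t t1, t2;
    int v1 = 1, v2 = 2;

    lines_printed = 0;
    if (sem_init(&print_sem, 0, 1) != 0)
        return -1;
    if (pthread_create(&t1, NULL, valPrint, &v1) != 0) {
        sem_destroy(&print_sem);
        return -1;
    }
    if (pthread_create(&t2, NULL, valPrint, &v2) != 0) {
        pthread_join(t1, NULL);
        sem_destroy(&print_sem);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&print_sem);
    return lines_printed;
}
```

The order in which the two threads print is up to the scheduler, but each output line stays intact because both `printf()` calls for a line happen while the semaphore is held.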
A more complicated problem, which arises in multiprocessing systems, is how to give access to tasks running on different processors. Though there are some multiprocessor operating systems, they are large and complex and not suited for SOCs that have limited resources. It is more common, and simpler, to have each processor running its own RTOS. In that case, a common way to implement shared resources is to keep special multiprocessor semaphores in global memory and have a mechanism, often a hardware mechanism, to arbitrate the access to the shared resource.
|Figure 3. Multitasking with shared resources using hardware arbitration|
Such a system is shown in Figure 3, above, where the box labeled “MP semaphore” consists of global memory holding the multiprocessor semaphore for the peripheral hardware. Similarly, messages are passed through areas of global memory, labeled “MP mailbox.” Arbitration for accessing the peripheral or sending messages is controlled by the arbitration logic, implemented in hardware.
As I have discussed before, hardware is inflexible and a software solution, that can be modified as the system software changes, is preferable. An alternate solution, and one that is easily supported by software synthesis, is shown in Figure 4, below.
In this case, the global memory is still used to hold semaphores and mailboxes, but the arbitration logic has been placed in software as a task on one of the processors. The decision about which processor should execute the arbitration task depends on many different considerations and can be a simple or complex decision. Typically, the arbitration task will go on the processor that has the least loading.
|Figure 4. Multitasking with shared resources using software arbitration|
The good news is that if performance requirements change, the arbitration task can easily be moved to any other processor in the system. Arbitration parameters and priorities can also easily be changed. By using software synthesis, these changes are often accomplished with only changes to a few statements in a configuration file and then re-synthesizing the code.
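One way such a software arbitration task could be organized is sketched below: each processor posts a request flag in shared memory, and a single arbitration task grants the semaphore to the highest-priority requester whenever the resource is free. The structure `mp_semaphore_t`, the functions, and the `priority_order` table (standing in for values that might come from a configuration file) are all hypothetical. On real hardware the shared structure would also need correct placement in global memory, cache handling, and memory barriers, none of which are shown.

```c
#include <assert.h>

#define NUM_CPUS 2
#define NO_OWNER (-1)

/* Hypothetical layout of a multiprocessor semaphore in global memory. */
typedef struct {
    volatile int request[NUM_CPUS]; /* set by a CPU that wants the resource */
    volatile int owner;             /* CPU holding the resource, or NO_OWNER */
} mp_semaphore_t;

static mp_semaphore_t mp_sem = { {0, 0}, NO_OWNER };

/* Arbitration priorities, highest first. Reordering this table changes
 * the policy without touching the tasks that use the semaphore. */
static const int priority_order[NUM_CPUS] = { 1, 0 };

/* Called by a task on processor `cpu` to ask for the resource. */
void mp_request(mp_semaphore_t *s, int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    s->request[cpu] = 1;
}

/* Called by the current owner when it has finished with the resource. */
void mp_release(mp_semaphore_t *s, int cpu)
{
    if (s->owner == cpu)
        s->owner = NO_OWNER;
}

/* One pass of the arbitration task. If the resource is free, grant it
 * to the highest-priority requester. Returns the owner after the pass. */
int arbitrate_once(mp_semaphore_t *s)
{
    if (s->owner != NO_OWNER)
        return s->owner;                 /* resource is still in use */

    for (int i = 0; i < NUM_CPUS; i++) {
        int cpu = priority_order[i];
        if (s->request[cpu]) {
            s->request[cpu] = 0;         /* consume the request */
            s->owner = cpu;              /* grant the resource  */
            break;
        }
    }
    return s->owner;
}
```

Because only the arbitration task ever writes a new owner, the grant decision is serialized in one place, which is what lets it move between processors or change priorities without altering the requesting tasks.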
Robot Arm: A heterogeneous multiprocessing example
At Zeidman Technologies we created a robot arm application to demonstrate how software synthesis can be used to easily generate code for a heterogeneous multiprocessing environment.
The system consists of a mouse and a multi-joint robot arm. The mouse driver software resides on one processor. The robot arm driver software resides on another processor. When the mouse is moved on a surface, corresponding joints of the robot arm move. When the left mouse button is held down, the robot grip is closed. When the right mouse button is held down, the robot grip is opened.
In our demo, the mouse is controlled by a 32-bit MicroBlaze soft processor in a Xilinx Virtex-II FPGA. The robot arm is controlled by a 32-bit PowerPC processor in the same Xilinx Virtex-II. The Virtex-II resides on a Memec Development board with a P160 communications module daughterboard. There are two serial ports, one on the motherboard and the other on the daughterboard. A serial mouse is connected to one serial port and the robot arm is connected to the other serial port.
The robot arm has several joints – base, shoulder, elbow, wrist, and grip – that can move independently. The robot arm has a serial interface such that specific ASCII characters on the serial port command each individual joint to move in one of two directions (e.g. left/right or up/down) or to stop moving. Commands sent to one joint do not affect the movement of the other joints, so that they can all move independently.
The PowerPC processor was used to control the robot arm. Multiple tasks were used to independently control each joint of the arm. A single task was used to send commands to the robot arm via a serial port interface. The MicroBlaze processor was used to receive control data from the mouse via a serial port interface. A multiprocessor semaphore in shared memory was used to pass messages from the MicroBlaze to the PowerPC. An arbitration task running on the MicroBlaze was used to arbitrate access to the semaphore that controlled communication from the mouse to the robot arm.
In order to synthesize code for the operating system and the semaphores, we used our SynthOS tool. The RTOS generated for the PowerPC used only 2.3K bytes of memory. The RTOS generated for the MicroBlaze used only 900 bytes of memory. The system worked very well; communication occurred between the processors without glitches or lost data.
The robot arm used a very slow communication rate and the motors ran very slowly. Without multitasking, mouse control of the arm was also very slow. But by using multitasking on top of a small RTOS, the robot arm reaction to mouse commands occurred very quickly. This was a bit of a problem for us because we demonstrated the robot at DesignCon 2005 where we challenged attendees to manipulate the robot arm to pick up Starbucks gift certificate cards. The arm worked so well that we ended up giving out many Starbucks cards.
Multitasking software can be difficult to write and even more difficult to debug. When multitasking software is executed on a multiprocessing system, particularly a heterogeneous multiprocessing system, the problems are compounded. Software synthesis solves many of the resulting problems by automating software development and generating correct-by-design code.
Software synthesis allows programmers to write code for multiprocessing systems at a higher level, with better reliability, just as compilers did for complex processors many years ago. In essence, software synthesis is the next generation of software development tool and will gain more acceptance in the future, particularly as more processors are placed on single chips.
Bob Zeidman is the president of Zeidman Technologies, a company that develops tools for embedded systems hardware and software development. Zeidman Technologies has a patent on software synthesis. Bob is the author of the textbooks Verilog Designer's Library, Introduction to Verilog, and Designing with FPGAs and CPLDs. He was the recipient of the 1994 Wyle/EE Times American By Design Award and other engineering, writing, and scholastic awards. Bob holds two patents and he has an MSEE from Stanford University and a BSEE and a BA in physics from Cornell University.