Using software synthesis for multiprocessor OS and software development

Bob Zeidman

January 06, 2006


The next great revolution in computer architecture is certain to be multiprocessing, just as it has always been – always right around the corner.

It seems that multiprocessing is the pigeon and the computer scientist is the child chasing it, only to have it take flight right before capture. But maybe it really is within grasp this time, because of a number of enabling technologies.

One technology is networking, which allows programs and threads to be distributed over large networks. Large chunks of data can be transferred quickly, with respect to the total processing time, over the network between processors.

Another technology that is enabling multiprocessing is the incredible shrinking transistor. This means that the functionality that previously required a printed circuit board or several boards in an entire system can now be placed on a single chip. Thus is born the system on a chip or “SOC.”

Throughout the history of computer design, there has always been a tradeoff between processors and fixed hardware. Even the first computers were really fixed program machines that we would recognize today as finite state machines. Processors, which use modifiable stored programs to control functionality, are much slower and more expensive, in terms of hardware costs, size, and power consumption, compared to state machines. The advantage of processors is their flexibility and ease of use.

The line that is drawn by engineers regarding when and where to use processors versus fixed hardware has been continually moving in the direction of more processors. Mainframes had one programmable central processing unit with much surrounding fixed hardware to control peripherals. The microprocessor has changed hardware design such that small, inexpensive processors are now used to control just about every computer peripheral and most complex electronic devices.

Nowadays transistors are very cheap and very fast. The speed and cost disadvantage of a processor relative to a finite state machine is usually outweighed by the flexibility advantage. And SOCs perform very complex, high-level functions. As John Hennessy, founder of MIPS Technologies and president of Stanford University, has pointed out, writing a program on a chip to run a word processor is a lot easier than creating a state machine to do it.

For this reason, many chip vendors are encouraging the use of many small processors to replace many small state machines. One processor may control a serial port while another controls a USB interface and yet another performs error detection and correction on an incoming Ethernet packet.

Software synthesis keeps things simple
Software synthesis is a process that hides implementation details from the programmer. It is the next step in the evolution chain of code development tools from the assembler to the compiler and now to the synthesizer.

When a programmer writes C code, for example, he does not need to think about how to implement variables or structures – byte aligned, word aligned, near memory or far memory. He does not need to be concerned with the question of where to place variables – in registers or memory. He does not need to consider how data is passed into and out of routines and objects – on the stack or on the heap. The compiler automates all this. Software synthesis allows the programmer to write code at an even higher level so that he doesn’t need to know how messages are passed between concurrent tasks in a multitasking system or how mutexes and semaphores are implemented.

To synthesize software, “primitives” are defined that look like simple function calls but are actually higher-level abstractions that describe complex operations. One example is shown in Figure 1 below.

Figure 1. Synthesis input code and output code

On the left side of the table is a primitive that spawns the taskLed task. On the right side of the table is the C code that is generated from the primitive. The synthesized code loads a task handle and the task arguments into a task control block (TCB) queue, and allocates space for the return value, if any. The operating system schedules all task execution based on the task queues and configuration parameters that are supplied by the user at synthesis time, using a synthesis configuration file much like a make file.
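Because the figure is not reproduced here, the following is only a minimal sketch of the kind of C code such a primitive might expand into. It is not actual synthesizer output: the names tcb_t, tcb_spawn, and tcb_run_next, the queue depth, and the body of taskLed are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TCB_QUEUE_LEN 8  /* assumed depth; a real tool would take this from configuration */

typedef int (*task_fn)(void *arg);

/* Hypothetical task control block: task handle, argument, and return-value slot. */
typedef struct {
    task_fn handle;
    void   *arg;
    int     retval;
    int     done;
} tcb_t;

static tcb_t  tcb_queue[TCB_QUEUE_LEN];
static size_t tcb_head, tcb_count;

/* Load a task handle and its argument into the TCB queue.
   Returns the TCB, or NULL if the queue is full. */
tcb_t *tcb_spawn(task_fn handle, void *arg)
{
    assert(handle != NULL);
    if (tcb_count == TCB_QUEUE_LEN)
        return NULL;
    tcb_t *t = &tcb_queue[(tcb_head + tcb_count) % TCB_QUEUE_LEN];
    t->handle = handle;
    t->arg    = arg;
    t->retval = 0;   /* space reserved for the task's return value */
    t->done   = 0;
    tcb_count++;
    return t;
}

/* Minimal scheduler step: run the oldest queued task to completion.
   Returns 1 if a task ran, 0 if the queue was empty. */
int tcb_run_next(void)
{
    if (tcb_count == 0)
        return 0;
    tcb_t *t = &tcb_queue[tcb_head];
    tcb_head = (tcb_head + 1) % TCB_QUEUE_LEN;
    tcb_count--;
    t->retval = t->handle(t->arg);
    t->done   = 1;
    return 1;
}

/* Example task: toggles a simulated LED and returns its new state. */
static int led_state;

int taskLed(void *arg)
{
    (void)arg;
    led_state = !led_state;
    return led_state;
}
```

A caller would write something like tcb_spawn(taskLed, NULL) where the primitive appears and leave it to the scheduler loop to call tcb_run_next(). The point is that the one-line primitive stands in for both the queue insertion and the scheduler code that later drains the queue.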

Macros can be thought of as degenerate primitives because they can also be used to generate lower-level code from higher-level statements. However, macros are only the very simplest form of synthesis primitives because they synthesize code locally and statically.

By this I mean that the code that is created from a macro is generated only where the macro has been placed (locally) and the generated code is always the same (static). For complex primitives like those involving multitasking systems, the code generation is both global and dynamic.

For example, a primitive for accessing a hardware resource, like a single serial port, will generate a mutex and code for checking, setting, and resetting the mutex if there are other primitives in the system for using the same resource. If there are no other primitives that access the resource, there is no need for a mutex and the mutex code will not be generated.

So the code generation is dynamic because the code generated at the location of the primitive depends not only on the primitive itself but also on code elsewhere in the system.
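The effect can be approximated in plain C as a sketch, with one important assumption: the C preprocessor cannot count how many tasks use a resource, so the constant SERIAL_PORT_USERS below is taken to be a value the synthesizer emits after scanning the whole program. That name, and the serial_put routine with its in-memory log, are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed to be emitted by the synthesizer after analyzing the whole system:
   the number of tasks that use the serial port. */
#ifndef SERIAL_PORT_USERS
#define SERIAL_PORT_USERS 2
#endif

static_assert(SERIAL_PORT_USERS >= 1, "at least one task must use the port");

#if SERIAL_PORT_USERS > 1
/* More than one user: generate a mutex and the code to set and reset it.
   A real RTOS would yield to the scheduler instead of spinning. */
static atomic_flag serial_busy = ATOMIC_FLAG_INIT;
#define SERIAL_LOCK()   do { while (atomic_flag_test_and_set(&serial_busy)) { } } while (0)
#define SERIAL_UNLOCK() atomic_flag_clear(&serial_busy)
#else
/* Only one user: no mutex code is generated at all. */
#define SERIAL_LOCK()   ((void)0)
#define SERIAL_UNLOCK() ((void)0)
#endif

/* Simulated serial port: characters are appended to an in-memory log. */
static char serial_log[64];
static int  serial_len;

void serial_put(char c)
{
    SERIAL_LOCK();
    if (serial_len < (int)sizeof(serial_log) - 1)
        serial_log[serial_len++] = c;
    SERIAL_UNLOCK();
}
```

With SERIAL_PORT_USERS defined as 1, the lock and unlock steps compile away entirely, which mirrors the case where the synthesizer finds no other primitive using the port.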

Similarly a primitive in one location can generate code in another location. The primitive that generates the TCB queue entries must also generate the corresponding code in the operating system that handles the queues. Thus software synthesis involves global code generation.

An added advantage of software synthesis is that the portability of the code can be greatly increased. For example, hardware dependencies can be incorporated into primitives so that the details of accessing the hardware are in the synthesized code. In this way primitives implement a portable hardware abstraction layer. Also, primitives can be used to implement a set of universal operating system APIs. Setting a configuration switch can cause the code to be synthesized to run on Windows or Linux, or even to synthesize the operating system itself, without any changes to the pre-synthesis code.

Applying software synthesis to RTOSes
As mentioned previously, software synthesis can automate the process of creating the operating system itself. Operating systems started out as simple mechanisms for supporting and scheduling the tasks that run on top of them. Over the years they have grown into huge behemoths that have gobbled up many of the common utilities and applications that used to run as separate tasks on top of them.

Modern operating systems are designed to schedule and support any conceivable combination of applications. While this strategy may, debatably, be ideal for desktop systems, workstations, and mainframes, it is often a poor strategy for embedded systems where memory size and response time can be critical.

For SOCs, the strategy is particularly bad, and for multiprocessor SOCs, this strategy simply does not work. For multiprocessor SOCs, an entirely new strategy is needed and software synthesis is that new strategy.

Some industry experts have said that an SOC architecture that comprises many small processors does not need an operating system because each processor is performing only one task. This is just not true. If the system is to function, each processor must perform at least two tasks – the task for which it is designed and a task that communicates with the rest of the system. Any processor that performs more than one task needs a real-time operating system (RTOS), perhaps as simple as a polling loop scheduler, to schedule the tasks, allocate resources, and prevent deadlocks.
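A polling loop scheduler of the simplest kind can be sketched in a few lines of C. The two tasks below (one doing the processor's main job, one reporting to the rest of the system) and the fixed pass count are assumptions chosen so the example terminates; an embedded loop would normally run forever.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*task_fn)(void);

static int work_done;   /* units of work completed by the main task */
static int msgs_sent;   /* status messages sent to the rest of the system */

/* The task the processor was designed for (for example, servicing a port). */
void task_work(void)
{
    work_done++;
}

/* The communication task: reports once for every four units of work. */
void task_comm(void)
{
    if (work_done / 4 > msgs_sent)
        msgs_sent++;
}

/* Call each task in turn, for a fixed number of passes.
   Each task must return quickly, since nothing preempts it. */
void poll_scheduler(const task_fn *tasks, size_t n, unsigned passes)
{
    assert(tasks != NULL && n > 0);
    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            tasks[i]();
}
```

Calling poll_scheduler with an array holding task_work and task_comm for eight passes completes eight units of work and sends two messages. Even this tiny loop is an operating system in the sense used here: it decides which task runs next.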

Applying software synthesis to multiprocessing
One of the most important jobs of an operating system is to synchronize tasks, particularly to allow different tasks access to shared resources without interfering with each other.

For example, if two tasks both need to output text to a printer, it is important that one task completely outputs all of its text before the other task begins outputting its text. Otherwise, the text outputs will be interspersed as the operating system allocates time slices between the tasks, resulting in incomprehensible printouts.

The mechanism for coordinating a shared resource is called a mutex if the coordination can be done with a simple true-false value, or more generally a semaphore if the coordination requires more information. Essentially a task that wants to access a shared resource first checks the semaphore to determine if another task is already accessing it. If the resource is free, the task sets the semaphore, performs an uninterrupted access, then resets the semaphore so that other tasks can use the resource.
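One detail matters in practice: the check and the set must happen as a single indivisible step, or two tasks could both see the resource as free. A minimal sketch of a true-false mutex using C11 atomics follows; the function names are hypothetical, and a real RTOS would block or yield rather than simply report failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool resource_busy = false;

/* Check and set in one atomic step.
   Returns true if the caller now owns the resource. */
bool resource_try_acquire(void)
{
    /* atomic_exchange stores true and returns the previous value. */
    return !atomic_exchange(&resource_busy, true);
}

/* Reset the mutex so that other tasks can use the resource. */
void resource_release(void)
{
    assert(atomic_load(&resource_busy));  /* releasing an unheld mutex is a bug */
    atomic_store(&resource_busy, false);
}
```

A task calls resource_try_acquire(), uses the resource only if the call returns true, and then calls resource_release().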

Figure 2. Using a semaphore to access a shared resource.

An example of this series of actions is shown by the code in Figure 2, above, where the main routine creates two threads, each executing the valPrint() routine, and a semaphore for the shared printf() routine. Each valPrint() thread must check, set, and reset the semaphore at appropriate points when accessing the shared printf() routine. Accessing the semaphore should be accomplished through the operating system, because the operating system is aware of task priorities and uses that information to decide which of multiple tasks requesting the shared resource should be given access.
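Since the figure itself is not reproduced here, the following sketch shows the same pattern using POSIX threads and an unnamed POSIX semaphore (available on Linux). Only valPrint() and printf() come from the description above; run_demo, the line counter, and the printed text are assumptions. Each line is deliberately printed with two printf() calls so that the semaphore is what keeps a line in one piece.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t print_sem;      /* guards the shared printf() routine */
static int   lines_printed;  /* modified only while print_sem is held */

/* Thread body: print three lines, holding the semaphore for each whole line. */
void *valPrint(void *arg)
{
    assert(arg != NULL);
    int val = *(int *)arg;
    for (int i = 0; i < 3; i++) {
        sem_wait(&print_sem);            /* check and set */
        printf("thread %d: ", val);
        printf("line %d\n", i);
        lines_printed++;
        sem_post(&print_sem);            /* reset */
    }
    return NULL;
}

/* Create the semaphore and two threads, wait for both to finish,
   and return the number of lines printed (or -1 on error). */
int run_demo(void)
{
    pthread_t t1, t2;
    int v1 = 1, v2 = 2;

    lines_printed = 0;
    if (sem_init(&print_sem, 0, 1) != 0)
        return -1;
    if (pthread_create(&t1, NULL, valPrint, &v1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, valPrint, &v2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&print_sem);
    return lines_printed;
}
```

Calling run_demo() from main() prints six complete lines. The order in which the two threads take turns is up to the scheduler, but no line is ever split between them.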

A more complicated problem, which arises in multiprocessing systems, is how to give tasks running on different processors access to the same resource. Though there are some multiprocessor operating systems, they are large and complex and not suited for SOCs that have limited resources. It is more common, and simpler, to have each processor run its own RTOS. In that case, a common way to implement shared resources is to keep special multiprocessor semaphores in global memory and have a mechanism, often a hardware mechanism, to arbitrate access to the shared resource.

Figure 3. Multitasking with shared resources using hardware arbitration

Such a system is shown in Figure 3, above, where the box labeled “MP semaphore” consists of global memory holding the multiprocessor semaphore for the peripheral hardware. Similarly, messages are passed through areas of global memory, labeled “MP mailbox.” Arbitration for accessing the peripheral or sending messages is controlled by the arbitration logic, implemented in hardware.

As I have discussed, hardware is inflexible, and a software solution that can be modified as the system software changes is preferable. An alternate solution, and one that is easily supported by software synthesis, is shown in Figure 4, below.

In this case, the global memory is still used to hold semaphores and mailboxes, but the arbitration logic has been placed in software as a task on one of the processors. The decision about which processor should execute the arbitration task depends on many different considerations and can be simple or complex. Typically, the arbitration task will go on the processor that has the lightest load.
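A software arbitration task of this kind might look like the following sketch. The memory layout, the function names, and the fixed-priority policy (lower processor number wins) are all assumptions. Each field has exactly one writer, which keeps the protocol simple, but real hardware would also need attention to caching and memory ordering, which this sketch ignores.

```c
#include <assert.h>

#define NUM_CPUS 2
#define NO_OWNER (-1)

/* Hypothetical layout of the "MP semaphore" region in global memory. */
typedef struct {
    volatile int request[NUM_CPUS];  /* request[i] is written only by processor i */
    volatile int owner;              /* written only by the arbitration task */
} mp_semaphore_t;

void mp_sem_init(mp_semaphore_t *s)
{
    for (int cpu = 0; cpu < NUM_CPUS; cpu++)
        s->request[cpu] = 0;
    s->owner = NO_OWNER;
}

/* Called by a task on processor `cpu` to ask for the shared resource. */
void mp_sem_request(mp_semaphore_t *s, int cpu)
{
    s->request[cpu] = 1;
}

/* Polled by the requesting task until it returns nonzero. */
int mp_sem_granted(const mp_semaphore_t *s, int cpu)
{
    return s->owner == cpu;
}

/* Called by the current owner when it has finished with the resource. */
void mp_sem_release(mp_semaphore_t *s, int cpu)
{
    assert(s->owner == cpu);
    s->request[cpu] = 0;
}

/* One pass of the arbitration task. It can run on any processor
   that can reach the global memory. */
void arbitration_task(mp_semaphore_t *s)
{
    if (s->owner != NO_OWNER) {
        if (s->request[s->owner])
            return;                  /* resource still in use */
        s->owner = NO_OWNER;         /* owner has dropped its request */
    }
    for (int cpu = 0; cpu < NUM_CPUS; cpu++) {
        if (s->request[cpu]) {
            s->owner = cpu;          /* grant to the highest-priority requester */
            return;
        }
    }
}
```

Because the arbitration policy lives in one small function, changing priorities or moving the task to another processor does not touch the tasks that request the resource.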

Figure 4. Multitasking with shared resources using software arbitration

The good news is that if performance requirements change, the arbitration task can easily be moved to any other processor in the system. Arbitration parameters and priorities can also easily be changed. With software synthesis, these changes often require nothing more than editing a few statements in a configuration file and re-synthesizing the code.

Robot Arm: A heterogeneous multiprocessing example
At Zeidman Technologies we created a robot arm application to demonstrate how software synthesis can be used to easily generate code for a heterogeneous multiprocessing environment.

The system consists of a mouse and a multi-joint robot arm. The mouse driver software resides on one processor. The robot arm driver software resides on another processor. When the mouse is moved on a surface, corresponding joints of the robot arm move. When the left mouse button is held down, the robot grip is closed. When the right mouse button is held down, the robot grip is opened.

In our demo, the mouse is controlled by a 32-bit MicroBlaze soft processor in a Xilinx Virtex-II FPGA. The robot arm is controlled by a 32-bit PowerPC processor in the same Xilinx Virtex-II. The Virtex-II resides on a Memec Development board with a P160 communications module daughterboard. There are two serial ports, one on the motherboard and the other on the daughterboard. A serial mouse is connected to one serial port and the robot arm is connected to the other serial port.

The robot arm has several joints – base, shoulder, elbow, wrist, and grip – that can move independently. The robot arm has a serial interface such that specific ASCII characters on the serial port command each individual joint to move in one of two directions (e.g. left/right or up/down) or to stop moving. Commands sent to one joint do not affect the movement of the other joints, so that they can all move independently.
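A command scheme like this maps naturally onto a lookup table. In the sketch below the command characters are placeholders, since the arm's real ASCII codes are not given here; the type and function names are likewise assumptions.

```c
#include <assert.h>

typedef enum { JOINT_BASE, JOINT_SHOULDER, JOINT_ELBOW,
               JOINT_WRIST, JOINT_GRIP, JOINT_COUNT } joint_t;

typedef enum { MOVE_STOP, MOVE_POS, MOVE_NEG, MOVE_COUNT } motion_t;

/* PLACEHOLDER command characters, one per joint and motion.
   The real arm's codes would come from its documentation. */
static const char command_table[JOINT_COUNT][MOVE_COUNT] = {
    /*               stop  +dir  -dir */
    /* base     */ { 'a',  'b',  'c' },
    /* shoulder */ { 'd',  'e',  'f' },
    /* elbow    */ { 'g',  'h',  'i' },
    /* wrist    */ { 'j',  'k',  'l' },
    /* grip     */ { 'm',  'n',  'o' },
};

/* Look up the single character that commands one joint. */
char joint_command(joint_t joint, motion_t motion)
{
    assert(joint < JOINT_COUNT && motion < MOVE_COUNT);
    return command_table[joint][motion];
}

/* Turn a signed mouse movement along one axis into a motion for one joint. */
motion_t motion_from_delta(int delta)
{
    if (delta > 0)
        return MOVE_POS;
    if (delta < 0)
        return MOVE_NEG;
    return MOVE_STOP;
}
```

Because each command addresses exactly one joint, a separate task per joint can call joint_command() independently and hand the resulting character to whatever task owns the serial port.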

The PowerPC processor was used to control the robot arm. Multiple tasks were used to independently control each joint of the arm. A single task was used to send commands to the robot arm via a serial port interface. The MicroBlaze processor was used to receive control data from the mouse via a serial port interface. A multiprocessor semaphore in shared memory was used to pass messages from the MicroBlaze to the PowerPC. An arbitration task running on the MicroBlaze was used to arbitrate access to the semaphore that controlled communication from the mouse to the robot arm.

To synthesize code for the operating system and the semaphores, we used our SynthOS tool. The RTOS generated for the PowerPC used only 2.3K bytes of memory. The RTOS generated for the MicroBlaze used only 900 bytes of memory. The system worked very well; communication between the two processors occurred without glitches or lost data.

The robot arm used a very slow communication rate and the motors ran very slowly. Without multitasking, mouse control of the arm was also very slow. But by using multitasking on top of a small RTOS, the robot arm reaction to mouse commands occurred very quickly. This was a bit of a problem for us because we demonstrated the robot at DesignCon 2005 where we challenged attendees to manipulate the robot arm to pick up Starbucks gift certificate cards. The arm worked so well that we ended up giving out many Starbucks cards.

Multitasking software can be difficult to write and even more difficult to debug. When multitasking software is executed on a multiprocessing system, particularly a heterogeneous multiprocessing system, the problems are compounded. Software synthesis solves many of the resulting problems by automating software development and generating correct-by-design code.

Software synthesis allows programmers to write code for multiprocessing systems at a higher level, with better reliability, just as compilers did for complex processors many years ago. In essence, software synthesis is the next generation of software development tool and will gain more acceptance in the future, particularly as more processors are placed on single chips.


Bob Zeidman is the president of Zeidman Technologies, a company that develops tools for embedded systems hardware and software development. Zeidman Technologies has a patent on software synthesis. Bob is the author of the textbooks Verilog Designer's Library, Introduction to Verilog, and Designing with FPGAs and CPLDs. He was the recipient of the 1994 Wyle/EE Times American By Design Award and other engineering, writing, and scholastic awards. Bob holds two patents and he has an MSEE from Stanford University and a BSEE and a BA in physics from Cornell University.
