While real-time operating systems provide apparent concurrency on a
single processor, multiprocessor platforms provide true concurrency.
The concurrency and performance provided by multiprocessors can be very
powerful but also harder to analyze and debug.
The purpose of this series of five articles is (1) to review what is
unique about multiprocessor software as compared to both uniprocessor
embedded systems and general-purpose systems; (2) to study scheduling
and performance analysis of multiple tasks running on a multiprocessor;
(3) to consider middleware and software stacks as well as design
techniques for them; and (4) to look at design verification of
multiprocessor systems.
As we move up to software running on embedded multiprocessors, we
face two types of differences. First, how is embedded multiprocessor
software different from traditional, general-purpose multiprocessor
software? We can borrow many techniques from general-purpose computing,
but some of the challenges in embedded computing systems are unique and
require new methods.
Second, how is the software in a multiprocessor different from that
in a uniprocessor-based system? On the one hand, we would hope that we
could port an embedded application from a uniprocessor to a
multiprocessor with a minimum of effort, if we use the proper
abstractions to design the software. But there are some important,
fundamental differences.
The first pervasive difference is that embedded multiprocessors are
often heterogeneous, with multiple types of processing elements,
specialized memory systems, and irregular communication systems.
Heterogeneous multiprocessors are less common in general-purpose
computing; they also make life considerably more challenging than in
the embedded uniprocessor world. Heterogeneity presents several types
of problems.
1) Getting software from several types of processors to work together
can present challenges. Endianness is one common compatibility problem;
library compatibility is another.
2) The development environments for heterogeneous multiprocessors are
often loosely coupled. Programmers may have a hard time learning all
the tools for all the component processors. It may be hard to debug
problems that span multiple CPU types.
3) Different processors may offer different types of resources and
interfaces to those resources. Not only does this complicate
programming but it also makes it harder to decide certain things at
runtime.
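The endianness problem in item 1 is commonly handled by agreeing on one byte order for all data exchanged between processors. The following is a minimal sketch in C, assuming a 32-bit field and a big-endian wire format; the function names put_u32_be and get_u32_be are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Write a 32-bit value into buf in big-endian order.
   The shifts operate on the value, not on memory, so the bytes come
   out the same whether the host CPU is little- or big-endian. */
void put_u32_be(uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Read a 32-bit big-endian value back from buf. */
uint32_t get_u32_be(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

If both the sending and the receiving processor go through helpers like these, neither needs to know the other's native byte order.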
Another important difference is that delays are much harder to
predict in multiprocessors. Delay variations come from several sources:
the true concurrency provided by multiprocessors, the larger size of
multiprocessors, CPU heterogeneity, and the structure and the use of
the memory system.
Larger delays and variances in delays result in many problems,
including:
1) Delay variations help expose timing-sensitive bugs that can be hard
to test for and even harder to fix. A methodology that avoids timing
bugs is the best way to solve concurrency-related timing problems.
2) Variations in computation time make it hard to efficiently use
system resources and require more decisions to be made at runtime.
3) Large delays for memory accesses make it harder to execute code
that performs data-dependent operations.
Scheduling a multiprocessor is substantially more difficult than
scheduling a uniprocessor. Optimum scheduling algorithms do not exist
for most realistic multiprocessor configurations, so heuristics must be
used. Equally important, the information that one processor needs to
make good scheduling decisions often resides far away on another
processor.
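To make the idea of a scheduling heuristic concrete, here is a sketch of one well-known heuristic, longest-processing-time-first list scheduling. It is not taken from the text; it assumes independent tasks, known execution times, and identical PEs, and the name lpt_makespan and the limit MAX_PES are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PES 8   /* arbitrary limit for this sketch */

/* qsort comparator: descending order of execution time. */
static int by_time_desc(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x < y) - (x > y);
}

/* Assign n_tasks independent tasks to n_pes identical PEs: take the
   tasks longest first and give each one to the PE with the smallest
   load so far. Sorts times[] in place and returns the makespan. */
int lpt_makespan(int *times, int n_tasks, int n_pes)
{
    int load[MAX_PES] = {0};
    int makespan = 0;

    assert(n_pes > 0 && n_pes <= MAX_PES);
    qsort(times, (size_t)n_tasks, sizeof times[0], by_time_desc);

    for (int t = 0; t < n_tasks; t++) {
        int best = 0;
        for (int p = 1; p < n_pes; p++)
            if (load[p] < load[best])
                best = p;
        load[best] += times[t];
        if (load[best] > makespan)
            makespan = load[best];
    }
    return makespan;
}
```

For execution times {3, 3, 2, 2, 2} on two PEs this heuristic produces a makespan of 7, while the best possible assignment ({3, 3} and {2, 2, 2}) achieves 6. The gap shows why a heuristic result should not be mistaken for an optimum.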
Part of the reason that multiprocessor scheduling is hard is that
communication is no longer free. Even direct signaling on a wire can
take several clock cycles and the memory system may take tens of clock
cycles to respond to a request for a location in a remote memory.
Because information about the state of other processors takes too
long to get, scheduling decisions must be made without full information
about the state of those processors. Long delays also cause problems
for the software processes that execute on top of the operating system.
Of course, low energy and power consumption are important in
multiprocessors, just as in uniprocessors. The solutions to all the
challenges of embedded multiprocessor software must be found so that
energy-efficient techniques can be used.
Many of these problems boil down to resource allocation. Resources
must be allocated dynamically to ensure that they are used efficiently.
Just knowing which resources are available in a multiprocessor is hard
enough, and determining that on the fly is hard too. Figuring out how
to use those resources to satisfy requests is even harder. As discussed
later in this series,
middleware takes up the task of managing system resources across the
multiprocessor.
Figure 6-1. Kernels in the multiprocessor.
Real-Time Multiprocessor Operating Systems
An embedded multiprocessor may or may not have a true multiprocessor
operating system. In many cases, the various processors run their own
operating systems, which communicate to coordinate their activities. In
other cases, a more tightly integrated operating system runs across
several processing elements (PEs).
A simple form of multiprocessor operating system is organized with a
master and one or more slaves. The master PE determines the
schedules for itself and all the slave processors. Each slave PE simply
runs the processes assigned to it by the master.
This organization scheme is conceptually simple and easy to
implement. All the information that is needed for scheduling is kept by
the master processor. However, this scheme is better suited to
homogeneous multiprocessors that have pools of identical processors.
Figure 6-1 above shows the
organization of a multiprocessor operating system in relation to the
underlying hardware. Each processor has its own kernel, known as the PE
kernel. The kernels are responsible for managing purely local
resources, such as devices that are not visible to other processors,
and implementing the decisions on global resources.
The PE kernel selects the processes to run next and switches
contexts as necessary. But the PE kernel may not decide entirely on its
own which process runs next. It may receive instructions from a kernel
running on another processing element.
The kernel that operates as the master gathers information from the
slave PEs. Based on the current state of the slaves and the processes
that want to run on the slaves, the master PE kernel then issues
commands to the slaves about their schedules. The master PE can also
run its own jobs.
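The master's side of this exchange can be sketched as a single scheduling round. This is an illustrative model only: the structures pe_status and pe_command and the function master_schedule are hypothetical, and the policy (hand the next ready process to each idle slave) is an assumption chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Status a slave PE kernel reports to the master (hypothetical layout). */
struct pe_status {
    int pe_id;
    int idle;       /* nonzero if the PE has nothing to run */
};

/* Command the master sends back: run process pid on PE pe_id. */
struct pe_command {
    int pe_id;
    int pid;
};

/* One scheduling round on the master. Hands the next ready process to
   each idle slave, in report order, until the ready queue is used up.
   out[] must have room for min(n_reports, n_ready) commands.
   Returns the number of commands written to out[]. */
size_t master_schedule(const struct pe_status *reports, size_t n_reports,
                       const int *ready_pids, size_t n_ready,
                       struct pe_command *out)
{
    size_t n_cmds = 0;

    for (size_t i = 0; i < n_reports && n_cmds < n_ready; i++) {
        if (reports[i].idle) {
            out[n_cmds].pe_id = reports[i].pe_id;
            out[n_cmds].pid = ready_pids[n_cmds];
            n_cmds++;
        }
    }
    return n_cmds;
}
```

Each slave PE kernel would then carry out the command addressed to it and perform the context switch locally.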
One challenge in designing distributed schedulers is that
communication is not free and any processor that makes scheduling
decisions about other PEs usually will have incomplete information
about the state of those PEs. When a kernel schedules its own processor,
it can easily check on the state of that processor.
When a kernel must perform a remote read to check the state of
another processor, the amount of information the kernel requests needs
to be carefully budgeted.
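One way to budget remote reads is to keep a cached view of each remote PE and refresh only a limited number of entries per scheduling round. The sketch below assumes a stalest-first refresh policy; the names pe_view, refresh_views, and fake_read_load are hypothetical, and fake_read_load merely stands in for a real access over the interconnect.

```c
#include <assert.h>
#include <stddef.h>

/* Cached view of one remote PE as seen by the scheduling kernel. */
struct pe_view {
    int load;   /* last load value read from the PE */
    int age;    /* scheduling rounds since that read */
};

/* Hypothetical remote-read hook. */
typedef int (*remote_read_fn)(size_t pe);

/* Stand-in for the hardware access, used only to exercise the sketch. */
int fake_read_load(size_t pe)
{
    return (int)pe * 10;
}

/* Age every cached entry by one round, then refresh at most `budget`
   entries, stalest first.
   Returns the number of remote reads actually issued. */
size_t refresh_views(struct pe_view *views, size_t n_pes,
                     size_t budget, remote_read_fn read_load)
{
    size_t reads = 0;

    for (size_t i = 0; i < n_pes; i++)
        views[i].age++;

    while (reads < budget && reads < n_pes) {
        size_t stalest = 0;
        for (size_t i = 1; i < n_pes; i++)
            if (views[i].age > views[stalest].age)
                stalest = i;
        views[stalest].load = read_load(stalest);
        views[stalest].age = 0;
        reads++;
    }
    return reads;
}
```

The scheduler then works from the cached loads, accepting that some of them are several rounds old. The budget sets the trade-off between communication cost and the freshness of the information.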
Vercauteren et al. [Ver96] developed a kernel architecture for
custom heterogeneous processors. As shown in Figure 6-2 below, the kernel
architecture includes two layers: a scheduling layer and a
communication layer.
Figure 6-2. Custom multiprocessor scheduler and communications.
The basic communication operations are implemented by interrupt
service routines (ISRs),
while the communication layer provides more abstract communication
operations.
The communication layer provides two types of communication
services. The kernel channel is used only for kernel-to-kernel
communication; it has high priority and is optimized for performance.
The data channel is used by applications and is more general purpose.
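The priority relationship between the two channels can be modeled with a pair of FIFOs in which the receiver always drains the kernel channel first. This is a simplified sketch, not the implementation described in [Ver96]; the types channel and endpoint, the functions chan_send and endpoint_recv, and the depth CHAN_DEPTH are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CHAN_DEPTH 8   /* arbitrary queue depth for this sketch */

/* Fixed-size FIFO of integer messages. */
struct channel {
    int    msgs[CHAN_DEPTH];
    size_t head;    /* index of the oldest message */
    size_t count;   /* number of queued messages */
};

/* A PE's endpoint: one channel per class of traffic. */
struct endpoint {
    struct channel kernel;  /* kernel-to-kernel, served first */
    struct channel data;    /* application traffic */
};

/* Queue msg on ch. Returns 0 on success, -1 if the channel is full. */
int chan_send(struct channel *ch, int msg)
{
    if (ch->count == CHAN_DEPTH)
        return -1;
    ch->msgs[(ch->head + ch->count) % CHAN_DEPTH] = msg;
    ch->count++;
    return 0;
}

/* Fetch the next message, always preferring the kernel channel.
   Returns 0 and stores the message in *msg, or -1 if both are empty. */
int endpoint_recv(struct endpoint *ep, int *msg)
{
    struct channel *ch = ep->kernel.count ? &ep->kernel : &ep->data;

    if (ch->count == 0)
        return -1;
    *msg = ch->msgs[ch->head];
    ch->head = (ch->head + 1) % CHAN_DEPTH;
    ch->count--;
    return 0;
}
```

With this strict priority, a kernel message sent after an application message is still delivered first. The price is that heavy kernel traffic can delay application data.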
Example: TI's OMAP multiprocessor configuration
Consider the operating systems and communications in TI's OMAP. The
OMAPI standard
defines some core capabilities for multimedia systems. One of the
things that OMAPI does not define is the operating systems used in the
multiprocessor. The TI OMAP family implements the OMAPI architecture.
The figure below shows the lower layers of the TI OMAP, including the
hardware and operating systems.
Figure: Operating Systems and Communication in the TI OMAP
The main unifying structure in OMAP is the DSPBridge, which allows
the DSP and RISC processor to communicate. The bridge includes a set of
hardware primitives that are abstracted by a layer of software. The
bridge is organized as a master/slave system in which the ARM is the
master and the C55x is the slave.
This fits the nature of most multimedia applications, where the DSP
is used to efficiently implement certain key functions while the RISC
processor runs the higher levels of the application.
The DSPBridge API implements several functions: it initiates and
controls DSP tasks, exchanges messages with the DSP, streams data to
and from the DSP, and checks the status of the DSP.
The OMAP hardware provides several mailbox primitives: separate
addressable memories that can be accessed by both processors. In the
OMAP 5912,
two of the mailboxes can be written only by the C55x but read by both
it and the ARM, while two can be written only by the ARM and read by
both processors.
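The write-ownership rule for the mailboxes can be illustrated with a small software model. This sketch does not reflect the actual OMAP 5912 register layout or the DSPBridge API; the single-word payload, the full flag, the rule that a read empties the mailbox, and the names mailbox, mbox_post, and mbox_read are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum cpu { CPU_ARM, CPU_DSP };

/* Software model of one mailbox: a single word plus a "full" flag.
   The writer field models the write-permission rule; it is not a
   real hardware register. */
struct mailbox {
    enum cpu writer;   /* the only CPU allowed to write */
    uint32_t value;    /* payload width is an arbitrary choice */
    int      full;
};

/* Post a word. Returns 0 on success, -1 if `who` may not write this
   mailbox or the previous word has not been consumed yet. */
int mbox_post(struct mailbox *mb, enum cpu who, uint32_t value)
{
    if (who != mb->writer || mb->full)
        return -1;
    mb->value = value;
    mb->full = 1;
    return 0;
}

/* Either CPU may read. Returns 0 and stores the word in *value,
   or -1 if the mailbox is empty. Reading marks it empty. */
int mbox_read(struct mailbox *mb, uint32_t *value)
{
    if (!mb->full)
        return -1;
    *value = mb->value;
    mb->full = 0;
    return 0;
}
```

In this model, four such mailboxes would be instantiated, two with writer set to CPU_DSP and two with writer set to CPU_ARM, which gives each processor its own pair of outgoing mailboxes.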
Next in Part 2: Multiprocessor Scheduling
Used with the permission of the publisher, Newnes/Elsevier, this series
of five articles is based on copyrighted material from "High-Performance
Embedded Computing," by Wayne Wolf. The book can be purchased online.
Wayne Wolf is professor of
electrical engineering at Princeton University. Prior to joining
Princeton he was with AT&T Bell Laboratories. He has served as
editor in chief of the ACM Transactions on Embedded Computing and of
Design Automation for Embedded Systems.
References:
[Ver96] S. Vercauteren, B. Lin, and H. De Man, "A strategy for
real-time kernel support in application-specific HW/SW embedded
architectures," in Proceedings, 33rd Design Automation Conference,
ACM Press, 1996, pp. 678-682.