By David N. Kleidermacher, Green Hills Software
Multicore is a hot topic. Half of all embedded designs have multiple
processors, and 10% of embedded designs have multiple cores on a
single chip. This percentage is slowly but surely increasing. In the
same way that it is difficult to find single-core devices on the
desktop, it is only a matter of time before the same will be true in
embedded systems.
As designers have begun to adopt multicore designs, much press
has been given to the challenges posed to software developers. In fact,
it has become quite fashionable for industry pundits to wax histrionic
about the ills that must be endured by software developers who have
been launched into a new hardware world without the proper software
tools and ecosystem.
While many of the popular complaints are fiction, multicore software
development does, in fact, pose some serious challenges. In this
article, we will try to separate fact from fiction as we discuss a few
of the key issues on the table today.
1. Refactoring embedded software to achieve concurrency is a major challenge.
FICTION. It turns out that most embedded systems are already quite
heavily multithreaded. It is common for embedded developers to employ
real-time operating systems, and every RTOS in the world has some form
of threading primitive. Embedded designers use threads as a method to
simplify the management of the independent functions in the system.
On a unicore system, threads are logically concurrent, with the
operating system applying core processing power to each thread in turn.
On a multicore processor, these threads are naturally and truly
concurrent, usually with no change to the software required (assuming
an RTOS capable of symmetric multiprocessing, or SMP).
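To make the point concrete, here is a minimal sketch that uses POSIX threads (pthreads) as a stand-in for an RTOS threading API. The "components", their names, and the summing "work" are hypothetical placeholders. Nothing in the code refers to cores. On a single core the threads are interleaved, and on an SMP system the same unchanged code may run them in parallel.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_COMPONENTS 8   /* arbitrary limit for this sketch */

/* A hypothetical independent component: it owns its own state and
 * shares nothing with the others, so the scheduler may run them in parallel. */
struct component {
    const char   *name;        /* illustrative label only */
    unsigned long iterations;  /* amount of stand-in work */
    unsigned long result;      /* written only by this component's thread */
};

/* Stand-in for real work such as sampling a sensor or encoding audio. */
static void *component_main(void *arg)
{
    struct component *c = arg;
    unsigned long acc = 0;
    for (unsigned long i = 1; i <= c->iterations; i++)
        acc += i;              /* private computation, no shared data */
    c->result = acc;
    return NULL;
}

/* Start each component on its own thread and wait for all of them.
 * Returns 0 on success, -1 if too many components or a thread failed to start. */
int run_components(struct component *comps, size_t n)
{
    pthread_t tids[MAX_COMPONENTS];
    size_t started = 0;
    int rc = 0;

    if (n > MAX_COMPONENTS)
        return -1;
    for (; started < n; started++) {
        if (pthread_create(&tids[started], NULL, component_main, &comps[started]) != 0) {
            rc = -1;
            break;
        }
    }
    for (size_t i = 0; i < started; i++)
        pthread_join(tids[i], NULL);   /* wait for every thread that did start */
    return rc;
}
```

A caller fills an array of `struct component` and passes it to `run_components()`. The decision of which core runs which thread is left entirely to the operating system.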
Furthermore, as embedded systems have grown in complexity, adding a
variety of connectivity and multimedia functions, components map
naturally to threads. If the device embeds a web server, this web
server uses one or more threads to serve connection requests. If the
device has a file system, files are served by a number of file server
threads.
Audio frameworks run as threads. CORBA and other connectivity
solutions use threads. As systems designers pile on more and more
applications and middleware, the number of threads increases, enabling
the system to take immediate advantage of additional cores.
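A pool of identical server threads draining a shared queue of requests is a common shape for such middleware. The sketch below again uses pthreads. The fixed backlog of 32 requests and all names are hypothetical, and "serving" a request only marks it as done. Each worker claims the next request under a mutex and then handles it outside the lock, so on an SMP system several requests can be in progress at once.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_REQUESTS 32   /* hypothetical backlog of connection requests */
#define MAX_WORKERS  8

static size_t next_request = 0;              /* index of next unclaimed request */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int served[NUM_REQUESTS];             /* served[i] == 1 once request i is handled */

/* Each server thread claims requests one at a time until none remain. */
static void *server_thread(void *arg)
{
    (void)arg;
    for (;;) {
        size_t idx;

        pthread_mutex_lock(&queue_lock);
        if (next_request == NUM_REQUESTS) {  /* queue drained: thread exits */
            pthread_mutex_unlock(&queue_lock);
            return NULL;
        }
        idx = next_request++;                /* claim exactly one request */
        pthread_mutex_unlock(&queue_lock);

        /* Placeholder for real request handling, done outside the lock.
         * Only the claiming thread ever writes served[idx]. */
        served[idx] = 1;
    }
}

/* Start nworkers server threads, wait for them to drain the queue, and
 * return the number of requests served, or -1 on error. */
int serve_all(int nworkers)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0, count = 0;

    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;
    next_request = 0;                        /* reset before any worker runs */
    for (size_t i = 0; i < NUM_REQUESTS; i++)
        served[i] = 0;

    for (; started < nworkers; started++)
        if (pthread_create(&tids[started], NULL, server_thread, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    if (started == 0)
        return -1;

    for (size_t i = 0; i < NUM_REQUESTS; i++)
        count += served[i];
    return count;
}
```

Raising `nworkers` changes nothing in the serving logic. It only gives the scheduler more runnable threads to spread across cores.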
Of course, not all systems make optimal use of all the hardware
cores. Designers may indeed want to increase concurrency by refactoring
the code.
2. When refactoring software, maximize threads while minimizing processes.
FICTION. There are many ways to unlock concurrency, but coarse-grained
parallelism (decomposing software into large pieces that are
mapped to threads and/or processes) is arguably the most ubiquitous,
portable, and effective.
Yet when deciding whether to map a new component to a thread
(sharing memory space with other threads) or a process, most designers
opt for the poorer choice of threads. In fact, until recently, many
embedded systems were characterized by large numbers of threads, all
sharing the same memory space.
This can be attributed to the fact that the most popular RTOSes of the
1980s and early 1990s did not support memory protection. Developers
became accustomed to threads, and their legacy code lives on.
Modern RTOSes, of course, support memory-protected processes. And
while the cost of a process (in memory use and context-switching time)
may be a bit higher than that of a thread, that cost has become
negligible with today's fast processor cores and memory architectures.
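The practical difference between the two choices can be shown on a POSIX host, where `fork()` creates a process with its own copy of the address space. This is only an analogy, since RTOS process models differ, and the variable and function names below are hypothetical. A thread's write to a global variable is visible to the rest of its process. A child process's write stays confined to its own address space, which is exactly what keeps a stray write in one component from corrupting another.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 0;   /* one copy per address space */

static void *thread_writer(void *arg)
{
    (void)arg;
    shared_value = 42;         /* same address space: the caller will see this */
    return NULL;
}

/* A thread writes 42; returns what the caller then sees (expected: 42). */
int value_after_thread_write(void)
{
    pthread_t tid;

    shared_value = 0;
    if (pthread_create(&tid, NULL, thread_writer, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);
    return shared_value;
}

/* A child process writes 42; returns what the parent then sees (expected: 0). */
int value_after_process_write(void)
{
    pid_t pid;

    shared_value = 0;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        shared_value = 42;     /* modifies only the child's copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);     /* child has finished its write */
    return shared_value;
}
```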
In fact, designers should strive for a 1:1 ratio between threads
and processes. In other words, each memory-protected component should
have only a single thread of execution. Of course, it will not always be
possible to reach this ideal, but designers should strive to minimize
threads in each component, particularly in new code.
Whenever possible, each component should be owned by a single
developer, with clear, well-defined, message-based interfaces between
components. This component management philosophy minimizes unforeseen
interactions and some of the nastier multithreading problems that arise
when software uses many threads, synchronized with mutexes and other
error-prone constructs. Managing multithreaded components is simply
more difficult, even with the best visualization and thread-aware
debugging tools.
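The following minimal sketch illustrates that philosophy on a POSIX host. Two single-threaded components run in separate processes and interact only through fixed-size messages over pipes. The message layout, the opcode, and the function names are hypothetical, and `pipe()`/`fork()` stand in for whatever process and message-passing primitives a given RTOS provides. Because neither side can touch the other's memory, no mutex is needed.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fixed-size message for the component interface. */
struct msg {
    int32_t opcode;   /* 1 = "square this value" (illustrative) */
    int32_t value;
};

/* Send or receive one whole message; returns 0 on success. A pipe write
 * this small (<= PIPE_BUF) is atomic, so the whole message arrives together. */
static int send_msg(int fd, const struct msg *m)
{
    return write(fd, m, sizeof *m) == (ssize_t)sizeof *m ? 0 : -1;
}

static int recv_msg(int fd, struct msg *m)
{
    return read(fd, m, sizeof *m) == (ssize_t)sizeof *m ? 0 : -1;
}

/* Server component: one process, one thread, one loop over messages.
 * It exits when its input reaches end-of-file. */
static void server_component(int in_fd, int out_fd)
{
    struct msg req, rep;

    while (recv_msg(in_fd, &req) == 0) {
        rep.opcode = req.opcode;
        rep.value = (req.opcode == 1) ? req.value * req.value : -1;
        if (send_msg(out_fd, &rep) != 0)
            break;
    }
}

/* Client component: start the server process, send one request, wait for
 * the reply, then close the channel. Returns the reply value, or -1 on error. */
int request_square(int32_t value)
{
    int to_server[2], from_server[2];
    struct msg req = { 1, value }, rep = { 0, -1 };
    pid_t pid;

    if (pipe(to_server) != 0)
        return -1;
    if (pipe(from_server) != 0) {
        close(to_server[0]);
        close(to_server[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_server[0]); close(to_server[1]);
        close(from_server[0]); close(from_server[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: the server component */
        close(to_server[1]);
        close(from_server[0]);
        server_component(to_server[0], from_server[1]);
        _exit(0);
    }

    close(to_server[0]);                /* parent: the client component */
    close(from_server[1]);
    if (send_msg(to_server[1], &req) != 0 || recv_msg(from_server[0], &rep) != 0)
        rep.value = -1;
    close(to_server[1]);                /* EOF tells the server to exit */
    close(from_server[0]);
    waitpid(pid, NULL, 0);
    return rep.value;
}
```

Each side can be written, tested, and owned independently. The only contract between them is `struct msg`.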
Regardless of whether threads or processes are used, an SMP-capable
operating system will automatically schedule the components onto the
available cores. It is this automatic load balancing that is one of the
most important efficiencies realized by moving to a multicore platform.