Using an asymmetric multiprocessor model to build hybrid multicore designs


Multiprocessing programming models have either used resources inefficiently or lacked determinism, but a new multiprocessing model is now available that can solve both problems for the right applications.

To meet increasing demand for processing power, chip makers have begun to turn away from the traditional 'wider-larger-faster' approach to scaling and toward multi-core processing. These devices implement a distributed processing system on a single chip, using multiple processors to gain performance increases. The trouble is that there has not been an efficient multiprocessing programming model that meets the needs of real-time embedded systems. A new approach, asymmetric multiprocessing (AMP), promises to provide a combination of efficiency and determinism for many applications.

Traditional multiprocessing models have been one of two types. The first, symmetric multiprocessing (SMP), uses algorithms that perform dynamic load balancing by allocating software tasks among a number of identical processors to make maximum use of processor resources (see Figure 1 below). The load-balancing algorithms have been developed and refined over many years, and the approach works well for numerically intensive data processing (number-crunching) applications. A single operating system controls all of the processors.

Figure 1 – Symmetric multiprocessing (SMP) does an efficient job of allocating tasks among multiple processors, but partitions all tasks, even time-critical ones, in a non-deterministic manner.
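As a rough illustration of the dynamic load balancing described above, the sketch below hands each new task to whichever processor currently has the least work. The function name `smp_dispatch` and the task costs are hypothetical, and real SMP schedulers are considerably more sophisticated; this is only meant to show the idea.

```python
import heapq

def smp_dispatch(task_costs, num_cpus):
    """Greedy least-loaded assignment. Returns (placement, per-CPU loads)."""
    # Heap of (current_load, cpu_id); the least-loaded CPU is always on top.
    heap = [(0, cpu) for cpu in range(num_cpus)]
    heapq.heapify(heap)
    placement = {}
    for task_id, cost in enumerate(task_costs):
        load, cpu = heapq.heappop(heap)
        placement[task_id] = cpu
        heapq.heappush(heap, (load + cost, cpu))
    loads = [0] * num_cpus
    for load, cpu in heap:
        loads[cpu] = load
    return placement, loads

# Hypothetical task costs in arbitrary units.
placement, loads = smp_dispatch([5, 3, 8, 2, 4, 6], num_cpus=2)
print(placement, loads)
```

Note that where a given task lands depends entirely on the load at the moment it arrives, which is the source of the non-determinism mentioned in the caption for Figure 1.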

The second approach, popular in cell phone designs, can best be described as “loosely-coupled” (or in some cases, completely uncoupled) multiprocessing. In this approach, system designers assign tasks to the different processors and implement them independently, possibly with different operating systems in each processor. There may be some communications among processors to exchange information and coordinate tasks, but there is no dynamic load balancing. The task assignment is fixed.

Both of these approaches have their advantages and drawbacks. The SMP approach is well established and implemented in popular operating systems such as Linux. These implementations simplify the porting of applications developed for single-processor operating systems to SMP versions. The load balancing algorithms are efficient and make maximum use of available processing power, making them well suited to applications that have a high demand for raw processing power.

Drawbacks to SMP
The SMP approach has several drawbacks, however, that prevent it from being well suited to many embedded applications. The most serious is that SMP behavior is non-deterministic. This means that critical software functions cannot be guaranteed to execute within a bounded response time. Functions such as system software, fault response, network management, and outside communications become highly dependent on the system’s current state and load distribution. Without guaranteed response time, the SMP approach does not meet the needs of real-time systems.

Other drawbacks to the SMP approach include poor fault tolerance and limited scalability. The problem with fault tolerance arises because all tasks are assigned to any available processor by the load-balancing algorithms. The ability to gracefully manage the failure of an individual core is best implemented by dedicating fault management software to each core. When fault management is spread around at random, the software may have been allocated to the very processor that fails, taking the entire system down. The poor fault tolerance of the SMP approach makes it unsuitable for mission-critical applications such as telecommunications.

Scalability is an issue because the overhead of the load balancing algorithms in SMP restricts the amount of performance improvement generated by adding processors. Each additional processor in the system increases the amount of time the algorithm spends assessing load conditions and deciding task assignments, and the increase is not linear. As a result, SMP systems typically reach a break-even point at about 8 processors. Adding more processors then begins degrading performance. Thus, SMP reaches a limit in the performance it can offer.
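To make the break-even argument concrete, here is a toy model, not a measurement. It assumes that every processor loses a fraction of its time to load-balancing overhead, and that this fraction is proportional to the number of processors. The coefficient of 1/16 is an assumption chosen purely so that the peak lands at 8 processors, matching the figure quoted above.

```python
def effective_throughput(num_cpus, overhead_per_cpu=1 / 16):
    """Useful work, in single-CPU units, under the assumed overhead model."""
    # Assumption: each CPU spends overhead_per_cpu * num_cpus of its time on
    # balancing, so total overhead grows with the square of num_cpus.
    useful_fraction = max(0.0, 1.0 - overhead_per_cpu * num_cpus)
    return num_cpus * useful_fraction

for n in (1, 2, 4, 8, 12, 16):
    print(n, effective_throughput(n))
```

Under these assumptions throughput rises up to 8 processors and falls afterwards; a different overhead model would move the peak, so the numbers should not be read as a prediction for any real system.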

The SMP approach also cannot be implemented in a heterogeneous system. The algorithms depend on each processor having identical resources available to it, including the operating system it is running, so that tasks can be readily interchanged. Multi-core devices that have different processor cores to handle different types of tasks simply cannot run an SMP operating system, nor can SMP be constructed using different operating systems on each core.

'Loosely-coupled' multiprocessing
The loosely-coupled multiprocessing approach (see Figure 2 below) has one major benefit: it is easy to implement using the existing version of an operating system. Developers simply map their software tasks into the cores, where they stay and run with no further modification. All that is needed is a way for the different cores to exchange information so that tasks can share data and other resources.

Figure 2 – Determinism can be obtained in SMP designs by pre-allocating tasks to individual processors, but the performance depends on the effectiveness of the mapping and cannot adapt to respond to load increases.

This loose coupling makes very inefficient use of the resources available in a multi-core processor. There is no dynamic load balancing; the software stays where the designer placed it. Thus, one processor may get overwhelmed by some task and not be able to use the resources of another processor that may be running idle. As a result, the performance achievable in a loosely-coupled approach is considerably less than SMP can achieve.
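The cost of a fixed mapping can be seen in a small sketch. The task names, costs, and mapping below are invented for illustration and do not come from any particular design.

```python
def static_loads(task_costs, mapping, num_cpus):
    """Sum the cost of each task on the CPU the designer pinned it to."""
    loads = [0] * num_cpus
    for task, cost in task_costs.items():
        loads[mapping[task]] += cost
    return loads

# Hypothetical design-time mapping for a two-core system.
mapping = {"radio": 0, "codec": 0, "ui": 1}

quiet = static_loads({"radio": 3, "codec": 2, "ui": 2}, mapping, 2)
burst = static_loads({"radio": 9, "codec": 6, "ui": 2}, mapping, 2)
print(quiet, burst)
```

When the load on the first two tasks triples, core 0 absorbs all of the increase while core 1 stays at its original load, because nothing is allowed to move.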

The fixed task mapping also keeps this approach from being fault tolerant. If one processor fails, all tasks assigned to that processor fail. There is no opportunity for shifting tasks to a functional processor.

Given these drawbacks, there are many applications that require increasing processing power that neither traditional approach will serve. One of the most common is telecommunications processing. Such systems need fault tolerance as well as processing performance that can scale with increasing load.

Asymmetric multiprocessing
A new approach being developed promises to provide a third alternative to the two traditional approaches. Called asymmetric multiprocessing (AMP), this approach performs selective load balancing, allowing the developer to permanently assign some tasks to fixed processor resources while allowing others to be load-balanced among many processors (see Figure 3 below).

Figure 3 – Asymmetric multiprocessing (AMP) combines two approaches, allowing some tasks to be mapped to specific processors while retaining the ability to implement load balancing on the remaining tasks.
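Selective load balancing can be sketched as a two-pass assignment: pinned tasks are placed first, then the remaining tasks float to the least-loaded processor. The function `amp_dispatch`, the task names, and the costs are all hypothetical; this is a minimal sketch of the idea rather than how any AMP operating system actually schedules.

```python
def amp_dispatch(tasks, num_cpus):
    """tasks: list of (name, cost, pinned_cpu or None).

    Returns (placement, per-CPU loads).
    """
    loads = [0] * num_cpus
    placement = {}
    # Pass 1: pinned tasks go exactly where the developer put them.
    for name, cost, pinned in tasks:
        if pinned is not None:
            placement[name] = pinned
            loads[pinned] += cost
    # Pass 2: remaining tasks float to whichever CPU is least loaded now.
    for name, cost, pinned in tasks:
        if pinned is None:
            cpu = min(range(num_cpus), key=lambda c: loads[c])
            placement[name] = cpu
            loads[cpu] += cost
    return placement, loads

# Hypothetical workload: two time-critical tasks pinned to CPU 0,
# three data-processing tasks left free to be balanced.
tasks = [
    ("fault_monitor", 2, 0),
    ("net_mgmt", 3, 0),
    ("packet_a", 4, None),
    ("packet_b", 4, None),
    ("packet_c", 4, None),
]
placement, loads = amp_dispatch(tasks, num_cpus=3)
print(placement, loads)
```

The pinned tasks always run on CPU 0 regardless of load, which is what makes their behavior predictable, while the floating tasks are spread across the remaining capacity.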

There are numerous advantages to this approach. For one, it allows a significant amount of load balancing to occur, although not quite as much as the SMP approach. Unlike the SMP approach, however, AMP is extendable to include any number of processor cores. The load balancing algorithms scale more linearly, allowing each added processor to contribute to running the application.

For embedded systems developers, one of the most significant advantages of AMP is that it guarantees full control of the software’s utilization of system resources such as memory, network connections, I/O ports, and interrupts. This means that the application can be made deterministic in those areas where system response is critical. In addition, this level of control allows the system to be constructed for fault tolerance.

The AMP approach shines when used in an application where all of the I/O maps to a single processor. This allows the system to save considerable time in responding to an IRQ because the OS already knows where the request originates. The SMP approach cannot take advantage of this situation; it deliberately knows nothing about individual cores except their load levels.

A further advantage of the control over software mapping is that the approach can be adapted for use in heterogeneous single-chip systems as well as multi-processor systems. If an application can be separated into distinct operating modes, such as the control plane and the data plane processing in a telecommunications system, heterogeneous AMP can be used to separate the tasks among the different processors and still apply load balancing.

The needs of AMP
To implement heterogeneous AMP, an operating system (OS) needs several critical resources. The first is a means for each core that shares the processing load to communicate its level of CPU utilization. Rather than having the operating system keep track of the task queue for each processor, as in SMP, heterogeneous AMP has the processor itself track its load level and communicate that to the part of the OS that does the load balancing.

The second critical resource is the ability of the OS to distribute tasks to any of the processors involved. This implies that the OS is able to communicate with the target processor and tell it what task to pick up next. In this respect, the heterogeneous AMP OS operates like a distributed operating system.
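The two resources just described, self-reported load and remote task dispatch, can be sketched together. The `Core` and `Balancer` classes below are hypothetical stand-ins that run in a single process; in a real system the calls between them would be messages passed between processors.

```python
from collections import deque

class Core:
    """Hypothetical core that tracks its own load and accepts tasks."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.inbox = deque()  # tasks the balancer has told this core to run

    def utilization(self):
        # The core itself, not the OS, computes its load figure.
        return sum(cost for _, cost in self.inbox)

    def accept(self, task_name, cost):
        self.inbox.append((task_name, cost))

class Balancer:
    """Hypothetical OS component: asks cores for load, then dispatches."""

    def __init__(self, cores):
        self.cores = cores

    def dispatch(self, task_name, cost):
        # Pick the core that currently reports the lowest utilization.
        target = min(self.cores, key=lambda core: core.utilization())
        target.accept(task_name, cost)
        return target.core_id

cores = [Core(0), Core(1)]
balancer = Balancer(cores)
chosen = [balancer.dispatch(name, cost)
          for name, cost in [("a", 5), ("b", 2), ("c", 1)]]
print(chosen, [core.utilization() for core in cores])
```

The balancer holds no task queues of its own; it only knows what each core reports, which mirrors the division of responsibility described above.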

This yields a hybrid multiprocessing approach that combines some of the advantages of the traditional methods while eliminating some of their disadvantages. The heterogeneous AMP OS becomes especially well suited for use in systems that handle large and dynamic traffic flows, such as telecommunications routers. In these cases, system control demands a fairly constant level of effort while the real processing power is needed in bursts. The heterogeneous AMP approach allows the high-performance data processing to be distributed while the background system control functions remain stable on one processor running a real-time OS.

The approach does have its drawbacks. The hybrid nature of the load assignments means that developers cannot easily port their applications over from earlier generations of the operating system. They must decide which components need to be fixed and which can be distributed, and separate them accordingly.

A second drawback is that load-balancing remains a technical challenge. While the load-balancing algorithms help during operation, system performance still depends somewhat on the developer making the right choices in the initial mapping of tasks to processors. If that mapping is unbalanced, the OS has only a limited ability to compensate.

Despite these drawbacks, the AMP approach has a high potential for meeting the needs of high-performance, real-time applications. There remain some implementation challenges, but many of the elements of a heterogeneous AMP operating system are already available, including a distributed OS, mechanisms for providing links between diverse operating systems, and a lightweight, transparent IPC that offers naming services so that processors can access routines by name even if those routines are running on other processors.
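The naming service mentioned above can be pictured as a registry that maps a routine's name to the core hosting it. The `NameService` class is a hypothetical, single-process stand-in; a real IPC layer would marshal the call into a message and deliver it to the remote core.

```python
class NameService:
    """Hypothetical registry mapping a routine name to (core_id, callable)."""

    def __init__(self):
        self._routines = {}

    def register(self, name, core_id, routine):
        self._routines[name] = (core_id, routine)

    def call(self, name, *args):
        # The caller never needs to know which core hosts the routine.
        core_id, routine = self._routines[name]
        return core_id, routine(*args)

names = NameService()
names.register("checksum", core_id=2, routine=lambda data: sum(data) % 256)
result = names.call("checksum", [1, 2, 3])
print(result)
```

Because callers refer to routines only by name, a routine can be moved to a different core by changing its registration, without touching the calling code.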

Only two remaining features are needed: a load-balancing algorithm so that the OS knows where to launch the program, and multiple instantiation of the same task/program across multiple cores. Both the program loading mechanism and the load-balancing algorithm depend heavily on the efficient IPC service.

An AMP OS provides the determinism missing from SMP and supports fault tolerance software that SMP and loosely-coupled approaches cannot. At the same time, AMP provides a scalable mechanism for handling processing-intensive application tasks with efficient resource utilization. For many applications, AMP, once available, will quickly become the approach of choice.

Michael Christofferson is director of product marketing at Enea Embedded Technology Inc.
