The multicore SoC - will 2010 be the turning point? - Embedded.com


Predicting trends is difficult even for the most connected industry experts, but one trend that's easy to spot is the widespread acceptance of the multicore SoC. This is happening for a number of reasons.

First, it has been years since workstations adopted multicore processor architectures to address rising performance demands and power concerns. While adoption in workstations is now saturated and fully supported by general-purpose operating systems (GPOSs), the embedded world is only now looking at ways to adopt multicore architectures.

Second, several SoC vendors, including Cavium, Freescale, MIPS, and ARM, have been providing multicore solutions; but until now, these solutions have been limited to networking and aimed at performance rather than low power.

The rest of the embedded industry has had limited hardware options, because low-power design is a driving factor. While the ARM11 MPCore was ahead of its time, the Cortex-A9 MPCore design is ready for prime time and is gaining acceptance in the embedded marketplace.

As a result, SoC vendors have adopted the Cortex-A9 MPCore hardware as a basis for their next-generation designs. Over a year ago, Texas Instruments pre-announced its next-generation OMAP design, the OMAP 4, with a dual-core Cortex-A9 MPCore, scheduled for production in the second half of 2010. STMicroelectronics has pre-announced its next-generation consumer devices, which will be based on the Cortex-A9 MPCore.

There has also been a shift in consumer electronics to adopt multicore hardware as demand for more processing power and complex user interfaces continue to increase.

So now it's time for the software to step up – no more sitting on the sidelines. It's time to adapt to, and embrace, the multicore hardware options available.

SMP and AMP development
One key concept when addressing multicore is how the hardware is designed. It falls into two categories: asymmetric multiprocessing (AMP) and symmetric multiprocessing (SMP).

From the hardware perspective, AMP typically means the cores are architecturally different from one another. Each core can run different instruction sets with a corresponding operating system, or even no operating system at all.

In AMP, an operating system typically executes on a single core with some method (likely proprietary) to communicate between cores. It runs all the devices at its disposal with minimal sharing of resources.

In contrast, SMP-capable hardware consists of identical CPU cores. These cores all service the same events, execute the same instruction set, see the same memory, share the same devices and interrupts, and are kept consistent by a cache coherency unit. This allows for load balancing between cores.

An SMP capable operating system can utilize all the cores at its disposal by scheduling threads and servicing devices and interrupts on any core in the SMP domain.

So is it possible to take advantage of AMP on SMP hardware? The answer is yes, but just because the hardware is capable of true SMP doesn't mean the best solution is to run all cores in SMP mode. There are optimizations where the system can and should be divided between several operating systems. This is AMP on SMP hardware.

A hybrid approach (Figure 1, below) may be ideal, where the SMP hardware is divided between operating system domains and each domain functions across multiple cores. For example, take a four-core SMP-capable SoC.

Divide the system into two operating system domains (say, an RTOS and some type of GPOS), where each domain is in charge of two cores. Cores 0 and 1 belong to OS domain 0, and cores 2 and 3 belong to OS domain 1. As long as all operating system instances support both SMP and AMP operation, this ideal configuration can be realized.

Figure 1: Hybrid AMP/SMP on SMP architecture.

Application designers or system integrators usually run into trouble when they migrate code developed for a single core across several cores in an SMP system. This raises two important questions: Is your code multicore ready, and can your code take advantage of multiple cores?

Is your code multicore ready?
When preparing to run your code on an SMP scheduler, it's important to consider all priority dependencies that can break your code. There are two main causes of potential problems when running code in a multicore system:

1. Using the master interrupt as a global semaphore. A semaphore is an object used to prevent simultaneous access to a shared resource. However, it is common on single core OSs to use the master interrupt as a “fast” system-wide semaphore. It looks like this:

Disable Interrupts
Access and update the global data structure
Enable Interrupts

You can see from the above pseudo-code that not even an interrupt can execute while the global data structure is being accessed, which works great on single core systems and can be much faster than using a semaphore object for protection.

But therein lies the problem. When there are two cores in operation and this code executes, interrupts are disabled only on the core on which it is currently running, leaving the data structure open to access from the other SMP-scheduled cores. This race condition leaves the system open to unpredictable results.
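The fix is to replace the interrupt mask with a lock that excludes every core, not just the local one. Here is a minimal sketch using a POSIX mutex as a stand-in for an RTOS spinlock (on a bare-metal SMP RTOS the lock primitive would typically also mask local interrupts); the function and variable names are illustrative, not from any particular product:

```c
#include <pthread.h>

/* Shared data once "protected" by disabling interrupts on a single core.
 * On SMP it needs a real lock; a pthread mutex stands in here for the
 * RTOS spinlock, and excludes ALL cores rather than just the local one. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static long g_counter = 0;

long counter_add(long delta)
{
    pthread_mutex_lock(&g_lock);      /* replaces "Disable Interrupts" */
    g_counter += delta;               /* access/update the global data */
    long snapshot = g_counter;
    pthread_mutex_unlock(&g_lock);    /* replaces "Enable Interrupts" */
    return snapshot;
}
```

The structure of the critical section is unchanged; only the protection primitive is swapped for one that is valid on multiple cores.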

2. Relying on priority to make sure the system is served appropriately. While priority-based scheduling works on a single core, it breaks down on a multicore platform if it's used to guarantee that only the highest-priority thread is executing.

Once a lower-priority thread is scheduled on a separate core, this method of keeping two threads from accessing the same block of data at the same time turns into another race condition.

One solution is to stop relying on thread priority for mutual exclusion and protect global data structures with explicit locks instead. What is accepted practice on a single core is a potentially deadly race condition on multicore.
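The point can be made concrete with a sketch. On a single core, a high-priority thread might assume the low-priority one can never run at the same time; on SMP both can run at once, so the shared data needs a lock regardless of priority. The two pthreads below stand in for a high- and a low-priority task (names are illustrative):

```c
#include <pthread.h>

static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static long total = 0;

/* Both "priority levels" update the same counter. Without the lock,
 * concurrent increments on two cores would be lost; with it, every
 * update survives no matter which core each thread lands on. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&total_lock);
        total++;
        pthread_mutex_unlock(&total_lock);
    }
    return NULL;
}

long run_two_workers(void)
{
    pthread_t hi, lo;   /* stand-ins for a high- and a low-priority thread */
    pthread_create(&hi, NULL, worker, NULL);
    pthread_create(&lo, NULL, worker, NULL);
    pthread_join(hi, NULL);
    pthread_join(lo, NULL);
    return total;
}
```

Dropping the mutex calls makes the final count nondeterministic on SMP hardware, which is exactly the race the priority assumption used to hide.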

Finally, even if your SMP-capable operating system can schedule across several cores, ensure that the application is partitioned into enough threads to take advantage of all the cores. If not, a bottleneck will occur in the system. Can the architecture of the application be modified in such a way as to promote load balancing across multiple cores?
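As a sketch of that partitioning, one large loop can be split into one worker thread per core so the SMP scheduler has something to balance. The core count and function names here are hypothetical:

```c
#include <pthread.h>

#define NCORES 4        /* assumed core count, e.g. a quad Cortex-A9 */
#define LEN    4000

static int  data[LEN];
static long partial[NCORES];

/* Each worker sums one contiguous slice, so the four threads can run
 * on four cores at once instead of one thread hogging a single core. */
static void *sum_slice(void *arg)
{
    long id = (long)arg;
    int  lo = (int)id * (LEN / NCORES), hi = lo + (LEN / NCORES);
    long s  = 0;
    for (int i = lo; i < hi; i++) s += data[i];
    partial[id] = s;
    return NULL;
}

long parallel_sum(void)
{
    pthread_t t[NCORES];
    for (int i = 0; i < LEN; i++) data[i] = 1;   /* known test values */
    for (long id = 0; id < NCORES; id++)
        pthread_create(&t[id], NULL, sum_slice, (void *)id);
    long total = 0;
    for (int id = 0; id < NCORES; id++) {
        pthread_join(t[id], NULL);
        total += partial[id];
    }
    return total;
}
```

Whether such a split pays off depends on the work per slice outweighing the thread-management overhead, which is the load-balancing question the designer has to answer.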

Employing such basic techniques can make your SMP system not only safer, but better optimized for the number of cores you have available.

What about AMP?
AMP designs can be used on SMP hardware. In fact, this can be an ideal relationship between several operating system instances: executing code simultaneously while dividing it between operating system domains is an effective way to enhance security and increase throughput.

This provides each OS with a deterministic environment with a dedicated cache from which to execute. Ideally, you would dedicate one or more cores to AMP and use an Inter-Processor Communication (IPC) mechanism to communicate with the other OSs.

If you have some non-multicore-safe code in your system, you could dedicate a core to the set of threads that must remain in single-core mode and use IPC to communicate between cores. While this may not allow complete load balancing between cores, it does allow the code to be utilized, even if it's not ready for multicore.

The ability for the system to bind certain threads of execution to a particular core from within an SMP scheduling environment is available in a technology called Bounded Computational Domains (BCDs), which is a new part of the Nucleus RTOS product from Mentor Graphics.

BCD enables the entire system to interact as a single operating system while making sure only those threads that are bound to the core are scheduled on that core. BCD technology is ideal for legacy applications that may not be ready for SMP but want tight integration with the other tasks in the system.
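The BCD API itself is Nucleus-specific, but the underlying idea of binding a thread to one core within an SMP system can be sketched with the Linux affinity call; this is an analogy, not the Nucleus interface, and the wrapper name is made up:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single core. As with a bounded domain,
 * the thread still lives inside the SMP system and shares its data,
 * but the scheduler will only ever run it on the core it is bound to.
 * (pthread_setaffinity_np() is a Linux/glibc extension.) */
int bind_self_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Legacy single-core code pinned this way keeps running correctly while the rest of the system load-balances freely across the remaining cores.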

Most operating systems employ an IPC mechanism to communicate between OS domains. The problem with many of these IPC mechanisms is that they are proprietary, and mixing two different operating systems (like an RTOS and Linux, which have different methods of IPC) can prove quite challenging.

To address the issue of proprietary IPC mechanisms, the Multicore Association created an API-based standard called the Multicore Communications API (MCAPI). When both RTOS and GPOS vendors adopt MCAPI, any code written to this API can be ported to another system with all of the IPC code untouched. The good news is MCAPI gives you the ability to migrate code as necessary in order to meet the system's timing requirements.
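To give a feel for the model, here is a self-contained toy version of MCAPI's connectionless message style: endpoints are (node, port) pairs and messages are copied between them. This is deliberately not the real API (the spec's functions, such as mcapi_msg_send() and mcapi_msg_recv(), go through a vendor-supplied transport and carry status arguments); the struct and function names below are illustrative only:

```c
#include <string.h>

#define MAX_MSG 64

/* A toy endpoint: identified by (node, port) as in MCAPI, with a
 * one-slot inbox standing in for the real transport's queueing. */
struct endpoint {
    int  node, port;
    char inbox[MAX_MSG];
    int  inbox_len;          /* 0 means the inbox is empty */
};

/* Copy a message into the destination endpoint's inbox. */
void toy_msg_send(struct endpoint *to, const void *buf, int len)
{
    if (len > MAX_MSG) len = MAX_MSG;
    memcpy(to->inbox, buf, len);
    to->inbox_len = len;
}

/* Drain the endpoint's inbox into the caller's buffer; returns bytes. */
int toy_msg_recv(struct endpoint *from, void *buf, int max)
{
    int n = from->inbox_len < max ? from->inbox_len : max;
    memcpy(buf, from->inbox, n);
    from->inbox_len = 0;
    return n;
}
```

Because the application only ever sees endpoints and send/receive calls, the same code can ride on shared memory, a mailbox peripheral, or a network link underneath, which is exactly the portability MCAPI is after.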

Conclusion
Multicore SoCs are quickly becoming the norm in 2010. Operating systems are adapting quickly to SMP- and AMP-capable hardware designs. Finally, we are seeing the OS as a true enabler of multicore designs.

In addition, operating system vendors are adopting new standards for IPC mechanisms such as MCAPI, which make multi-OS on multicore more of a reality than a trend waiting to happen.

Stephen Olsen is a Software Architect in the Embedded Systems Division at Mentor Graphics Corp. He will be presenting a paper at the Multicore Expo, which is co-located with the Embedded Systems Conference Silicon Valley in San Jose, CA, April 26-29, 2010.
