The Nulticore effect

December 08, 2008

Jack Ganssle-December 08, 2008

My February column in Embedded Systems Design Magazine was an attempt to show that the emperor, at least when talking about multicore technology, has no clothes. Multicore is being hyped as the solution to clock rate stagnation, when it really addresses two problems:

- A handful of "embarrassingly parallel" problems can derive great performance benefits from SMP.
- In many applications one can reduce power consumption by using more processors at slower clock rates.

Actually, there is a third problem that multicore solves: the vendors' need to sell us more transistors as they continue to exploit Moore's Law.

Now a study in IEEE Spectrum  shows that even for the classic embarrassingly parallel problems like weather simulations multicore offers little benefit. The curve in that article is priceless. As the number of cores grow from two to 64 performance plummets by a factor of five. Additional processors nullify each other.

Call it the Nulticore Effect.

One might think that more CPUs equals faster systems, but in traditional symmetrical multiprocessing groups of cores sharing the same memory bus, a bus that even with a single core is already as congested as Highway 101 at rush hour. Memory simply can't keep up with a single-cycle machine that can swallow a couple of instructions per nanosecond.

We all know this; it's the reason a modern processor is crammed full of complex circuits like pipelines and cache. Every access to the bus entails numerous wait states which bring the system to a screeching halt. Add more cores, all demanding access to that same bus, and system performance is bound to drop.

Other problems surface. We know that absent scheduling algorithms like RMA (rate monotonic analysis) - which itself is highly problematic - preemptive multitasking is not deterministic. Though most embedded systems use preemptive multitasking, there's no way to insure the system won't fail from a perfect storm of interrupts and task switches.

And it's hard - really hard in a complex system - to get multitasking right. Add in multiple cores, each of which is constantly blocking the others from memory, and determinism looks about as likely as every school kid's plan to become an NBA star.

Reentrantly sharing memory is tough enough with a single processor; when many share the same data the demands on developers to produce perfectly locked and reentrant code become overwhelming.

Then there's the little issue of parallelizing programs, an unsolved problem that is to supercomputing what the holy grail is to the Knights Templar - plenty of rumors, lots of speculation, but no hard results.

There are a lot of smart people working on these problems and I've no doubt they will be solved at some point. But today a generally better approach is asymmetric multiprocessing, where each core has its own memory space. More on that later.

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at His website is

Loading comments...