Needed: clear thinking about multithreading and multi-core - Embedded.com

Is multithreading better than multi-core? Is multi-core better than multithreading? One might as well ask whether a diesel engine is better than four-wheel drive! The best vehicle for a given application might have one, the other, or both. Or neither. They are independent, but complementary, design decisions.

With multithreaded processors and multi-core chips becoming the norm, architects and designers of digital systems need to understand their respective attributes, advantages, and disadvantages.

Tapping Concurrency as a Resource
What multithreading and multi-core have in common is that they both exploit the concurrency in a computational workload. The cost, in silicon, energy, and complexity, of making a CPU run a single instruction stream ever faster goes up non-linearly, and eventually hits a wall imposed by the physical limitations of circuit technology.

That wall keeps moving out a little further every year, but cost- and power-sensitive designs are constrained to follow the bleeding edge at a safe distance. Fortunately, virtually all computer applications have some degree of concurrency: at least some of the time, there are two or more independent tasks that need to be performed. Taking advantage of concurrency to improve computing performance and efficiency isn't always trivial, but it's certainly easier than violating the laws of physics.

Multi-processor, or multi-core, systems exploit concurrency to spread work around a system. As many software tasks can run at the same time as there are processors in the system. This can be used to improve absolute performance, cost, or power/performance. Clearly, once one has built the fastest single processor possible in a given technology, the only way to get even more compute power is to use more than one of them.

More subtly, if a load that would saturate a 1GHz processor could be evenly spread across 4 processors, those processors could be run at roughly 250MHz each. If each 250MHz processor is less than one-fourth the size of the 1GHz processor, or consumes less than one-fourth the power, either of which may be the case because of the non-linear cost of higher operating frequencies, the multi-core system might be more economical.
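The arithmetic behind this trade-off can be sketched as follows. The power-law cost model (cost proportional to frequency raised to an exponent `k`) and the value `k = 1.5` are illustrative assumptions, not figures from this article; any `k > 1` simply stands in for the non-linear cost of higher operating frequencies.

```python
def relative_cost(freq_ghz: float, k: float) -> float:
    """Hypothetical cost (area or power) of one core, normalized so that
    a 1 GHz core costs 1.0. k > 1 models a non-linear cost of frequency."""
    return freq_ghz ** k


def multicore_cost(total_ghz: float, cores: int, k: float) -> float:
    """Total cost of `cores` identical cores that share the load evenly."""
    return cores * relative_cost(total_ghz / cores, k)


# Assumed exponent; k = 1.0 would mean cost scales linearly with frequency.
K = 1.5

single = multicore_cost(1.0, cores=1, k=K)     # one core at 1 GHz
quad = multicore_cost(1.0, cores=4, k=K)       # four cores at 250 MHz
linear_quad = multicore_cost(1.0, cores=4, k=1.0)

print(f"single core: {single:.3f}")
print(f"four cores (k={K}): {quad:.3f}")
print(f"four cores (k=1.0): {linear_quad:.3f}")
```

Under this assumed model the four slower cores cost less in total only when `k` exceeds 1; with a linear cost (`k = 1.0`) the two configurations break even, which is why the argument depends on the non-linearity.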

Many designers of embedded SoCs are already exploiting concurrency with multiple cores. Unlike general-purpose workstations and servers, whose workload is variable and unknowable to system designers, it's often possible to analyze and decompose a fixed set of embedded device functions into specialized tasks, and assign tasks across multiple processors, each of which has a specific responsibility, and each of which can be specified and configured optimally for that specific job.

Multithreaded processors also exploit the concurrency of multiple tasks, but in a different way, and for a different reason. Instead of a system-level technique to spread CPU load, multithreading is a processor-level optimization to improve area and energy efficiency.

Multithreaded architecture is driven to a large degree by the observation that single-threaded high-performance processors spend a surprising amount of time doing nothing. When the results of a memory access are required for a program to advance, and that access must reference RAM whose cycle time is tens of times slower than that of the processor, a single-threaded processor is condemned to stall until the data is returned.

The multithreading hypothesis can be stated as: If latencies prevent a single task from keeping a processor pipeline busy, then a single pipeline should be able to complete more than one concurrent task in less time than it would take to run the tasks serially. This means running more than one task's instruction stream, or thread, at a time, which in turn means that the processor has to have more than one program counter, and more than one set of programmable registers.
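A toy cycle-level model can make the hypothesis concrete. This is a minimal sketch under stated assumptions, not a model of any real pipeline: one instruction issues per cycle, every `miss_every`-th instruction of a thread stalls that thread for `latency` cycles, and the function name `run` and all parameter values are hypothetical.

```python
def run(threads: int, instrs: int, miss_every: int, latency: int) -> int:
    """Cycles for one single-issue pipeline to finish `threads` identical
    instruction streams that are interleaved in hardware."""
    remaining = [instrs] * threads   # instructions left per thread
    ready_at = [0] * threads         # first cycle each thread may issue again
    cycle = 0
    start = 0                        # round-robin starting point
    while any(remaining):
        for i in range(threads):
            t = (start + i) % threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                issued = instrs - remaining[t]
                if issued % miss_every == 0:
                    # This thread now waits on memory; others may still issue.
                    ready_at[t] = cycle + 1 + latency
                start = (t + 1) % threads
                break
        cycle += 1                   # an idle cycle if no thread was ready
    return cycle


# Arbitrary illustrative parameters.
INSTRS, MISS_EVERY, LATENCY = 100, 4, 6

one_task = run(1, INSTRS, MISS_EVERY, LATENCY)
serial = 2 * one_task                          # two tasks, one after the other
concurrent = run(2, INSTRS, MISS_EVERY, LATENCY)

print(f"serial: {serial} cycles, concurrent: {concurrent} cycles")
```

In this model the second thread fills some of the cycles the first spends waiting on memory, so the interleaved run finishes sooner than the serial one, but never in fewer cycles than the total instruction count.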

Replicating those resources is far less costly than replicating an entire processor. In the MIPS32 34K processor, for example, which implements the MIPS MT multithreading architecture, an additional 14% of area can buy an additional 60% of throughput, relative to a comparable single-threaded core. (Measured using the EEMBC PKFLOW and OSPF benchmarks, run sequentially on a MIPS32 24KE core versus concurrently on a dual-threaded MIPS32 34K core.)

Multi-processor architectures are infinitely scalable, in theory. However many processors one has, one can always imagine adding another, though only a limited class of problems can make practical use of thousands of CPUs. Each additional processor core on an SoC adds to the area of the chip at least as much as it adds to the performance.

Multithreading a single processor can only improve performance up to the level where the execution units are saturated. However, up to that limit, it can provide a “superlinear” payback for the investment in die size.
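The payback can be expressed as throughput per unit of die area. The sketch below uses the 14% area and 60% throughput figures quoted above for the 34K; the dual-core line is an assumed best case (twice the area, exactly twice the throughput), not a measurement.

```python
def throughput_per_area(throughput: float, area: float) -> float:
    """Throughput delivered per unit of area, both relative to a
    single-threaded baseline core with throughput 1.0 and area 1.0."""
    return throughput / area


baseline = throughput_per_area(1.0, 1.0)

# Figures quoted in the article: +14% area buys +60% throughput.
multithreaded = throughput_per_area(1.60, 1.14)

# Assumption: a second identical core doubles the area and, at best,
# doubles the throughput.
dual_core_best_case = throughput_per_area(2.0, 2.0)

print(f"baseline:            {baseline:.2f}")
print(f"multithreaded core:  {multithreaded:.2f}")
print(f"dual-core best case: {dual_core_best_case:.2f}")
```

A ratio above 1.0 means the added area returns more than its share of throughput, which is the sense in which the payback is “superlinear”.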

While the means and the motives are different, multi-core systems and multithreaded cores have a common requirement that concurrency in the workload be expressed explicitly by software. If the system has already been coded in terms of multiple tasks running on a multi-tasking OS, there may be no more work to be done.

Monolithic, single-threaded applications need to be reworked and decomposed either into sub-programs or explicit software threads. This work must be done for both multithreaded and multi-core systems, and once completed, either can exploit the exposed concurrency. That is another reason why the two techniques are often confused, and something that makes them highly complementary.
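As a rough illustration of what decomposing into explicit software threads looks like, the sketch below rewrites a serial loop so that each block of work becomes an independent task. The `checksum` workload and both function names are hypothetical. The example shows only the structure of the decomposition; it does not demonstrate a speedup, since that depends on the language runtime, the OS, and the hardware underneath.

```python
from concurrent.futures import ThreadPoolExecutor


def checksum(block: bytes) -> int:
    """Stand-in for an independent unit of work on one block of data."""
    return sum(block) % 65536


def process_monolithic(blocks: list[bytes]) -> list[int]:
    """Single instruction stream: blocks are handled one after another."""
    return [checksum(b) for b in blocks]


def process_threaded(blocks: list[bytes], workers: int = 2) -> list[int]:
    """Same work with the concurrency made explicit: each block is a task
    that the scheduler may place on any available hardware thread or core."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(checksum, blocks))


blocks = [bytes(range(256)) * n for n in range(1, 5)]
serial_result = process_monolithic(blocks)
threaded_result = process_threaded(blocks)
print(serial_result == threaded_result)
```

Once the work is expressed this way, the same code can be scheduled onto the thread contexts of a multithreaded core or onto separate cores without further restructuring.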

When is Multi-core a Good Idea?
For embedded SoC designs, a multi-core design makes the most sense when the functions of the SoC decompose cleanly into subsystems with a limited need for communication and coordination between them.

Instead of running all code on a single, large, high-frequency core, connected to a single, large, high-bandwidth memory, assigning tasks to several simpler, slower cores allows code and data to be stored in per-processor memories, each of which has lower requirements for both capacity and bandwidth. That normally translates into power savings, and potentially into area savings as well, if the lower bandwidth requirement allows for physically smaller RAM cells to be used.

If the concurrent functions of an SoC cannot be statically decomposed at system design time, an alternative approach is to emulate general-purpose computers and build a coherent SMP cluster of processor cores. Within such a cluster, multiple processors are available as a pool to run the available tasks, which are assigned to processors “on the fly”.

The price to be paid for this flexibility is that it requires a sophisticated interconnect between the cores and a shared main memory, and the shared main memory needs to be relatively large and high-bandwidth. This negates the area and power advantages alluded to above for functionally partitioned multi-core systems, but can still be a good trade-off.

Every core represents additional die area, and even in a “powered down” standby state, each core in a multi-core configuration consumes some amount of leakage current, so the number of cores in an SoC design should in general be kept to the minimum necessary to run the target application. There is no point in building a multi-core design if the problem can be handled by a single core within the system's design constraints.

When is Multithreading a Good Idea?
Multithreading makes sense whenever an application with some degree of concurrency is to be run on a processor that would otherwise find itself stalled a significant portion of the time waiting for instructions and operands. This is a function of core frequency, memory technology, and program memory reference behavior.

Well-behaved real-world programs in a typical single-threaded SoC processor/memory environment might be stalled as little as 30% of the time at 500MHz, but less cache-friendly codes may be stalled a whopping 75% of the time in the same environment. Systems where the speeds of processor and memory are so well matched that there is no loss of efficiency due to latency will not get any significant bandwidth improvement from multithreading.
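These stall fractions put a ceiling on what multithreading can recover. The sketch below uses an idealized model that is an assumption of this example, not a result from the article: each thread keeps the pipeline busy for a fraction `1 - stall_fraction` of the time, stalls in different threads overlap perfectly, and the pipeline can never be more than 100% busy.

```python
def max_speedup(stall_fraction: float, threads: int) -> float:
    """Idealized upper bound on the throughput gain from running `threads`
    hardware threads on one pipeline, relative to a single thread.
    Requires 0 <= stall_fraction < 1."""
    busy = 1.0 - stall_fraction          # pipeline utilization of one thread
    return min(float(threads), 1.0 / busy)


for stall in (0.0, 0.30, 0.75):
    for n in (2, 4):
        print(f"stalled {stall:.0%}, {n} threads: "
              f"at most {max_speedup(stall, n):.2f}x")
```

With no stalls the bound is 1.0 for any number of threads, which matches the observation that a well-matched processor and memory leave nothing for multithreading to recover. Real gains will be lower than this bound, because threads compete for caches and execution units.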

Going Beyond Multi-Core
The additional resources of a multithreaded processor can be used for other things than simply recovering lost bandwidth, if the multithreading architecture provides for it. A multithreaded processor can thus have capabilities that have no equivalent in a multi-core system based on conventional processors.

For example, in a conventional processor, when an external interrupt event needs to be serviced, the processor takes an interrupt exception, where instruction fetch and execution suddenly restarts at an exception vector. Interrupt vector code must save the current program state before invoking the interrupt service code, and must restore the program context before returning from the exception.

A multithreaded processor, by definition, can switch between two program contexts in hardware, without the need for decoding an exception or saving/restoring state in software. A multithreaded architecture targeted for real-time applications can potentially exploit this and allow for threads of execution to be suspended, then unblocked directly by external signals to the core, providing for zero-latency handling of interrupt events.

Multithreaded, Multi-core: The Best of Both Worlds
Arguably, from the standpoint of area and energy efficiency, the optimal SoC processor solution would be to use multithreaded cores as basic processing elements, and to replicate them in a multi-core configuration if the application demands more performance than a single core can provide.

Kevin D. Kissell is Principal Architect, MIPS Technologies Inc.

To learn more about this topic, go to More about multicores and multithreading.
