As Microsoft's Herb Sutter has stated on various occasions, the free lunch is over. On paper, hardware performance improvements continue as normal. To achieve this, hardware designers, who ran into energy consumption issues a while back, came up with a simple solution and successfully implemented it: instead of increasing clock speed, they increased the number of processors.
As a result, hardware design can now deliver increased performance based on the original roadmap, at first sight satisfying the ever-increasing appetite of consumers for more features and more performance at lower cost and lower power consumption.
However, when trying to program these devices, things don't quite add up. Existing sequential software is unable to unleash the increased performance that the hardware devices offer, placing software development at a crossroads.
If hardware designers are not able to provide appropriate software development environments that support their devices, the future looks grim. They won't sell any! Without appropriate multiprocessor software development environments, programmers will be left out in the cold and will not be able to leverage the additional performance offered by Multiprocessor Systems-on-Chip (MPSoCs).
Why the free lunch really is over
On a recent panel at the “Globalpress Summit” conference, executives were quoted with alarming statements on the state of the software industry. Steve Roddy, v-p at Tensilica, said: “Some say we're at a crisis stage with the software side overwhelming the hardware side.”
Driving some of this is the proliferation of cores in system-on-chip (SoC) devices. Charting processor properties over the last three decades shows that the performance of a single processor has leveled off in the last decade.
Moreover, the efficiency of built-in Instruction Level Parallelism (ILP), which effectively shields software designers from having to think about parallelism, is no longer growing. As a result, attempts to increase performance have come dangerously close to the power consumption ceiling, essentially stopping performance progress in conventional processor development.
The move to Multiprocessor System on Chip (MPSoC) design elegantly addresses the power issues faced on the hardware side. Multiple processors executing at a lower frequency deliver comparable overall MIPS performance while allowing designers to slow down the clock, which is a major constraint in low-power design. However, such a switch means the challenges have effectively been moved from the hardware to the software domain.
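The power argument can be made concrete with the standard first-order model of dynamic CMOS power, P = a * C * V^2 * f. The sketch below is only an illustration and not a figure from this article: the values are normalized, and the assumption that halving the clock permits a supply voltage of 0.8 is hypothetical.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """First-order dynamic CMOS power: P = activity * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


# Normalized baseline: one processor at full clock and full supply voltage.
single_core = dynamic_power(capacitance=1.0, voltage=1.0, frequency=1.0)

# Hypothetical alternative: two processors at half the clock. The supply
# voltage of 0.8 is an assumed value chosen for illustration only.
dual_core = 2 * dynamic_power(capacitance=1.0, voltage=0.8, frequency=0.5)

# Aggregate clock cycles per second match in both cases (1.0 versus 2 * 0.5).
print(f"single core: {single_core:.2f}, dual core: {dual_core:.2f}")
```

Under these assumed numbers the two slower processors offer the same aggregate cycle count for about 64% of the dynamic power, but only if the software can keep both of them busy.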
The ITRS 2006 Update (see ITRS, “System Drivers”, 2006 Update) predicts that the average number of main processors per chip will grow from a few in 2007 to about 40 in 2020, with each of the main processors on average interacting with up to 8 data processing engines.
This puts the total number of processing engines in 2020 well over 250. This trend can already clearly be seen in the evolution of Systems on Chips (SoCs) as illustrated in Figure 1 below.
MPSoCs today and tomorrow
The traditional SoC was dominated by hardware, as illustrated on the upper left side of Figure 1 below. The functionality of an application was mapped into individual hardware blocks.
The insatiable consumer demand for more features in every new product generation has driven a trend to keep flexibility during product design, and even during product lifetime, by using programmable processors on chip. With the advent of licensable processors as provided by ARM and MIPS Technologies, programmable SoCs became possible and have found their way into every type of embedded application, from cell phones and multimedia set-top boxes to automotive applications.
|Figure 1: Evolution from traditional Systems on Chips (SoCs) to MPSoCs|
As a result, even more functions that traditionally were optimized in specific hardware blocks are now run on processors, leading to Multiprocessor Systems on Chips (MPSoCs) organized in “silos” as indicated in Figure 1c.
The silo organization lends itself very well to situations in which the different portions of a sequential algorithm can be mapped to individual, specific processors. A good example is a printer design as described by Tensilica and Epson, in which Epson's engineers added Tensilica Instruction Extensions (TIE) to customize several different Xtensa LX processors, each for a unique step in the inkjet image processing chain.
However, processor-based design using functional silos, with some limited communication between the different functions and processors, does not scale well with the demands for even more performance in consumer designs, because it does not address the fundamental issues of power consumption at higher clock speeds and limited additional improvements using ILP.
The next natural step is illustrated in Figure 1d, in which software is now distributed across processors, causing increased communication between them.
With that step, the design challenges are now effectively moved into the software domain. The years of “free” increased performance offered by processors (the free lunch) are over.
While the hardware design of multiple processors on a single die is reasonably well understood, software programmers are now facing daunting questions around programming, debugging, simulation, and optimization.
The components of an MPSoC programming environment
When talking to software programmers, the first response at this point is that “compilers will take care of this”. Not so. As Prof. Ed Lee argues in The Problem with Threads: “many researchers agree that automatic techniques have been pushed about as far as they will go, and that they are capable of exploiting only modest parallelism. A natural conclusion is that programs themselves must become more concurrent.”
In essence, this means that in order to unleash the additional performance offered by MPSoCs, software programmers will have to use programming models to point software design automation solutions explicitly to the parallelism and communication inherent in the algorithms that are to be partitioned across processors.
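What "explicitly pointing to the parallelism" can look like is sketched below. This is a minimal illustration using Python's standard `concurrent.futures` module, not any of the MPSoC tools discussed in this article; the function names and the tiny image are hypothetical. The key point is that the programmer, not the compiler, declares that rows can be processed independently. Threads are used for simplicity, so in CPython this shows the structure of the program rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_row(row, factor):
    """Scale one row of pixel values; rows do not depend on each other."""
    return [pixel * factor for pixel in row]


def scale_image_parallel(image, factor, workers=4):
    """Process rows concurrently; the caller asserts rows are independent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with `image`.
        return list(pool.map(lambda row: scale_row(row, factor), image))


# Hypothetical 4x3 "image" used only to exercise the sketch.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
scaled = scale_image_parallel(image, factor=2)
```

The sequential version would be a plain loop over rows; the only added information here is the explicit statement of which units of work may run side by side.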
The choice, adoption and standardization of programming models will be a key trigger for the move of MPSoCs into mainstream computing. Several programming models have been analyzed in projects under the MESCAL research program, including some dedicated to the Intel IXP family of network processors and some defined as a subset of the MPI (Message Passing Interface) standard. Other programming models focused on High Performance Computing are OpenMP and HPF.
In the SoC world, STMicroelectronics research is reporting on a project called MultiFlex, which aims to align more closely with the POSIX standard and with CORBA; the latter has also been standardized on by the US DoD for future radios.
NXP has been presenting an abstract task-level interface named TTL (“System-Level Design Flow Based on a Functional Reference for HW and SW”, Walter Tibboel et al, DAC 2007), following earlier work on YAPI. Another DSP-focused programming model is called StreamIt, and even SystemC (www.systemc.org), with its concepts of channels and ports, could be viewed as a software programming model, but its adoption for software design is open to question.
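Many of these models share the idea of tasks that communicate only through channels. The sketch below illustrates that idea with Python's standard `queue` and `threading` modules. It does not reproduce the API of TTL, YAPI, StreamIt, SystemC or MPI; all task and channel names are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def producer(out_channel, samples):
    """Source task: writes every sample to its output channel."""
    for sample in samples:
        out_channel.put(sample)
    out_channel.put(DONE)


def scaler(in_channel, out_channel, gain):
    """Processing task: sees only its two channels, never the other tasks."""
    while True:
        item = in_channel.get()
        if item is DONE:
            out_channel.put(DONE)
            return
        out_channel.put(item * gain)


def consumer(in_channel, results):
    """Sink task: collects everything that arrives on its input channel."""
    while True:
        item = in_channel.get()
        if item is DONE:
            return
        results.append(item)


def run_pipeline(samples, gain):
    # Bounded channels: a full channel blocks the writer until the reader
    # catches up, which is how back-pressure appears in this model.
    first = queue.Queue(maxsize=2)
    second = queue.Queue(maxsize=2)
    results = []
    tasks = [
        threading.Thread(target=producer, args=(first, samples)),
        threading.Thread(target=scaler, args=(first, second, gain)),
        threading.Thread(target=consumer, args=(second, results)),
    ]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()
    return results


output = run_pipeline([1, 2, 3, 4], gain=3)
```

Because each task touches only its own channels, the same three tasks could in principle be placed on one processor or on three without changing their code, which is the property that makes such models attractive for MPSoCs.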
Besides the choice of programming model, an MPSoC programming environment needs to interface to flexible execution environments for analysis, debug and verification.
In the majority of projects today, once the real hardware is available, the verification of software is finished by connecting single-core-focused debuggers via JTAG to development boards. Sometimes prototype boards are used in which FPGAs represent the ASIC or ASSP currently under development.
More recently, designers have been able to use virtual prototypes that simulate the processor and its peripherals either in software or using dedicated hardware accelerators. All these techniques have different advantages and disadvantages.
Software verification on real hardware is only available late in the design flow and offers limited ability to 'see' into the hardware. This approach does not normally take into account the turnaround time in cases when defects are found that can only be fixed with a hardware change.
Prototype boards are available earlier than the real hardware but require the design team to maintain several code bases of the design: one for the FPGAs used as a prototype and one for the real ASIC/ASSP used later. This approach also makes it difficult to achieve proper visibility into the hardware design to enable efficient debug.
Virtual prototypes, either in software or using hardware acceleration, are available earliest in the design flow and offer the best means of 'seeing' into the design, but often represent an abstraction and as such are not “the real thing”.
This approach runs the risk that either defects are found which do not exist in the real implementation, or defects of the real implementation are not found because the more abstract representation did not expose them.
Within this category there are significant differences in when the virtual prototypes become available and in how fast they run. Often, abstract software processor models can be available long before RTL is verified and can be reasonably fast (on the order of tens of MIPS).
However, users typically pay for this advantage by having to sacrifice some accuracy of the model. When cycle accuracy is required, models typically become available not long before the RTL, in which case hardware-assisted methods such as emulation become a feasible alternative.
A fundamental shortcoming of current solutions is that they are single-core focused. The most pressing issues in systems running parallel software on parallel hardware require new techniques of analysis and debug.
Users are facing issues of both functional correctness and performance. Data races, stalls, deadlocks, false sharing and memory corruption keep developers of MP software awake at night.
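To make one of these defects concrete, the sketch below first replays by hand a single bad interleaving of two unsynchronized increments (a data race leading to a lost update) and then shows the conventional fix with a lock. It is an illustration in Python using the standard `threading` module, with hypothetical names, and is not tied to any particular MPSoC platform.

```python
import threading


def lost_update_by_hand():
    """Replay one bad interleaving of two unsynchronized increments."""
    shared = 0
    read_by_a = shared      # processor A reads 0
    read_by_b = shared      # processor B reads 0 before A writes back
    shared = read_by_a + 1  # A writes 1
    shared = read_by_b + 1  # B also writes 1: A's update is lost
    return shared


class LockedCounter:
    """Counter whose read-modify-write sequence is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # only one thread at a time may update the value
            self.value += 1


def count_with_threads(num_threads=4, increments=10_000):
    """Run several threads that all increment the same locked counter."""
    counter = LockedCounter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


racy_result = lost_update_by_hand()   # 1, although two increments ran
locked_result = count_with_threads()  # 40000, no update is lost
```

On real hardware the bad interleaving occurs only occasionally, which is why such defects are hard to reproduce with a debugger that observes one core at a time.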
Another key capability that a programming environment has to offer both MPSoC designers and MPSoC users is the ability to rapidly program a variety of different combinations of parallel software that run on parallel hardware.
For automation to be possible, it is essential that the descriptions of the application functionality and hardware topology be independent of each other, and that a user has the ability to define different combinations using a mapping of parallel software to parallel hardware.
|Figure 2: MPSoC Application Optimization|
Figure 2 above shows the performance optimization of a video scaler application across a four-processor MPSoC platform. Expressing parallelism and communication in a programming model, and gathering performance data in an execution environment, both require a description of the software architecture.
If a method is used in which the communication structures are separated from the tasks, then a coordination language to describe the topology is required.
Furthermore, a description of the hardware architecture topology is required. This allows a mapping to define which elements of the software are to be executed on which resources in the hardware, and which hardware/software communication mechanisms are to be used for communication between software elements.
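The separation of software topology, hardware topology and mapping can be sketched in a few lines. The example below is a hedged illustration in Python, not the format of any tool or standard mentioned in this article. The task names loosely follow the idea of a video scaler on four processors, but every task, channel, processor and link name is hypothetical.

```python
# Software topology: tasks and the directed channels between them.
TASKS = ["read_frame", "scale_horizontal", "scale_vertical", "write_frame"]
CHANNELS = [
    ("read_frame", "scale_horizontal"),
    ("scale_horizontal", "scale_vertical"),
    ("scale_vertical", "write_frame"),
]

# Hardware topology, described independently of the software above.
PROCESSORS = ["cpu0", "cpu1", "cpu2", "cpu3"]
LINKS = {
    frozenset(pair)
    for pair in [("cpu0", "cpu1"), ("cpu1", "cpu2"), ("cpu2", "cpu3")]
}


def check_mapping(mapping):
    """Classify each channel as local or inter-processor for one mapping."""
    for task in TASKS:
        if mapping.get(task) not in PROCESSORS:
            raise ValueError(f"task {task!r} is not mapped to a known processor")
    plan = {}
    for src, dst in CHANNELS:
        a, b = mapping[src], mapping[dst]
        if a == b:
            plan[(src, dst)] = "local"  # both tasks share one processor
        elif frozenset((a, b)) in LINKS:
            plan[(src, dst)] = "link"   # needs an inter-processor mechanism
        else:
            raise ValueError(f"no link between {a} and {b} for {src}->{dst}")
    return plan


# Two candidate mappings of the same software onto the same hardware.
one_per_cpu = dict(zip(TASKS, PROCESSORS))
two_cpus = {
    "read_frame": "cpu0",
    "scale_horizontal": "cpu0",
    "scale_vertical": "cpu1",
    "write_frame": "cpu1",
}

plan_a = check_mapping(one_per_cpu)
plan_b = check_mapping(two_cpus)
```

Because the two topologies never refer to each other, trying a different partitioning means changing only the mapping dictionary, which is the kind of exploration that automation is meant to make cheap.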
In the hardware world, the topology of architectures can be elegantly defined using XML-based descriptions as defined in IP-XACT. In the software world, several techniques exist to express the topology of software architectures.
The software crisis can be addressed by switching to parallel hardware with multiple processors. The inevitable switch to multiprocessor designs will cause a fundamental shift in design methodologies. However, the effects this switch will have on software programming and the ability of MPSoC designers and users to interact efficiently using design automation are not yet well understood and are likely to spawn a new generation of System Design Automation tools.
Simon Davidmann is chief executive officer at Imperas.