Techniques for Designing Energy-Aware MPSoCs: Part 1

Power and energy consumption have become significant constraints in modern-day microprocessor systems. Whereas energy-aware design is obviously crucial for battery-operated mobile and embedded systems, it is also important for desktop and server systems due to packaging and cooling requirements.

In such systems, power consumption has grown from a few watts per chip to over 100 watts. As desktop/server (and even embedded) systems evolve from the uniprocessor space to the multiprocessor system-on-chip (MPSoC) space, energy-aware design will take on new dimensions.

Techniques for energy and power consumption reduction have been successfully applied at all levels of the design space in uniprocessor systems: circuit, logic gate, functional unit, processor, system software, and application software levels.

The primary focus has been on reducing active (dynamic) power. As technology continues to scale, accompanied by reductions in the supply and threshold voltages, the growing percentage of the power budget due to standby (leakage) energy has driven the development of additional techniques for reducing standby energy as well [9,10].

Figure 2-1. Components of power consumption. Is-c, short-circuit current; Ioff, leakage current; Idyn, dynamic switching current.

Figure 2-1 above illustrates the components of power consumption in a simple CMOS circuit. In well-designed circuits, Is-c is a fixed percentage (less than 10%) of Ion. Thus, active power consumption is usually estimated as:

$$P_{act} = C_{avg} \, V_{dd}^{2} \, (Act) \, f_{clock}$$

where Cavg is the average capacitive load of a transistor, Vdd is the power supply voltage, fclock is the operating frequency, and Act is the activity factor that accounts for the number of devices that are actually switching (drawing current from the power supply) at a given time.

This definition of active power consumption illustrates the significant benefit of supply voltage scaling: a quadratic reduction in active power consumption. However, high performance requires that VDD > 3VT so that there is sufficient current drive.
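The quadratic dependence on supply voltage can be checked numerically. The following is a minimal sketch of the active power estimate; the capacitance, activity factor, clock rate, and voltages are illustrative assumptions, not values from the text, and the capacitance is treated as a single lumped figure for the whole circuit.

```python
def active_power(c_avg, v_dd, activity, f_clock):
    """Estimate active power as C_avg * V_dd^2 * Act * f_clock (SI units in, watts out)."""
    return c_avg * v_dd ** 2 * activity * f_clock


# Illustrative values only (assumed, not taken from the article).
C_AVG = 1e-9     # farads, lumped switched capacitance
ACT = 0.2        # fraction of devices switching at a given time
F_CLOCK = 500e6  # hertz

p_nominal = active_power(C_AVG, 1.8, ACT, F_CLOCK)
p_scaled = active_power(C_AVG, 0.9, ACT, F_CLOCK)

# Halving the supply voltage at a fixed clock leaves one quarter of the active power.
ratio = p_scaled / p_nominal
```

In practice the clock rate would usually have to drop along with the supply, so the real saving depends on both terms.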

Thus, as supply voltages scale, threshold voltages must also scale, causing leakage power to increase. As an example, the leakage current increases from its value of 20 picoamperes per micrometer in a Taiwan Semiconductor Manufacturing Company (TSMC) CL018G process with a threshold voltage of 0.42 V when the threshold voltage is lowered to 0.25 V. Standby power consumption due to subthreshold leakage current is estimated as:

$$P_{leak} = V_{dd} \, I_{off} \, K_{L}$$

where Ioff is the current that flows between the supply rails in the absence of switching and KL is a factor that accounts for the distribution/sizing of P and N devices, the stacking effect, the idleness of the device, and the design style used.

In a CMOS-based style, at most half of the transistors are actually leaking whereas the remaining are ON (in the resistive region). As oxide thicknesses decrease, gate tunneling leakage will also become a significant source of standby power. This component of standby power is not included in the above equation and is not a target of the techniques presented in this chapter.
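As a rough illustration of the standby power estimate, the sketch below multiplies a per-micrometer leakage current by a total device width to obtain Ioff. The 20 pA/um figure is the one quoted in the text; the supply voltage, total width, and the value of KL are assumptions chosen only to make the arithmetic concrete.

```python
def standby_power(v_dd, i_off, k_l):
    """Estimate subthreshold standby power as V_dd * I_off * K_L (watts)."""
    return v_dd * i_off * k_l


I_OFF_PER_UM = 20e-12   # amperes per micrometer of device width (figure quoted in the text)
TOTAL_WIDTH_UM = 5e6    # assumed total transistor width on the chip, in micrometers
V_DD = 1.8              # volts, assumed supply voltage
K_L = 0.5               # assumed; stands in for sizing, stacking, idleness, design style

i_off_total = I_OFF_PER_UM * TOTAL_WIDTH_UM      # total leakage current in amperes
p_leak = standby_power(V_DD, i_off_total, K_L)   # about 90 microwatts for these inputs
```

Gate tunneling leakage is left out here, as it is in the equation above.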

This series of articles surveys a number of energy-aware design techniques for controlling both active and standby power consumption that are applicable to the MPSoC design space and points to emerging areas that will need special attention in the future. Our model MPSoC system is depicted in Figure 2-2 below.

Figure 2-2. Model MPSoC. I$, instruction cache; D$, data cache.

Energy-Aware Processor Design
Many techniques for controlling both active and standby power consumption have been developed for processor cores [11]. Almost all of these will transition to the MPSoC design space, with some of them taking on additional importance and expanded design dimensions. Figure 2-3 below shows the processor power design space for both active and standby power [12].

Figure 2-3. Processor power design space. DFS, dynamic frequency scaling; DVS, dynamic voltage scaling; DTM, dynamic thermal management.

The second column lists techniques that are applied at design time and thus are part of the circuit fabric. The last two columns list techniques that are applied at run time: the middle column for those cases when the component is idle, and the last column for when the component is in active use. (Run-time techniques can additionally be partitioned into those that reduce leakage current while retaining state and those that are state-destroying.)

The underlying circuit fabric must provide the "knobs" for controlling the run-time mechanisms, by either the hardware (e.g., in the case of clock gating) or the system software (e.g., in the case of dynamic voltage scaling [DVS]).

In the case of software control, the cost of transitioning from one state to another (in terms of both energy and time) and the relative energy savings need to be provided to the software for decision making.
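One way to picture the information the software needs is a table of power states, each carrying its idle power and its transition cost in energy and time. The sketch below is a hypothetical illustration: the `PowerState` type, the state names, and all numbers are assumptions, and the model charges the whole transition as a lump of energy and time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    name: str
    idle_power_w: float         # power drawn while resting in this state
    transition_energy_j: float  # energy to enter and later leave the state
    transition_time_s: float    # time to enter and later leave the state


def idle_energy(state, idle_s):
    """Energy spent over an idle interval, or None if the interval is too short."""
    if state.transition_time_s > idle_s:
        return None
    resting = idle_s - state.transition_time_s
    return state.transition_energy_j + state.idle_power_w * resting


def best_state(states, idle_s):
    """Pick the usable state with the lowest total energy for the idle interval."""
    usable = [(idle_energy(s, idle_s), s) for s in states]
    usable = [(e, s) for e, s in usable if e is not None]
    return min(usable, key=lambda pair: pair[0])[1]


# Hypothetical state table; every value is an assumption for illustration.
STATES = [
    PowerState("active-idle", 0.100, 0.0, 0.0),
    PowerState("clock-gated", 0.020, 1e-5, 1e-5),
    PowerState("power-gated", 0.001, 2e-3, 1e-3),
]

short_choice = best_state(STATES, 1e-4).name   # too short to repay any transition
medium_choice = best_state(STATES, 1e-3).name
long_choice = best_state(STATES, 1.0).name
```

With these assumed numbers, deeper states only pay off once the idle interval is long enough to amortize their transition energy, which is why both the cost and the savings must be visible to the software.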

Reducing Active Energy
As has already been pointed out, the best knob for controlling power is setting the supply voltage appropriately to meet the computational load requirements. Although lowering the supply voltage has a quadratic impact on active energy, it decreases system performance since it increases gate delay [4], as shown in Figure 2-4 below.

Figure 2-4. Delay as a function of VDD.

Multiple supply voltages can be used very effectively in MPSoCs since they contain multiple processors of different types with different performance requirements. For example, as in Figure 2-2, an MPSoC will contain high-speed digital signal processors (DSPs), moderate-speed processor cores, and low-speed I/O processors (IOPs).

Choosing the appropriate supply voltage at design time for the entire component will minimize (or even eliminate) the overhead of level converters that are needed whenever a module at a lower supply drives a module at a higher supply [13].

The most popular technique for reducing active power consumption at run time is DVS combined with dynamic frequency scaling (DFS). Most embedded and mobile processors contain this feature, triggered either by on-chip thermal sensors when thermal limits are being approached or by the run-time system when the CPU load changes.

DFS + DVS requires a power supply control loop containing a buck converter to adjust the supply voltage and a programmable phase-locked loop (PLL) to adjust the clock [10].

As long as the supply voltage is increased before increasing the clock rate or decreased after decreasing the clock rate, the system need only stall while the PLL is relocking on the new clock rate (estimated to be around 20 msec).
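The ordering rule can be written down as a small planning function. This is a sketch only: the `set_voltage` and `set_frequency` action names are hypothetical labels rather than a real driver interface, and the operating points are assumed values.

```python
def transition_steps(current, target):
    """Return ordered (action, value) steps between two (voltage_v, frequency_hz) points."""
    (v_now, f_now), (v_new, f_new) = current, target
    steps = []
    if f_new > f_now:
        # Speeding up: raise the supply first so the logic can meet the faster clock.
        if v_new != v_now:
            steps.append(("set_voltage", v_new))
        steps.append(("set_frequency", f_new))
    elif f_new < f_now:
        # Slowing down: drop the clock first, then lower the supply.
        steps.append(("set_frequency", f_new))
        if v_new != v_now:
            steps.append(("set_voltage", v_new))
    elif v_new != v_now:
        # Same clock rate, only the supply changes.
        steps.append(("set_voltage", v_new))
    return steps


# Assumed operating points for illustration.
SLOW = (1.0, 200e6)
FAST = (1.3, 400e6)

speed_up = transition_steps(SLOW, FAST)
slow_down = transition_steps(FAST, SLOW)
```

In both directions the only step that stalls the core is the frequency change, while the PLL relocks.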

Future MPSoCs in which all (or selected) processing cores are DFS + DVS capable would require each to have its own converter and PLL (or ring oscillator) as well as system software capable of determining the optimum setting for each core, taking into account the computational load.
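A very simple version of such a per-core policy picks, for each core, the slowest operating point that still covers its demand. The sketch below assumes a shared table of operating points and hypothetical core names and loads; it is a greedy illustration, not a claim about how an optimum setting would actually be computed.

```python
# Assumed (voltage_v, frequency_hz) operating points, sorted by frequency.
OPERATING_POINTS = [(0.8, 100e6), (1.0, 200e6), (1.2, 300e6), (1.4, 400e6)]


def pick_point(required_hz, points=OPERATING_POINTS):
    """Return the lowest-frequency point that meets the demand, else the fastest."""
    for v, f in points:
        if f >= required_hz:
            return (v, f)
    return points[-1]


def plan(loads_hz):
    """Map each core name to an operating point based on its required clock rate."""
    return {core: pick_point(req) for core, req in loads_hz.items()}


# Hypothetical per-core demands in hertz.
settings = plan({"dsp": 350e6, "cpu": 180e6, "iop": 40e6})
```

Because lower-frequency points also run at lower voltage, choosing the slowest adequate point takes advantage of the quadratic dependence of active power on the supply.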

The overall system will have to be designed such that cores are tolerant of periodic dropouts of other cores that are stalling when transitioning clock rates. An additional complication is that the voltage converter and PLL, being analog components, are susceptible to substrate noise induced by digital switching.

Thus, uniprocessor systems typically distance the digital and analog components as much as possible to reduce this effect. Whether this technique will carry over to MPSoCs, in which the ability to separate the analog and digital components physically becomes much more complex, remains an open question.

Reducing Standby Energy
Several techniques have recently evolved for controlling subthreshold current, as shown in Figure 2-3 earlier. Since increasing the threshold voltage, VT, decreases subthreshold leakage current (exponentially), adjusting VT is one such technique.

Figure 2-5. VT effects.

As shown in Figure 2-5 above, a 90-mV reduction in VT increases leakage by an order of magnitude. Unfortunately, increasing VT also negatively impacts gate delay. As with multiple supply voltages, multiple threshold voltages can be employed at design time or run time.
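The 90 mV per decade figure gives a quick way to estimate how leakage moves with VT. The sketch below assumes that this slope holds over the whole range considered, which is a simplification; the threshold values are illustrative.

```python
MV_PER_DECADE = 90.0  # from the text: a 90 mV drop in VT gives 10x the leakage


def leakage_scale(vt_old_mv, vt_new_mv):
    """Factor by which subthreshold leakage changes when VT moves from old to new."""
    return 10.0 ** ((vt_old_mv - vt_new_mv) / MV_PER_DECADE)


# Assumed threshold voltages in millivolts, for illustration.
one_step_down = leakage_scale(400, 310)   # VT lowered by 90 mV
two_steps_down = leakage_scale(400, 220)  # VT lowered by 180 mV
one_step_up = leakage_scale(310, 400)     # VT raised by 90 mV
```

Raising VT by the same amount reverses the effect, which is the basis for using a higher VT where the added gate delay can be tolerated.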

At run time, multiple levels of VT can be provided using adaptive body-biasing whereby a negative bias on VSB increases VT [11], as shown in Figure 2-5.

Simultaneous DVS, DFS, and variable VT have been shown to be an effective way to trade off supply voltage and body biasing to reduce total power, both active and standby, under variable processor loads [14].

Once again, whether similar techniques will carry over to MPSoCs remains an open question. Another technique that will apply to reducing standby power in MPSoCs is the use of sleep transistors like those shown in Figure 2-6 below.

Figure 2-6. Gating supply rails.

Standby power can be greatly reduced by gating the supply rails for idle components. In normal mode (non-idle), the sleep transistors must present as small a resistance as possible (via sizing) so as not to negatively affect performance.

In sleep mode (idle), the transistor stack effect [10] reduces leakage by orders of magnitude. Alternatively, standby power can be completely eliminated by switching off the supply to idle components.

An MPSoC employing such a technique will require system software that can determine the optimal scheduling of tasks on cores and can direct idle cores to switch off their supplies while taking into account the cost (in terms of both energy and time) of the on-to-off and off-to-on transitions.
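To make the scheduling idea concrete, the sketch below packs task utilizations onto as few cores as possible with a first-fit-decreasing heuristic and then checks whether switching off an idle core repays its transition energy. The heuristic is a stand-in chosen for brevity, not an optimal scheduler and not a method from the source; the function names, utilizations, and energy figures are all assumptions.

```python
def consolidate(task_utils, n_cores, capacity=1.0):
    """Place tasks on cores with first-fit decreasing; returns per-core task lists."""
    cores = [[] for _ in range(n_cores)]
    loads = [0.0] * n_cores
    for util in sorted(task_utils, reverse=True):
        for i in range(n_cores):
            # Small tolerance guards against floating-point rounding at full capacity.
            if loads[i] + util <= capacity + 1e-12:
                cores[i].append(util)
                loads[i] += util
                break
        else:
            raise ValueError("task set does not fit on the available cores")
    return cores


def idle_cores(cores):
    """Indices of cores that received no tasks and are candidates for switch-off."""
    return [i for i, tasks in enumerate(cores) if not tasks]


def worth_switching_off(idle_s, idle_power_w, transition_energy_j):
    """True when the leakage energy saved exceeds the off/on transition energy."""
    return idle_s * idle_power_w > transition_energy_j


# Hypothetical workload: five tasks, four cores.
placement = consolidate([0.5, 0.3, 0.2, 0.4, 0.1], 4)
candidates = idle_cores(placement)

# Assumed figures: 50 mW idle leakage and 2 mJ to power a core down and back up.
long_idle = worth_switching_off(0.5, 0.050, 2e-3)
short_idle = worth_switching_off(0.01, 0.050, 2e-3)
```

A fuller policy would also account for the wake-up latency when a switched-off core is needed again.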

Next in Part 2: Energy-Aware Memory Design

This series of articles is based on copyrighted material submitted by Mary Jane Irwin, Luca Benini, N. Vijaykrishnan and Mahmut Kandemir to "Multiprocessor Systems-On-Chips," edited by Wayne Wolf and Ahmed Amine Jerraya. It is used with the permission of the publisher, Morgan Kaufmann, an imprint of Elsevier. The book can be purchased on-line.

Mary Jane Irwin is the A. Robert Noll Chair in Engineering in the Department of Computer Science and Engineering at Pennsylvania State University. Luca Benini is professor at the Department of Electrical Engineering and Computer Science at the University of Bologna in Italy. N. Vijaykrishnan is an associate professor, and Mahmut Kandemir is an assistant professor in the Computer Science and Engineering Department at Pennsylvania State University.

Ahmed Jerraya is research director with CNRS and is currently managing research on multiprocessor system-on-chips at TIMA Laboratory in France. Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories.

[4] Haskell, B.G., "Digital Video: An Introduction to MPEG-2," Kluwer Academic Publishers, Boston, 1996.
[9] Kuroda, T., "Optimization and control of VDD and VT for low power, high speed CMOS design," International Conference on Computer Aided Design, November 2002.
[10] Duarte, D., et al., "Evaluating run-time techniques for leakage power reduction," Asia-Pacific Design Automation Conference, January 2001.
[11] Brodersen, R., et al., "Methods for true power minimization," International Conference on Computer Aided Design, November 2002.
[12] Rabaey, J., et al., "Digital Integrated Circuits: A Design Perspective," Prentice Hall, 2003.
[13] Usami, K., "Clustered voltage scaling techniques for low power design," International Symposium on Low Power Electronics and Design, April 1995.
[14] Martin, S., et al., "Combined dynamic voltage scaling and adaptive body biasing for low power microprocessors under dynamic workloads," International Conference on Computer Aided Design, November 2002.
