Techniques for Designing Energy-Aware MPSoCs: Part 3 - Embedded.com

Computation and storage energy directly benefit from device scaling (smaller gates, smaller memory cells), but unfortunately the energy for global communication does not scale down.

Projections assuming aggressive physical and circuit-level optimizations for global wires [42] show that global communication on chip will require increasingly higher energy, the majority of it being active energy.

Hence, communication-related energy is a significant concern in future MPSoC systems that will create many new challenges that have not been addressed in traditional high-performance on-chip interconnect design.

In this section we will explore various communication energy minimization approaches, moving from simple architectures (shared busses and point-to-point links) to complex advanced interconnects (networks on chip [NoCs]) that are applicable to MPSoCs.

The simplest on-chip communication channel is a bundle of wires, often called a bus. To be more precise, however, a bus can be significantly more complex than a bundle of wires. On-chip busses are highly shared communication infrastructures, with complex logic blocks for arbitration and interfacing. In an effort to maintain consistency with the surveyed literature, in the following we will use the term “bus” just as a synonym for a set of bundled wires.

It is, however, important to remember that implementing encoding schemes in real-life designs requires maintaining compliance with complex SoC bus protocols. This is not an easy task, as outlined in Osborne et al. [43].

Bus Encoding for Low Power
Bus data communication has traditionally adopted straightforward encodings and modulation. The transfer function of the encoder and modulator of binary non-return-to-zero input streams was implicitly assumed to be unitary. In other words, no modulation and encoding were applied to the binary data before sending it on the bus. Low-energy communication techniques remove this implicit assumption by introducing the concept of signal coding for low power.

The basic idea behind these approaches is to encode data sent through the communication channel so as to minimize its average switching activity, because dynamic power consumption in CMOS technology is proportional to switching activity. Ramprasad et al. [44] studied the problem of data encoding for minimum switching activity and obtained upper and lower bounds on the transition activity reduction achievable by any encoding algorithm.

This important theoretical result can be summarized as follows: the savings obtainable by encoding depend on the entropy rate of the data source and on the amount of redundancy in the code. The higher the entropy rate, the lower the energy savings that can be obtained by encoding with a given code redundancy.

Even though the work by Ramprasad et al. [44] provides a theoretical framework for analyzing encoding algorithms, it does not provide general techniques for obtaining effective encoders and decoders. The complexity and energy cost of encoding and decoding circuits must be taken into account when evaluating any bus-encoding scheme.

Several authors have proposed low-transition activity encoding and decoding schemes. To illustrate the characteristics of these schemes in a simple setting, we consider a point-to-point, one-directional bus connecting two modules (e.g., a processor and its memory), as shown in Figure 2-10 below.

Figure 2-10. Bus encoding: one-directional communication.

Data from the source module are encoded, transmitted on the bus, and decoded at the destination. A practical instance of this configuration is the address bus for the processor/memory system. If the bus has very large parasitic capacitance, the energy dissipated in driving the lines during signal transitions dominates the total energy cost of communication (including the cost of encoding and decoding). However, as discussed in the following, this assumption has to be carefully validated whenever a new encoding scheme is proposed.

Encoding for Random White Noise
A few encoding schemes have been studied starting from the assumption that the data sent on the bus can be modeled as random white noise (RWN), i.e., having a maximum entropy rate. Under this assumption, it is possible to focus solely on how to exploit redundancy to decrease switching activity, because all irredundant codes have the same switching activity (this result is proved in the paper by Ramprasad et al. [44]). Data are formatted in words of equal width, and a single word is transmitted every clock cycle.

In its simplest form, redundant encoding requires adding redundant wires to the bus. This can be seen as extending the word's width by one or more redundant bits. These bits inform the receiver about how the data were encoded before the transmission (see Figure 2-10 earlier).

Low-energy encodings exploit the correlation between the word currently being sent and the previously transmitted one. The rationale is that energy consumption is related to the number of switching lines, i.e., to the Hamming distance between the words. Thus, transmitting identical words causes no transitions and consumes no switching power, whereas alternating a word and its complement produces the largest power dissipation, because all bus lines switch.
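The relation between Hamming distance and switching can be illustrated with a minimal Python sketch. The function names, the 8-bit width, and the sample word are assumptions chosen for the illustration; they do not come from the surveyed literature.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two bus words differ."""
    return bin(a ^ b).count("1")


def count_transitions(words, initial=0):
    """Total number of line transitions when `words` are sent in order."""
    total, previous = 0, initial
    for word in words:
        total += hamming_distance(previous, word)
        previous = word
    return total


WIDTH = 8                       # assumed bus width for the illustration
MASK = (1 << WIDTH) - 1
word = 0b10110010               # arbitrary sample word

# Repeating the same word: no line ever toggles.
same = count_transitions([word] * 4, initial=word)
# Alternating the word and its complement: every line toggles on every transfer.
worst = count_transitions([word ^ MASK, word] * 2, initial=word)
print(same, worst)
```

The two cases bracket the range in which any encoding scheme operates: between zero and WIDTH transitions per transferred word.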

A conceptually simple and powerful scheme was proposed by Stan and Burleson [45] and called bus invert (BI) encoding. To reduce the switching, the transmitter computes the Hamming distance between the word to be sent and the previously transmitted one. If the distance is larger than half the word width, the word to be transmitted is inverted, i.e., complemented. An additional wire carries the bus invert information, which is used at the receiver end to restore the data.

This encoding scheme has some interesting properties. First, the worst-case number of transitions of an n-bit bus is n/2 at each time frame. Second, if we assume that data are uniformly randomly distributed, it is possible to show that the average number of transitions with this code is lower than that of any other encoding scheme with just one redundant line [45].
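The bus-invert rule described above can be sketched in a few lines of Python. This is an illustrative model, not the circuit from [45]: the class and function names are hypothetical, the bus is assumed to reset to all zeros, and the distance is measured on the data lines only (the invert line itself is not counted).

```python
def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


class BusInvertEncoder:
    """1-bit bus-invert encoder for a bus with `width` data lines."""

    def __init__(self, width: int):
        self.width = width
        self.mask = (1 << width) - 1
        self.bus = 0  # value currently on the data lines (assumed reset state)

    def encode(self, word: int):
        """Return (bus_value, invert) for the next data word."""
        distance = popcount((self.bus ^ word) & self.mask)
        invert = 2 * distance > self.width  # more than half the lines would toggle
        self.bus = ((word ^ self.mask) if invert else word) & self.mask
        return self.bus, int(invert)


def bus_invert_decode(bus_value: int, invert: int, width: int) -> int:
    """Restore the original word from the bus value and the invert line."""
    mask = (1 << width) - 1
    return bus_value ^ mask if invert else bus_value


encoder = BusInvertEncoder(width=8)
words = [0x00, 0xFF, 0x0F, 0xF1]
encoded = [encoder.encode(w) for w in words]
decoded = [bus_invert_decode(bus, inv, 8) for bus, inv in encoded]
```

Because the encoder sends either the word or its complement, whichever is closer to the current bus value, the data lines never see more than width/2 transitions per transfer.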

An unfortunate property of the 1-bit redundant bus-invert code is that the average number of transitions per line increases as the bus gets wider and asymptotically converges to 0.5, which is also the average switching per line of an unencoded bus. Moreover, the average number of transitions per line is already close to 0.5 for 32-bit busses. Thus, this encoding provides small energy saving for busses of typical width.
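The trend can be checked numerically with a short Python sketch. It relies on a simplifying assumption that is not part of the original analysis: words are uniformly random, so the Hamming distance d to the previous bus value is binomially distributed, and only the n data lines are counted (transitions on the invert line are ignored). Under bus invert the data lines then see min(d, n - d) transitions. The function name is hypothetical.

```python
from math import comb


def avg_transitions_per_line(n: int) -> float:
    """Expected data-line transitions per line and per transfer for
    1-bit bus invert on an n-bit bus with uniformly random words.
    The invert line itself is ignored (simplifying assumption)."""
    expected = sum(comb(n, d) * min(d, n - d) for d in range(n + 1)) / 2 ** n
    return expected / n


for width in (8, 16, 32, 64, 128):
    print(width, round(avg_transitions_per_line(width), 3))
```

The printed values grow with the bus width and stay below 0.5, the per-line activity of an unencoded bus carrying random data.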

A solution to this problem is to partition the bus into fields and to use bus inversion in each field independently. If a word is partitioned in m fields, then m control lines are needed. Although this scheme can be much more energy efficient compared with 1-bit bus invert, m-bit bus invert is no longer the best among m-redundant codes. Nevertheless, it is conceptually simpler than other encoding schemes based on redundancy, and thus its implementation overhead (in terms of power) is small.

Extensions to the bus invert encoding approach include the use of limited-weight codes and transition signaling. A k-limited-weight code is a code having at most k 1's per word. This can be achieved by adding appropriate redundant lines.

Such codes are useful in conjunction with transition signaling, i.e., with schemes in which 1's are transmitted as a 0-1 (or 1-0) transition and 0's by the lack of a transition. Thus, a k-limited-weight code would guarantee at most k transitions per time frame (if we neglect the transitions on the redundant lines).
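Transition signaling can be modeled as an XOR of each codeword onto the current bus state. The Python sketch below is a minimal illustration under that model; the function names, the 4-bit width, and the sample 2-limited-weight codewords are assumptions.

```python
def transition_signal(codewords, width, initial=0):
    """Map codewords to bus values: a 1 toggles its line, a 0 leaves it unchanged."""
    mask = (1 << width) - 1
    bus, values = initial, []
    for codeword in codewords:
        bus ^= codeword & mask
        values.append(bus)
    return values


def recover_codewords(values, initial=0):
    """Receiver side: a line that changed since the last transfer carries a 1."""
    previous, codewords = initial, []
    for value in values:
        codewords.append(previous ^ value)
        previous = value
    return codewords


# Sample codewords with at most k = 2 ones each (a 2-limited-weight code).
codewords = [0b0011, 0b0100, 0b1000, 0b0000, 0b0101]
bus_values = transition_signal(codewords, width=4)
received = recover_codewords(bus_values)
```

Since the number of toggled lines equals the number of 1's in the codeword, bounding the codeword weight by k bounds the transitions per transfer by k.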

As a general note on bus invert and all redundant encoding schemes, although a redundant code can be implemented using additional bus lines, there are other options. In particular, it is possible to communicate with redundant codes without increasing the number of bus lines if the bus transmission rate is increased by a factor of (n + m)/n with respect to the rate of the unencoded input stream, where n is the data word width and m is the number of redundant bits.

This approach is often called time redundancy as opposed to space redundancy (i.e., adding extra lines). Obviously, many hybrid schemes can be envisioned.

Encoding for Correlated Data
Even though the RWN data model is useful for developing redundant codes with good worst-case behavior, in many practical cases data words have significant correlations in time and space. From an information-theoretical viewpoint, the data source is not maximum entropy. This fact can be profitably exploited by advanced encoding schemes that outperform codes developed under the RWN model [46], even without adding redundant bus lines.

A typical example of a highly correlated data stream is the address stream between a processor and its private memory. Addresses show a high degree of sequentiality. This is typical for instruction addresses (within basic blocks) and for data addresses (when data are organized in arrays).

Therefore, in the limiting case of addressing a stream of data with consecutive addresses, Gray coding would be beneficial, because the Hamming distance between any pair of consecutive words is one, and thus the transitions on the address bus are minimized.

With Gray encoding, instruction addresses need to be converted to Gray addresses before being sent on the bus. The conversion is necessary because offset addition and arithmetic address manipulation are best done with standard binary encoding.
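The binary-to-Gray conversion and its inverse are standard and can be sketched in Python as follows. The function names are chosen for this illustration; in hardware the same operations reduce to a row of XOR gates (encoding) and an XOR chain (decoding).

```python
def binary_to_gray(value: int) -> int:
    """Convert a binary-coded address to reflected Gray code."""
    return value ^ (value >> 1)


def gray_to_binary(gray: int) -> int:
    """Convert a Gray-coded address back to standard binary."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


# Bus activity for a run of consecutive addresses, binary versus Gray.
addresses = range(0x100, 0x110)
binary_toggles = sum(bin(a ^ (a + 1)).count("1") for a in addresses)
gray_toggles = sum(
    bin(binary_to_gray(a) ^ binary_to_gray(a + 1)).count("1") for a in addresses
)
```

For any run of consecutive addresses, the Gray-coded stream toggles exactly one line per transfer, whereas the binary stream toggles several lines whenever a carry propagates.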

Moreover, address increments depend on the word width n. Since most processors are byte-addressable, consecutive words require an increment by n/8, e.g., by 4 (8) for 32-bit (64-bit) processors. Thus the actual encoding of interest is a partitioned code, whose most significant field is Gray encoded and whose least significant field has log2(n/8) bits.
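A minimal Python sketch of such a partitioned code follows. It assumes a byte-addressable machine whose word width is a power of two and leaves the low log2(n/8) bits in plain binary; the function name and the sample address range are hypothetical.

```python
def partitioned_gray(address: int, word_bits: int = 32) -> int:
    """Gray-encode the word-index field of a byte address and pass the
    byte-offset field through unchanged (word_bits assumed a power of two)."""
    low_bits = (word_bits // 8).bit_length() - 1   # log2(n/8): 2 for 32-bit, 3 for 64-bit
    low = address & ((1 << low_bits) - 1)          # byte offset within the word
    high = address >> low_bits                     # word index
    return ((high ^ (high >> 1)) << low_bits) | low


# Consecutive 32-bit words are 4 bytes apart.
addresses = [0x1000 + 4 * i for i in range(64)]
codes = [partitioned_gray(a) for a in addresses]
toggles = [bin(x ^ y).count("1") for x, y in zip(codes, codes[1:])]
```

Gray encoding the whole byte address would not preserve the unit distance for a stride of 4: addresses 0 and 4 map to Gray codes 0 and 6, which differ in two bits. Encoding only the word-index field restores one transition per consecutive word.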

Musoll et al. [49] proposed a different partitioned code for addresses, which exploits the locality of reference, namely, most software programs favor working zones of their address space. The proposed approach partitions the address into an offset within a working zone and an identifier of the current working zone. In addition, a bit is used to denote a hit or a miss of the working zone.

When there is a miss, the full address is transmitted through the bus. In the case of a hit, the bus is used to transmit the offset (using 1-hot encoding and transition signaling), and additional lines are used to send the identifier of the working zone (using binary encoding). As an improvement over working zone, an irredundant code with similar properties, called sector-based encoding, has been proposed by Aghaghiri et al. [50].

The T0 code [51] uses one redundant line to denote when an address is consecutive to the previously transmitted one. In this case, the transmitter does not need to transmit the address and freezes the information on the bus, thus avoiding any switching.

The receiver reconstructs the address by incrementing the previous one. When the address to be sent is not consecutive, it is simply transmitted as is, and the redundant line is de-asserted to inform the receiver to accept the address as is.
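The behavior just described can be captured in a small Python model. It is a behavioral sketch only: the class names, the `stride` parameter, and the sample address trace are assumptions, and address wraparound at the top of the address space is ignored.

```python
class T0Encoder:
    """Behavioral model of the transmitter side of a T0-style code."""

    def __init__(self, stride: int = 1):
        self.stride = stride
        self.prev_addr = None  # last address presented to the encoder
        self.bus = 0           # value held on the address lines

    def encode(self, addr: int):
        """Return (bus_value, inc); inc = 1 means 'previous address + stride'."""
        sequential = self.prev_addr is not None and addr == self.prev_addr + self.stride
        if not sequential:
            self.bus = addr    # drive the new address; otherwise the bus stays frozen
        self.prev_addr = addr
        return self.bus, int(sequential)


class T0Decoder:
    """Behavioral model of the receiver side."""

    def __init__(self, stride: int = 1):
        self.stride = stride
        self.prev_addr = None

    def decode(self, bus_value: int, inc: int) -> int:
        addr = self.prev_addr + self.stride if inc else bus_value
        self.prev_addr = addr
        return addr


trace = [100, 101, 102, 103, 200, 201, 50]
encoder, decoder = T0Encoder(), T0Decoder()
sent = [encoder.encode(a) for a in trace]
received = [decoder.decode(bus, inc) for bus, inc in sent]
```

In this trace the address lines change only three times (at 100, 200, and 50), and the redundant line toggles at the boundaries of each sequential run.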

When one is transmitting a sequence of consecutive addresses, this encoding requires no transition on the bus, compared with the single transition (per transmitted word) of the Gray code. In this context, an irredundant code with zero transition activity for address streams was also demonstrated, called INC-XOR, and later improved on by Aghaghiri et al. [52].

The highly sequential nature of addresses is just one simple example of spatio-temporal correlation on address busses. For instance, DRAM address busses use a time-multiplexed addressing protocol, whose transition activity can be reduced by a tailored encoding scheme, as outlined in Cheng and Pedram [53].

Furthermore, several encoding schemes have been proposed for dealing with more general correlations than those found in address busses. Unfortunately, many of these approaches assume that data stream statistics can be collected and analyzed at design time, making them far less useful for MPSoCs.

Practical Guidelines
The literature on low-power encoding has flourished in the last decade, and choosing the best encoding scheme is today a challenging task. A few practical guidelines can be of help. First, in estimating the power savings obtained on the bus, the capacitive load of bus lines should be estimated with great care.

In fact, in sub-micron technologies, coupling capacitance between adjacent wires dominates with respect to substrate capacitances; hence one must account for the impact of multiple transitions on adjacent lines. In this area, several coupling-aware low-power encoding schemes have recently been proposed [57,58].

Second, bus load capacitance estimates should be used as a selection criterion for encoding schemes. For typical on-chip bus capacitances, many encoding schemes are not practical, because encoders and decoders are too big and power-hungry.

A good rule of thumb is to convert the expected switched capacitance reduction on the target bus, $SW_{red} = C_{line} \cdot W_{bus} \cdot \alpha_{red}$ (i.e., the product of bus line capacitance, number of bus lines, and bus switching activity reduction), into an equivalent number of switching inverter loads, $N_{eq} = SW_{red} / C_{in,INV}$ (i.e., how many inverter inputs must switch to switch the same amount of capacitance that we save with encoding).

If the complexity of encoder and decoder, expressed in terms of inverter-equivalents, is similar to $N_{eq}$, then it is very likely that most of the transition activity savings are swamped by encoder and decoder power consumption.
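The rule of thumb is simple arithmetic and can be sketched in Python. All numbers below are hypothetical values picked for illustration (a 32-line bus, a 10% activity reduction, a 2 fF inverter input capacitance); they are not taken from the cited work, and the function name is an assumption.

```python
def inverter_equivalents(c_line_pf: float, bus_width: int,
                         activity_reduction: float, c_in_inv_ff: float) -> float:
    """Number of inverter-input loads whose switching matches the switched
    capacitance saved on the bus by encoding."""
    # Saved switched capacitance in fF: line capacitance x lines x activity reduction.
    sw_red_ff = c_line_pf * 1000.0 * bus_width * activity_reduction
    return sw_red_ff / c_in_inv_ff


# Hypothetical on-chip bus: 1 pF per line.
on_chip = inverter_equivalents(1.0, 32, 0.10, 2.0)
# Hypothetical off-chip bus: 10 pF per line, same width and activity reduction.
off_chip = inverter_equivalents(10.0, 32, 0.10, 2.0)
print(round(on_chip), round(off_chip))
```

Under these assumed numbers the on-chip budget is roughly 1,600 inverter-equivalents and the off-chip budget roughly 16,000. A codec whose switching complexity approaches the on-chip figure would cancel most of the bus savings, while the same codec could still pay off on the off-chip bus.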

Also, it is very important to account for the speed and area penalty of the encoder and decoder circuits (the codec). In general, most of the encoding schemes outlined above are well suited for off-chip busses with very large capacitances (e.g., 10 pF or more). Only the simplest schemes are suitable for typical on-chip busses (e.g., 1 pF or less).

Low Swing Signaling
An effective approach to high-speed energy-efficient communication is low swing signaling [59]. Even though it requires the design of receivers with good adaptation to line impedance and high sensitivity (often achieved by means of simplified sense-amplifiers), power savings on the order of 10X have been estimated with reduced interconnect swings of a few hundred mV in a 0.18-micrometer process [60].

The use of low swing signaling poses a critical challenge for design: communication reliability has to be provided in spite of the decreased noise margins and the strong technology limitations, under limited power budgets.

With present technologies, most chips are designed under the assumption that electrical waveforms can always carry correct information on chip. As technology scales, communication is likely to become inherently unreliable because of the increased sensitivity of interconnects to on-chip noise sources, such as crosstalk and power-supply noise.

As a consequence, solutions for combined energy minimization and communication reliability control have to be developed for NoCs. Redundant bus encoding provides a degree of freedom for spanning the energy-reliability tradeoff.

The key point of this approach is to model on-chip interconnects as noisy channels and to exploit the error detection capability of coding schemes. A code that provides link transfer reliability in excess of the constraint allows the voltage swing to be decreased, resulting in an overall energy saving (compared with the unencoded link) in spite of the overhead associated with the code implementation.

The energy efficiency of a code is tightly related to its error recovery technique, namely, error correction or retransmission of corrupted data. This issue resembles the tradeoff investigation between forward error correction (FEC) and automatic repeat request (ARQ), well known to network engineers, but for on-chip communication networks this study is still in its early stage.

Bertozzi et al. [61] have explored the energy-reliability tradeoff for on-chip low-swing busses. Starting from a standard on-chip bus (AMBA), various reliability-enhancement encoding techniques are explored. Hamming codes are explored in both their error correction and error detection embodiments.

Other linear codes, namely, Cyclic Redundancy Check (CRC) codes, are considered as well. For error-detecting codes, an AMBA-compliant retransmission mechanism is also proposed. Experimental results in Bertozzi et al. [61] point out that the detection capability of a code plays a major role in determining its energy efficiency, because the higher the error detection capability the lower we can drive the voltage swing, knowing that the increased error rate caused by the lowered signal-to-noise ratio would not lead to system failures as long as the errors are detected.

As far as error recovery is concerned, error correction is beneficial in terms of recovery delay but has two main drawbacks: it limits the detection capability of a code, and it makes use of high-complexity decoders. On the contrary, when the higher recovery delay (associated with the retransmission time) of retransmission mechanisms can be tolerated, they provide a higher energy efficiency, thanks to the lower swings and simpler codecs they can use while preserving communication reliability.

In many practical cases, communication noise depends on operating conditions and environmental factors. It is then possible to devise adaptive schemes that dynamically adjust signal levels (i.e., the voltage swing on bus lines) depending on noise levels. In the approach proposed by Worm et al. [62], the frequency of errors detected by an error-detecting code is monitored at run time.

When the frequency rises too much, the voltage swing is increased, whereas it is decreased if error frequency becomes very small. This closed-loop control of voltage provides much increased robustness and flexibility, even though the complexity of the encoding and decoding circuitry is increased, and variable voltage swing circuits are also required.
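The closed-loop idea can be sketched as a simple threshold controller in Python. This is only an illustration of the control principle, not the scheme of Worm et al. [62]: the function name, the error-rate thresholds, the step size, and the swing limits are all hypothetical values.

```python
def adjust_swing(swing_mv: int, error_rate: float, *,
                 high: float = 1e-3, low: float = 1e-5,
                 step_mv: int = 50, min_mv: int = 200, max_mv: int = 1000) -> int:
    """Return the voltage swing (in mV) for the next monitoring window.

    error_rate is the fraction of transfers flagged by the error-detecting
    code in the last window. All thresholds and limits are assumed values.
    """
    if error_rate > high:        # too many detected errors: raise the swing
        swing_mv += step_mv
    elif error_rate < low:       # link is very clean: try a lower swing
        swing_mv -= step_mv
    return max(min_mv, min(max_mv, swing_mv))


# A hypothetical sequence of observed error rates, one per monitoring window.
swing = 500
history = []
for observed in (1e-6, 1e-6, 5e-3, 1e-4, 1e-6):
    swing = adjust_swing(swing, observed)
    history.append(swing)
```

The dead band between the two thresholds keeps the swing stable when the error rate is acceptable, which avoids oscillating between two voltage levels on every window.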

Next in Part 4: Advanced interconnects and energy-aware software
To read Part 1, go to “Energy-aware processor design.”
To read Part 2, go to “Energy-aware memory design.”

This series of articles is based on copyrighted material submitted by Mary Jane Irwin, Luca Benini, N. Vijaykrishnan and Mahmut Kandemir to “Multiprocessor Systems-On-Chips,” edited by Wayne Wolf and Ahmed Amine Jerraya. It is used with the permission of the publisher, Morgan Kaufmann, an imprint of Elsevier. The book can be purchased on-line.

Mary Jane Irwin is the A. Robert Noll Chair in Engineering in the Department of Computer Science and Engineering at Pennsylvania State University. Luca Benini is professor at the Department of Electrical Engineering and Computer Science at the University of Bologna in Italy. N. Vijaykrishnan is an associate professor, and Mahmut Kandemir is an assistant professor in the Computer Science and Engineering Department at Pennsylvania State University.

Ahmed Jerraya is research director with CNRS and is currently managing research on multiprocessor system-on-chips at TIMA Laboratory in France. Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa “Ray” S. Farmer, Jr. Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories.

References:
[42] Ho, R., et al., “The future of wires,” Proceedings of the IEEE, 2001.
[43] Osborne, S., et al., “Bus encoding architecture for low power implementation of an AMBA-based platform,” IEEE Proceedings: Computers and Digital Techniques, 2002.
[44] Ramprasad, S., et al., “Signal coding for low power: fundamental limits and practical limitations,” International Symposium on Circuits and Systems, 1998.
[45] Stan, M., et al., “Bus-invert coding for low power I/O,” 1995.
[46] Stan, M., et al., “Low power encodings for global communication in CMOS VLSI,” IEEE Transactions on VLSI Systems, 1997.
[49] Musoll, E., et al., “Working zone encoding for reducing the energy in microprocessor address busses,” IEEE Transactions on VLSI Systems, 1998.
[50] Aghaghiri, Y., et al., “ALBORZ: address level bus power optimization,” International Symposium on Quality Electronic Design, 2002.
[51] Benini, L., et al., “Asymptotic zero transition activity encoding for address buses in low power microprocessor based systems,” Great Lakes Symposium on VLSI, 1997.
[52] Aghaghiri, Y., et al., “Reducing transitions on memory buses using sector based encoding techniques,” International Symposium on Low Power Electronics and Design, 2002.
[53] Cheng, W-C, et al., “Power optimal encoding for a DRAM address bus,” IEEE Transactions on VLSI Systems, 2002.
[57] Kim, K-W, et al., “Coupling driven signal encoding scheme for low power interface design,” International Conference on Computer Aided Design, 2000.
[58] Sotiriadis, P., et al., “Low power bus coding techniques considering interwire capacitances,” Custom Integrated Circuits Conference, 2000.
[60] Svensson, C., “Optimum voltage swing on on chip and off chip interconnect,” IEEE Journal of Solid State Circuits, 2001.
[61] Bertozzi, D., et al., “Low power error resilient encoding for on chip data buses,” Design Automation and Test in Europe Conference, 2002.
[62] Worm, F., et al., “An adaptive low power transmission scheme for on chip networks,” International Symposium on System Synthesis, 2002.
