Techniques for Designing Energy-Aware MPSoCs - Part 2

In an MPSoC system, memories constitute a significant portion of the overall chip resources, as there are various memory structures ranging from private memories for individual processors to large shared cache structures. Energy is expended in these memories due to data accesses (reads/writes), the coherence activity required to keep shared data consistent, and the leakage energy expended in just storing the data.
Reducing Active Energy
Many techniques have been proposed in the past to reduce cache energy consumption. Among these are partitioning large caches into smaller structures to reduce dynamic energy, and using a memory hierarchy that attempts to capture most accesses in the smallest memory.
The Alpha 21164's L2 cache accesses the tag and data arrays in series, so that only the selected cache bank's data array is read, improving energy efficiency. In Inoue et al. and Powell et al., cache way-prediction is used to reduce the energy consumption of set-associative caches. The selective-ways cache varies the number of active ways to match the requirements of different applications.
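A minimal sketch of the way-prediction idea can make the energy saving concrete. The parameters and the MRU-based predictor below are illustrative assumptions, not the exact mechanism of any cited design: the predicted way is probed first, and only on a misprediction are the remaining ways read.

```python
# Sketch of MRU-based way-prediction for a 4-way set-associative cache
# (hypothetical parameters). A conventional lookup reads all 4 data ways
# in parallel; a way-predicted lookup probes only the predicted way and
# falls back to the remaining ways on a misprediction.

WAYS = 4

def lookup(cache_set, mru, tag):
    """Return (hit, ways_probed) for one access; cache_set maps way -> tag."""
    probed = 1
    if cache_set.get(mru) == tag:          # prediction correct: 1 way read
        return True, probed
    for way in range(WAYS):                # fall back: probe remaining ways
        if way == mru:
            continue
        probed += 1
        if cache_set.get(way) == tag:
            return True, probed
    return False, probed

cache_set = {0: "A", 1: "B", 2: "C", 3: "D"}
print(lookup(cache_set, mru=1, tag="B"))   # correct prediction: (True, 1)
print(lookup(cache_set, mru=1, tag="D"))   # misprediction: (True, 4)
```

When the predictor is accurate, most hits touch a single data way instead of all four, which is where the dynamic energy saving comes from; the cost is extra latency on mispredictions.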
In Kin et al., a small filter cache is placed in front of the L1 cache to reduce energy consumption. Dynamic zero compression uses single-bit accesses for zero-valued bytes in the cache to reduce energy consumption.
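The filter-cache idea is easy to illustrate with a toy model. The sizes and relative access energies below are assumptions chosen for illustration: every access first probes a tiny, cheap L0 structure, and only misses pay the more expensive L1 access.

```python
# Sketch of a filter cache: a tiny, low-energy L0 placed before L1
# (hypothetical sizes and energies). Hits in the filter avoid the more
# expensive L1 access; the energy win depends on the filter hit rate.

from collections import OrderedDict

E_FILTER, E_L1 = 1.0, 5.0   # assumed relative access energies (arbitrary units)

class FilterCache:
    def __init__(self, lines=4):
        self.lines, self.store = lines, OrderedDict()
        self.energy = 0.0

    def access(self, addr):
        self.energy += E_FILTER                 # filter is always probed first
        if addr in self.store:
            self.store.move_to_end(addr)        # LRU update on a filter hit
            return "filter-hit"
        self.energy += E_L1                     # miss falls through to L1
        self.store[addr] = True
        if len(self.store) > self.lines:
            self.store.popitem(last=False)      # evict the LRU line
        return "L1-access"

fc = FilterCache()
for addr in [0, 1, 0, 1, 0, 2]:                 # small loop-like access stream
    fc.access(addr)
print(fc.energy)   # 3 misses * 6.0 + 3 hits * 1.0 = 21.0
```

The trade-off mirrors the one in the text: tight loops with small working sets hit in the filter almost every time, but a low filter hit rate can make total energy (and latency) worse than probing L1 directly.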
Many of these techniques applied in the context of single processors are also applicable for the design of caches associated with the individual processors of the MPSoC system. In addition to the hardware design techniques, software optimizations that reduce the number of memory accesses through code and data transformations can be very useful in reducing energy consumption.
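As one illustrative software transformation of the kind mentioned above (a generic example, not one drawn from the cited work), scalar replacement keeps a repeatedly read array element in a register-like temporary, cutting the number of memory reads the cache must serve:

```python
# Three-point smoothing over an array, before and after scalar
# replacement. The naive version re-reads overlapping elements from
# memory every iteration; the transformed version carries them forward
# in temporaries, so each iteration performs only one new array read.

def smooth_naive(a):
    reads, out = 0, []
    for i in range(1, len(a) - 1):
        out.append(a[i - 1] + a[i] + a[i + 1])   # 3 array reads per iteration
        reads += 3
    return out, reads

def smooth_scalar_replaced(a):
    reads, out = 0, []
    prev, cur = a[0], a[1]                       # 2 reads to prime the pipeline
    reads += 2
    for i in range(1, len(a) - 1):
        nxt = a[i + 1]                           # 1 new array read per iteration
        reads += 1
        out.append(prev + cur + nxt)
        prev, cur = cur, nxt                     # rotate the temporaries
    return out, reads

a = list(range(8))
o1, r1 = smooth_naive(a)
o2, r2 = smooth_scalar_replaced(a)
print(o1 == o2, r1, r2)   # True 18 8
```

Fewer loads means fewer cache data-array activations, which translates directly into lower dynamic memory energy for the same computation.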
Reducing Standby Energy
However, most of the above techniques do little to alleviate the leakage energy problem: the memory cells in all partitions and all levels of the hierarchy continue to consume leakage power as long as the supply voltage is maintained to them, whether or not they are used.
Various circuit technologies have been designed specifically to reduce leakage power when the component is not in use. Some of these techniques focus on reducing leakage during idle cycles of the component by turning off the supply voltage.
One such scheme, gated-Vdd, was integrated into cache architectures to shut down portions of the cache dynamically. This technique was applied at cache-block granularity in Kaxiras et al. and used in conjunction with software to remove dead objects in Chen et al.
However, all these techniques assume that the state (contents) of the supply-gated cache memory is lost. Although totally eliminating the supply voltage results in the state of the cache memory being lost, it is possible to apply a state-preserving leakage optimization technique if a small supply voltage is maintained to the memory cell.
Many alternate implementations have recently been proposed at the circuit level to achieve such a state-preserving leakage control mechanism. Abstracting from the circuit details, the choice between state-preserving and state-destroying techniques depends on the relative overhead of the additional leakage required to maintain the state, as opposed to the cost of restoring the lost state from other levels of the memory hierarchy.
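This trade-off reduces to a simple break-even calculation. The numbers below are hypothetical, chosen only to illustrate the model: a state-preserving mode keeps paying a reduced leakage power for the whole idle period, while a state-destroying mode pays nothing while idle but must refetch the lost data on the next access.

```python
# Back-of-the-envelope model of the state-preserving vs. state-destroying
# trade-off, with assumed energy numbers (arbitrary units).

P_DROWSY  = 0.1    # assumed residual leakage power in state-preserving mode
E_RESTORE = 50.0   # assumed energy to refetch a lost line from the next level

def preserving_cost(idle_time):
    """Leakage energy paid while holding state through an idle period."""
    return P_DROWSY * idle_time

def destroying_cost(idle_time):
    """Restore energy paid on the next access; independent of idle time."""
    return E_RESTORE

break_even = E_RESTORE / P_DROWSY
print(preserving_cost(100), destroying_cost(100))  # 10.0 50.0
print(break_even)   # 500.0: beyond this idle time, full gating wins
```

For idle periods shorter than the break-even time, the state-preserving mode is cheaper; for longer ones, fully gating the supply and eating the restore cost wins. This is why idle-time prediction matters for both classes of techniques.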
An important requirement for reducing leakage energy with either a state-preserving or a state-destroying leakage control mechanism is the ability to identify unused resources (or the data contained in them). In Yang et al., the cache size is dynamically reduced or increased to optimize the utility of the cache. In Kaxiras et al., a cache block is supply-gated if it has not been accessed for a period of time.
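The access-interval policy just described can be sketched with a per-line counter; the decay interval below is an assumed value for illustration. Each line's counter is reset on access and incremented on every tick, and the line is supply-gated once the counter exceeds the interval.

```python
# Sketch of time-based supply gating ("cache decay"): a line that has
# not been accessed for DECAY_INTERVAL ticks is assumed dead and gated.
# The interval value is an illustrative assumption.

DECAY_INTERVAL = 3

class Line:
    def __init__(self):
        self.counter, self.gated = 0, False

    def touch(self):             # an access resets the counter and wakes the line
        self.counter, self.gated = 0, False

    def tick(self):              # called once per coarse-grained interval
        if not self.gated:
            self.counter += 1
            if self.counter > DECAY_INTERVAL:
                self.gated = True

line = Line()
for _ in range(2):
    line.tick()
line.touch()                     # a recent access keeps the line alive
for _ in range(4):
    line.tick()
print(line.gated)                # True: 4 idle ticks exceed the interval
```

The interval embodies the break-even reasoning above: set it too short and live lines are gated (extra misses and restore energy); set it too long and dead lines leak needlessly.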
In Zhou et al., the hardware keeps the tag array active when a cache line is deactivated, allowing it to track both the hypothetical miss rate (had the line stayed active) and the real miss rate; the turn-off interval can then be adjusted dynamically based on this information. In Flautner et al. and Kim et al., dynamic supply voltage scaling is used to reduce leakage in the unused portions of the memory.
In contrast to the other schemes, their drowsy cache also preserves data while a cache line is in low-leakage mode. The usefulness and practicality of such state-preserving voltage scaling schemes for embedded power-optimized memories is demonstrated by Qin and Rabaey. The focus in Heo et al. is on reducing bitline leakage power using leakage-biased bitlines.
The technique turns off the precharging transistors of unused subbanks to reduce bitline leakage; actual bitline precharging is delayed until the subbank is accessed. All of these cache power optimization techniques can also be applied to reduce the leakage energy in an MPSoC.
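The adaptive control loop described above for Zhou et al.'s scheme can be sketched as follows. Because tags stay active, misses that would have hit in a deactivated line ("extra" misses) can be counted and compared against a tolerance; the threshold and the doubling/halving policy here are illustrative assumptions, not the published algorithm.

```python
# Sketch of an adaptive turn-off interval: widen the decay interval when
# deactivation is causing too many extra misses, narrow it when there is
# slack. Threshold and adjustment policy are assumed for illustration.

TARGET_EXTRA_MISS_RATE = 0.02   # assumed tolerable rate of decay-induced misses

def adjust_interval(interval, extra_misses, accesses):
    """Return the next turn-off interval given one epoch's statistics."""
    extra_rate = extra_misses / accesses
    if extra_rate > TARGET_EXTRA_MISS_RATE:
        return interval * 2           # too aggressive: decay lines more slowly
    return max(1, interval // 2)      # slack available: decay more aggressively

print(adjust_interval(64, 50, 1000))  # 5% extra misses -> back off to 128
print(adjust_interval(64, 5, 1000))   # 0.5% extra misses -> tighten to 32
```

Feedback of this kind lets the hardware land near the break-even point for the running workload instead of relying on a single fixed interval.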