By Mary Jane Irwin, Luca Benini, N. Vijaykrishnan, and Mahmut Kandemir
In an MPSoC system, memories constitute a significant portion of the
overall chip resources, as there are various memory structures ranging
from private memories for individual processors to large shared cache
structures. Energy is expended in these memories by data accesses
(reads and writes), by the coherence activity required to keep shared
data consistent, and by leakage while simply storing the data.
Reducing Active Energy
Many techniques have been proposed in the past to reduce cache energy
consumption. Among these are partitioning large caches into smaller
structures to reduce the dynamic energy and the use of a memory
hierarchy that attempts to capture most accesses in the smallest size
memory.
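To see why capturing most accesses in a small memory saves energy,
consider a common first-order model (an illustrative assumption, not
taken from the cited works) for the average dynamic energy per access
of a two-level cache hierarchy. Here E_L1, E_L2 and E_mem are the
per-access energies of each level and m_L1, m_L2 are local miss rates:

```latex
E_{\text{avg}} = E_{L1} + m_{L1}\left(E_{L2} + m_{L2}\,E_{\text{mem}}\right)
```

With hypothetical values E_L1 = 0.1, E_L2 = 1, E_mem = 20 (arbitrary
units), m_L1 = 0.05 and m_L2 = 0.2, the average is
0.1 + 0.05 × (1 + 0.2 × 20) = 0.35, much closer to the cost of the
small L1 than to that of the larger levels.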
The Alpha 21164's L2 cache accesses the tag and data arrays in series,
so that only the selected cache bank needs to be accessed, which saves
energy. In Inoue et al. [18] and Powell et al. [19], cache way-prediction
is used to reduce the energy consumption of set-associative caches.
Selective way caches [20] vary the number of ways to suit different
application requirements.
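The sketch below contrasts three ways of reading a set-associative
cache by counting tag-way and data-way reads: conventional parallel
access, phased tag-then-data access in the style of the Alpha 21164 L2,
and way prediction. The geometry, LRU replacement, the choice of an MRU
predictor, and the cost model (one unit per way read; a wrong guess
falls back to reading the remaining ways) are illustrative assumptions,
not the predictors or energy models of the cited papers.

```python
from typing import Dict, List


def access_costs(addresses: List[int], num_sets: int = 4, ways: int = 4,
                 block_bytes: int = 32) -> Dict[str, int]:
    """Count tag-way plus data-way reads for three access policies."""
    # Each set keeps its resident tags in recency order: index 0 is the MRU way.
    sets: List[List[int]] = [[] for _ in range(num_sets)]
    cost = {"parallel": 0, "phased": 0, "predicted": 0}
    for addr in addresses:
        block = addr // block_bytes
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        hit = tag in lines

        # Conventional access: read all tag ways and all data ways at once.
        cost["parallel"] += 2 * ways
        # Phased access: read all tags first, then only the matching data way.
        cost["phased"] += ways + (1 if hit else 0)
        # Assumed MRU way prediction: probe one tag and one data way; on a
        # wrong guess (or a miss), fall back to reading the remaining ways.
        if lines and lines[0] == tag:
            cost["predicted"] += 2
        else:
            cost["predicted"] += 2 + 2 * (ways - 1)

        # LRU update: move the tag to the MRU slot, evicting the LRU tag on a miss.
        if hit:
            lines.remove(tag)
        elif len(lines) == ways:
            lines.pop()
        lines.insert(0, tag)
    return cost


print(access_costs([0] * 10))  # {'parallel': 80, 'phased': 49, 'predicted': 26}
```

Phased access saves data-array reads at the cost of extra latency, while
way prediction saves both tag and data reads only when the guess is
right; a trace that keeps alternating between two blocks of one set
defeats the MRU guess and costs as much as parallel access.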
In Kin et al. [21], a small filter cache is placed in front of the L1
cache to reduce energy consumption. Dynamic zero compression uses
single-bit accesses for zero-valued bytes in the cache to reduce energy
consumption.
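A rough way to see the benefit of dynamic zero compression is to count
bits read per cache word. The organization below (one zero-indicator bit
per byte that is always read, with the 8 data bits read only for
non-zero bytes) is an assumption for illustration, and bit counts are
only a proxy for energy:

```python
from typing import Tuple


def bits_read(word: bytes) -> Tuple[int, int]:
    """Return (conventional_bits, dzc_bits) needed to read one cache word."""
    conventional = 8 * len(word)
    # Assumed organization: one zero-indicator bit per byte is always read,
    # and the byte's 8 data bits are read only if the byte is non-zero.
    dzc = sum(1 + (8 if b != 0 else 0) for b in word)
    return conventional, dzc


print(bits_read(bytes([0, 0, 0, 5])))  # (32, 12)
print(bits_read(bytes([1, 2, 3, 4])))  # (32, 36)
```

The second call shows the trade-off: words with no zero bytes pay for
the indicator bits, so the scheme pays off only when zero-valued bytes
are common.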
Many of these techniques, developed in the context of single
processors, are also applicable to the design of the caches associated
with the individual processors of an MPSoC system. In addition to the hardware
design techniques, software optimizations that reduce the number of
memory accesses through code and data transformations can be very
useful in reducing energy consumption.
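Loop fusion is one classic code transformation of this kind. The sketch
below is an illustration rather than a technique attributed to any
cited work: it counts array reads explicitly and assumes that a value
loaded once within a fused iteration stays in a register.

```python
from typing import List, Tuple


def unfused(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """Two loops: s = a + b, then t = a * s. Returns (t, array reads)."""
    reads = 0
    s = [0] * len(a)
    for i in range(len(a)):
        s[i] = a[i] + b[i]
        reads += 2          # a[i], b[i]
    t = [0] * len(a)
    for i in range(len(a)):
        t[i] = a[i] * s[i]
        reads += 2          # a[i] again, plus s[i] from memory
    return t, reads


def fused(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """One loop: a[i] and the temporary s stay in registers."""
    reads = 0
    t = [0] * len(a)
    for i in range(len(a)):
        x = a[i]            # loaded once, reused
        s = x + b[i]        # temporary is never written to memory
        t[i] = x * s
        reads += 2          # a[i], b[i]
    return t, reads


print(unfused([1, 2, 3], [4, 5, 6]))  # ([5, 14, 27], 12)
print(fused([1, 2, 3], [4, 5, 6]))    # ([5, 14, 27], 6)
```

Both versions compute the same result, but the fused loop halves the
array reads and also avoids storing the intermediate array s.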
Reducing Standby Energy
However, most of the above techniques do little to alleviate the
leakage energy problem as the memory cells in all partitions and all
levels of the hierarchy continue to consume leakage power as long as
the power supply is maintained to them, irrespective of whether they
are used or not.
Various circuit technologies have been designed specifically to
reduce leakage power when the component is not in use. Some of these
techniques focus on reducing leakage during idle cycles of the
component by turning off the supply voltage.
One such scheme, gated-Vdd, was integrated into the architecture of
caches [24] to shut down portions of the cache dynamically. This
technique was applied at a cache block granularity in Kaxiras et al.
[25] and used in conjunction with software to remove dead objects in
Chen et al. [26].
However, all these techniques assume that the state (contents) of
the supply-gated cache memory is lost. Although removing the supply
voltage entirely destroys the contents of the cache memory, a
state-preserving leakage optimization is possible if a small supply
voltage is maintained to the memory cell.
Many alternative implementations have recently been proposed at the
circuit level to achieve such a state-preserving leakage control
mechanism. Abstracting away their circuit details, the choice between
the state-preserving and state-destroying techniques depends on the
relative overhead of the additional leakage required to maintain the
state as opposed to the cost of restoring the lost state from other
levels of the memory hierarchy.
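This trade-off can be made concrete with a simple per-line break-even
model. Every parameter below is hypothetical and in arbitrary units, and
the model assumes the line is re-referenced after its idle interval (so
a gated line must always be refetched) and that the low-voltage mode
leaks more than the gated mode. Equating the two energies gives the
break-even idle time (e_refetch - e_wake) / (p_drowsy - p_off).

```python
from dataclasses import dataclass


@dataclass
class LineLeakageModel:
    """Per-line energy model; all parameters are hypothetical, arbitrary units."""
    p_drowsy: float   # leakage per cycle in a state-preserving low-voltage mode
    p_off: float      # residual leakage per cycle when supply-gated (< p_drowsy)
    e_wake: float     # energy to restore a low-voltage line to full voltage
    e_refetch: float  # energy to reload a gated line from the next memory level

    def preserving(self, idle: int) -> float:
        return self.p_drowsy * idle + self.e_wake

    def destroying(self, idle: int) -> float:
        return self.p_off * idle + self.e_refetch

    def break_even(self) -> float:
        # Idle length at which both options cost the same energy.
        return (self.e_refetch - self.e_wake) / (self.p_drowsy - self.p_off)

    def choose(self, idle: int) -> str:
        if self.destroying(idle) < self.preserving(idle):
            return "state-destroying"
        return "state-preserving"


m = LineLeakageModel(p_drowsy=0.5, p_off=0.25, e_wake=2.0, e_refetch=52.0)
print(m.break_even())   # 200.0
print(m.choose(100))    # state-preserving
print(m.choose(1000))   # state-destroying
```

Short idle intervals favor keeping the state at low voltage, while long
ones amortize the refetch cost and favor gating the supply entirely.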
An important requirement to reduce leakage energy using either a
state-preserving or a state-destroying leakage control mechanism is the
ability to identify unused resources (or data contained in them). In
Yang et al. [24], the cache size is reduced (or increased) dynamically
to optimize the utility of the cache. In Kaxiras et al. [25], the cache
block is supply-gated if it has not been accessed for a period of time.
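The decay policy just described can be sketched as a trace-driven
simulation. Exact per-line timestamps stand in here for the hardware
idle-time tracking of the cited work, and the assumptions that a line is
powered from its first access and gated after more than decay_interval
idle cycles are simplifications chosen for illustration:

```python
from typing import Dict, Iterable, Tuple


def simulate_decay(trace: Iterable[Tuple[int, int]], decay_interval: int,
                   total_cycles: int) -> Tuple[int, int]:
    """Return (powered line-cycles, decay-induced misses).

    trace: (cycle, line) accesses in non-decreasing cycle order.
    A line is assumed to be powered from its first access and supply-gated
    once it has been idle for more than decay_interval cycles, after which
    its next access misses because the contents were lost.
    """
    last: Dict[int, int] = {}
    powered = 0
    induced_misses = 0
    for cycle, line in trace:
        if line in last:
            gap = cycle - last[line]
            if gap > decay_interval:
                powered += decay_interval   # on only until the line decayed
                induced_misses += 1
            else:
                powered += gap
        last[line] = cycle
    # After its final access, each line stays on until it decays or the run ends.
    for cycle in last.values():
        powered += min(decay_interval, total_cycles - cycle)
    return powered, induced_misses


trace = [(0, 0), (0, 1), (5, 0), (100, 0)]
print(simulate_decay(trace, decay_interval=10, total_cycles=200))   # (35, 1)
print(simulate_decay(trace, decay_interval=200, total_cycles=200))  # (400, 0)
```

A short decay interval cuts powered line-cycles sharply but converts
some hits into misses; a very long one keeps every line on, which is
the tension that adaptive schemes try to resolve.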
In Zhou et al. [30], hardware tracks both the hypothetical miss rate
(the miss rate if no lines were turned off) and the real miss rate by
keeping a cache line's tag active when its data is deactivated; the
turn-off interval is then adjusted dynamically based on this
information. In Flautner et al. [31] and Kim et al. [32], dynamic
supply voltage scaling is used to reduce the leakage in the unused
portions of the memory.
In contrast to the other schemes, their drowsy cache scheme also
preserves data while a cache line is in low-leakage mode. The usefulness
and practicality of such state-preserving voltage scaling schemes for
embedded power-optimized memories are demonstrated in Qin and Rabaey
[33]. The focus in Heo et al. [34] is on reducing bitline leakage power
using leakage-biased bitlines: the precharging transistors of unused
subbanks are turned off to reduce bitline leakage, and bitline
precharging is delayed until the subbank is actually accessed.
All these techniques for cache power optimization can also be applied
to reduce the leakage energy in the MPSoC.