Memory effects. In many applications, the biggest payoff in energy reduction for a given amount of designer effort comes from concentrating on the memory system. As Figure 5-26 shows [Cat98], memory transfers are by far the most expensive type of operation performed by a CPU: a memory transfer takes 33 times more energy than an addition.
As a result, the biggest payoffs in energy optimization come from properly organizing instructions and data in memory. Accesses to registers are the most energy efficient; cache accesses are more energy efficient than main memory accesses.
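As a rough worked example, take a hypothetical loop body that performs one memory transfer and one addition per iteration. Ignore every other cost and use the 33:1 ratio quoted above, with the addition's energy as the unit:

```latex
E_{\text{iter}} = E_{\text{mem}} + E_{\text{add}} \approx 33\,E_{\text{add}} + E_{\text{add}} = 34\,E_{\text{add}},
\qquad
\frac{E_{\text{mem}}}{E_{\text{iter}}} \approx \frac{33}{34} \approx 0.97
```

Under these assumptions about 97% of the loop's energy goes to the memory transfer. Trimming the arithmetic can therefore save at most about 3%, while serving the operand from a register or the cache attacks the dominant term.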
Figure 5-26. Relative energy consumption of various operations [Cat98]
Caches are an important factor in energy consumption. On the one hand, a cache hit saves a costly main memory access, and on the other, the cache itself is relatively power hungry because it is built from SRAM, not DRAM. If we can control the size of the cache, we want to choose the smallest cache that provides us with the necessary performance.
Li and Henkel [Li98] measured the influence of caches on energy consumption in detail. Figure 5-27 below breaks down the energy consumption of a computer running MPEG (a video encoder) into several components: software running on the CPU, main memory, data cache, and instruction cache.
As the instruction cache size increases, the energy cost of the software on the CPU declines, but the instruction cache comes to dominate the energy consumption. Experiments like this on several benchmarks show that many programs have sweet spots in energy consumption.
If the cache is too small, the program runs slowly and the system consumes a lot of power due to the high cost of main memory accesses. If the cache is too large, the power consumption is high without a corresponding payoff in performance.
At intermediate cache sizes, both the execution time and the power consumption are good.
How can we optimize a program for low power consumption? The best overall advice is that high performance = low power: generally speaking, making the program run faster also reduces energy consumption.

Figure 5-27. Energy and execution time versus instruction/data cache size for a benchmark program [Li98]
Clearly, the biggest factor that the programmer can reasonably control is the memory access pattern. If the program can be modified to reduce instruction or data cache conflicts, for example, the energy required by the memory system can be significantly reduced.
The effectiveness of changes such as reordering instructions or selecting different instructions depends on the processor involved, but such changes are generally less effective than cache optimizations. Several optimizations described earlier for improving performance are also often useful for reducing energy consumption:
1) Try to use registers efficiently. Group accesses to a value together so that the value can be brought into a register and kept there.
2) Analyze cache behavior to find major cache conflicts. Restructure the code to eliminate as many of these as you can:
— For instruction conflicts, if the offending code segment is small, try to rewrite the segment to make it as small as possible so that it better fits into the cache. Writing in assembly language may be necessary. For conflicts across larger spans of code, try moving the instructions or padding with NOPs.
— For scalar data conflicts, move the data values to different locations to reduce conflicts.
— For array data conflicts, consider either moving the arrays or changing your array access patterns to reduce conflicts.
3) Make use of page mode accesses in the memory system whenever possible. Page mode reads and writes eliminate one step in the memory access, saving a considerable amount of power.
Mehta et al. [Met97] present some additional observations about energy optimization:
1) Moderate loop unrolling eliminates some loop control overhead. However, when the loop is unrolled too much, power increases because the long stretches of straight-line code lower the cache hit rate.
2) Software pipelining reduces pipeline stalls, thereby reducing the average energy per instruction.
3) Eliminating recursive procedure calls where possible saves power by getting rid of function call overhead. Tail recursion can often be eliminated; some compilers do this automatically.