Reduce SoC power consumption without high-level circuit design tools

Design engineers today face an arduous task of limiting power consumption in their SoC designs. More often than not, the complexities of designing for low power consumption are far removed from the EDA tools used in the backend design cycle. This gap can manifest in terms of interpretation of the tools, algorithms that are missing corner cases, or lack of support for the implementation of a particular design. 

Thus, over-reliance on EDA tools might not always be a practical solution to low power concerns. It is always prudent to start planning for power during the earliest stages of register transfer level (RTL) design. In this article we discuss several situations where high-level EDA tools are not useful and are sometimes a hindrance. We offer some techniques that can be applied early, at the RTL coding stage, that consume less power while keeping the basic design intent unchanged.

Gating the clock 
Clock gating is the most commonly used design technique to save dynamic power within the SoC. The entire SoC is seldom functional at any particular instant. Therefore, one can identify the blocks whose clock can be gated and make use of integrated clock gating cells. Consider the following RTL construct:

// 16-bit register with a load enable and an active-low asynchronous reset
always @(posedge clk or negedge reset)
  if (!reset)
    block_ff <= 16'b0;
  else if (block_enable)
    block_ff <= storage_next;

Figure 1 shows the logical implementation and the corresponding low power clock-gated implementation that could be employed to save power.

Modern EDA tools identify such constructs and convert them into the clock-gated implementation shown in Figure 1. However, it is not always desirable to get a gated clock structure from synthesis. The enable condition (block_enable) may be “ON” most of the time, in which case implementing such a structure would increase the power consumption, because the clock gating cell itself consumes some dynamic power. Even worse, the EDA tool might fail to discern the intent and not implement the clock gating structure where it could indeed have saved power.

EDA tools often see only what you tell them to see, so it is better design practice to insert clock gating cells in the RTL itself, taking into account the actual use cases in which the block in question would be “ON” or “OFF”.
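As a rough illustration of explicit clock gating in RTL, the sketch below wraps the register from the earlier construct with a latch-based gating cell. The module names icg_cell and block_reg and the signal enable_latched are hypothetical, and the behavioral cell is only a stand-in: in a real flow one would normally instantiate the integrated clock gating cell supplied by the technology library.

```verilog
// Hypothetical behavioral model of a latch-based clock gating cell.
// In practice the library's integrated clock gating cell would be used instead.
module icg_cell (
  input  wire clk,
  input  wire enable,
  output wire gated_clk
);
  reg enable_latched;

  // The latch is transparent only while clk is low, so a change on
  // enable cannot produce a glitch on gated_clk while clk is high.
  always @(clk or enable)
    if (!clk)
      enable_latched <= enable;

  assign gated_clk = clk & enable_latched;
endmodule

// The 16-bit register from the article, clocked through the gating cell.
module block_reg (
  input  wire        clk,
  input  wire        reset,        // active-low asynchronous reset
  input  wire        block_enable,
  input  wire [15:0] storage_next,
  output reg  [15:0] block_ff
);
  wire gated_clk;

  icg_cell u_icg (
    .clk       (clk),
    .enable    (block_enable),
    .gated_clk (gated_clk)
  );

  // No enable term is needed here: the clock only arrives when
  // block_enable was high before the rising edge.
  always @(posedge gated_clk or negedge reset)
    if (!reset)
      block_ff <= 16'b0;
    else
      block_ff <= storage_next;
endmodule
```

Whether this saves power still depends on how often block_enable is low in the real use cases, which is exactly the judgment the designer brings and the tool lacks.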

Signal Encoding
Designing finite state machines (FSMs) requires encoding the individual states, which can be done with a simple binary encoding, a one-hot encoding, or a Gray encoding. Of these, Gray encoding leads to a design that consumes less power than binary encoding, as discussed below.

As shown in Figure 2, in Gray encoding any two adjacent states differ in only 1 bit. Hence, during the normal operation of the FSM, fewer bits toggle as the FSM moves from one state to the next, resulting in lower power consumption. While designing a higher order Gray counter might be a tedious task, for lower order counters (up to 4 bits and 16 states) this encoding scheme can be employed to save power without incurring any additional design complexity.
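A minimal sketch of a 4-bit Gray counter follows, assuming an active-low asynchronous reset as in the earlier register example. The module name gray_counter4 and its internal signal names are hypothetical. The state register holds the Gray value directly, so exactly one flip-flop output toggles per count, including the wrap from the last state back to zero.

```verilog
// Hypothetical 4-bit Gray counter: the state register stores the Gray code,
// so only one state bit changes on each clock edge.
module gray_counter4 (
  input  wire       clk,
  input  wire       reset,   // active-low asynchronous reset
  output reg  [3:0] gray
);
  // Gray-to-binary conversion of the current state
  wire [3:0] bin;
  assign bin[3] = gray[3];
  assign bin[2] = bin[3] ^ gray[2];
  assign bin[1] = bin[2] ^ gray[1];
  assign bin[0] = bin[1] ^ gray[0];

  // Increment in binary, then convert back: gray = bin ^ (bin >> 1)
  wire [3:0] bin_next  = bin + 4'd1;
  wire [3:0] gray_next = bin_next ^ (bin_next >> 1);

  always @(posedge clk or negedge reset)
    if (!reset)
      gray <= 4'b0000;
    else
      gray <= gray_next;
endmodule
```

The conversion logic itself still toggles, so the saving comes mainly from the state register and whatever it drives; for an FSM with arbitrary branching, the benefit holds only for transitions between states that were assigned adjacent codes.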

Fixing redundant transitions by operand isolation 
One can save considerable dynamic power by fixing the redundant transitions in the design. However, the key lies in identifying those structures and designing with the goal of low power in mind. Consider the simple design of an arithmetic logic unit (ALU) that includes an adder and a multiplier, shown in Figure 3.

If SEL = 1'b1, we get (A + B) at the output of the multiplexer; if SEL = 1'b0, we get (A × B). Note that at any instant this circuit selects either the adder result or the multiplier result, yet both units see every change on A and B.

Hence, one of these two operations is always redundant. A simple rearrangement can lead to significant dynamic power savings if we use the SEL signal ahead of the adder and multiplier, so that only the selected function receives toggling operands. This removes the dynamic power that would otherwise have been dissipated in the unused function.

The entire operation of the ALU is pertinent only when Load_Enable is asserted. Hence, it makes sense to combine Operand_Enable with Load_Enable, as shown in Figure 4. This effectively isolates the operands depending upon whether they are needed downstream or not. This technique is also referred to as operand isolation.
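The idea can be sketched in RTL as follows. Since Figure 4 is not reproduced here, the structure is an assumption: the operands are isolated with AND gates, and the names alu_isolated, add_en, and mul_en, as well as the parameter W, are hypothetical. Each unit sees the real operands only when it is selected and Load_Enable is asserted; otherwise its inputs are held at zero.

```verilog
// Hypothetical ALU with AND-gate operand isolation.
module alu_isolated #(
  parameter W = 8
) (
  input  wire           clk,
  input  wire           reset,        // active-low asynchronous reset
  input  wire           sel,          // 1: add, 0: multiply
  input  wire           load_enable,
  input  wire [W-1:0]   a,
  input  wire [W-1:0]   b,
  output reg  [2*W-1:0] result
);
  // A unit is enabled only if it is selected AND the result will be loaded.
  wire add_en = load_enable &  sel;
  wire mul_en = load_enable & ~sel;

  // Isolated operands: forced to zero when the unit is not needed.
  wire [W-1:0] add_a = a & {W{add_en}};
  wire [W-1:0] add_b = b & {W{add_en}};
  wire [W-1:0] mul_a = a & {W{mul_en}};
  wire [W-1:0] mul_b = b & {W{mul_en}};

  wire [W:0]     sum     = add_a + add_b;   // W+1 bits keeps the carry
  wire [2*W-1:0] product = mul_a * mul_b;

  always @(posedge clk or negedge reset)
    if (!reset)
      result <= {2*W{1'b0}};
    else if (load_enable)
      result <= sel ? sum : product;        // sum is zero-extended to 2*W bits
endmodule
```

AND-gate isolation forces one transition to zero each time a unit is disabled; latch-based isolation, which holds the previous operand values, is a common alternative when the enables change frequently.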

Apart from fixing redundant transitions, power can also be saved by retiming the circuit, which means isolating the high capacitance nets from logic circuit elements that toggle frequently. The higher the capacitance and toggling rate, the more frequent the charging/discharging of the output load that constitutes much of the dynamic power generated. Unfortunately this is easier said than done. A major challenge is to identify such cases and employ retiming only when the circuit architecture gives the designer the liberty to do so.

Exploiting RTL tricks
Consider the two simple examples below that can lead to a simpler circuit that reduces not only the power generated but also the design effort and chip area:

  • Multiply by 7  – Consider an operation of multiplying an operand by 7. One way to do this is to simply create this operation in logic as a full multiplier, but this would be bulky and power hungry. On the other hand, to multiply by 7, one can simply left-shift the operand by 3 (multiplying it by 8) and then subtract the operand once.
Multiply by 7 = (Operand << 3) - Operand
  • BCD Multiplication by 5  – There are two possible methods to multiply BCD digits (from 0 to 9) by 5. One way would be to solve and minimize the logic depicted by the truth table below:

Table 1: BCD multiplication truth table

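But by looking closely at this table you will see that for any set of input values ABCD the two output digits can be easily represented by {0 A B C} {0 D 0 D}. This works because 5 × N = 10 × (N >> 1) + 5 × D: the tens digit is the input shifted right by one bit, and the units digit is 5 when D is set and 0 otherwise. The result is pure wiring, with a substantial reduction in power consumption and circuit complexity.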
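Both tricks can be sketched in a few lines of RTL. The module and port names below (mul7, bcd_times5, operand, digit, tens, units) and the parameter W are hypothetical. Neither module infers a multiplier: the first needs a single subtractor, and the second is wiring only.

```verilog
// Hypothetical multiply-by-7: 7*x = 8*x - x.
// The left shift by 3 is just wiring; only one subtractor is needed.
module mul7 #(
  parameter W = 8
) (
  input  wire [W-1:0] operand,
  output wire [W+2:0] product   // W+3 bits is enough to hold 7*operand
);
  assign product = {operand, 3'b000} - operand;
endmodule

// Hypothetical BCD digit times 5, for a valid input digit 0..9.
// With digit = {A, B, C, D}: tens = {0 A B C}, units = {0 D 0 D}.
module bcd_times5 (
  input  wire [3:0] digit,
  output wire [3:0] tens,
  output wire [3:0] units
);
  assign tens  = {1'b0, digit[3:1]};                 // digit >> 1
  assign units = {1'b0, digit[0], 1'b0, digit[0]};   // 5 if digit is odd, else 0
endmodule
```

For example, digit = 4'b1001 (9) gives tens = 4'b0100 (4) and units = 4'b0101 (5), that is, 45.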

Aside from the specific techniques discussed in this article, the following generalizations can be made:

  • Using explicit clock gates in the RTL is a better design technique than expecting the backend synthesis tools to do the same.
  • More than one signal encoding scheme must be analyzed to gauge design complexity and power consumption.
  • Fixing redundant transitions can significantly reduce the dynamic power. However, a key challenge lies in identifying those possibilities. 
  • Isolating the high capacitance nets from frequently toggling logic can save power.
  • Designers should stay alert to RTL tricks, especially those involving multiplication as shown above, that yield simpler logic.

Naman Gupta has been working with Freescale Semiconductor for 2 years. He is part of the Physical Design team and has successfully driven constraints development and timing closure of various SoCs from 65nm to 45nm. His other areas of interest include high performance and low power design.
