Minimize leakage power in embedded SoC designs with Multi-Vt cells - Embedded.com

The authors describe a multithreshold voltage (Multi-Vt) flow that does not require embedded SoC architecture changes and lets a designer decide when to use Low-Vt cells, which have better timing but higher leakage power, and when to use High-Vt cells, which have lower leakage but worse timing.

Minimizing leakage power in systems-on-chip (SoCs) has become a major priority for designers because leakage increases drastically in deep-submicron process technologies and becomes a major proportion of total power usage. Design techniques such as power gating and dynamic voltage and frequency scaling (DVFS) can reduce power, but they require architectural changes that add to chip complexity, which you want to avoid in SoCs. A multiple threshold voltage (Multi-Vt) flow is a technique that doesn't require changes to the SoC architecture; it depends instead on how judiciously the designer uses Low-Vt cells. Low-Vt cells have better timing but higher leakage power; High-Vt cells have lower leakage but worse timing.

To minimize leakage power, Multi-Vt cells are used during the logic synthesis stage of the design (Figure 1 below; RTL refers to register transfer level). Because High-Vt cells are slower, they are used where timing is relaxed, whereas Std-Vt and Low-Vt cells are used on timing-critical paths. The expectation is always to meet timing with optimal area and power. The important point is that priority is still given to timing, because logic synthesis is traditionally done at the worst process, voltage, and temperature (PVT) corner, i.e. WCS-HOT (worst case timing at maximum temperature), where cell delay is assumed to be at its maximum.

Figure 1 Traditional Synthesis Flow

As we move to smaller process nodes, from 90nm to 65nm to 45nm, cell delays have decreased and the chip operating voltage has been reduced to save power. At these lower supply voltages a new effect, temperature inversion, appears: threshold voltage rises as temperature falls, and with little voltage headroom this increase outweighs the improvement in carrier mobility.
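The crossover can be illustrated with a toy alpha-power-law delay model. This is only a sketch under stated assumptions: the nominal threshold voltage, its temperature coefficient, the mobility exponent, the alpha value, and the two supply voltages are hypothetical numbers chosen to show the trend, not values from the libraries discussed in this article.

```python
# Toy alpha-power-law gate delay model illustrating temperature inversion.
# All constants are illustrative assumptions, not values from a real library.

T_REF_K = 300.0        # reference temperature (K)
VT_REF = 0.45          # assumed threshold voltage at T_REF_K (V)
VT_TEMP_COEFF = 1e-3   # assumed Vt decrease per kelvin of heating (V/K)
MOBILITY_EXP = 1.5     # assumed mobility ~ (T / T_REF_K) ** -MOBILITY_EXP
ALPHA = 1.3            # assumed velocity-saturation index

COLD_K = 233.15        # -40 C
HOT_K = 398.15         # 125 C


def threshold_voltage(temp_k):
    """Threshold voltage rises as temperature falls."""
    return VT_REF - VT_TEMP_COEFF * (temp_k - T_REF_K)


def relative_delay(vdd, temp_k):
    """Unitless delay: vdd / (mobility * (vdd - Vt) ** ALPHA)."""
    mobility = (temp_k / T_REF_K) ** -MOBILITY_EXP
    overdrive = vdd - threshold_voltage(temp_k)
    if overdrive <= 0:
        raise ValueError("supply voltage is below the threshold voltage")
    return vdd / (mobility * overdrive ** ALPHA)


def worst_setup_temperature(vdd, temps_k):
    """Return the temperature with the largest modeled delay."""
    return max(temps_k, key=lambda t: relative_delay(vdd, t))


if __name__ == "__main__":
    for vdd in (1.2, 0.65):
        worst = worst_setup_temperature(vdd, [COLD_K, HOT_K])
        label = "cold" if worst == COLD_K else "hot"
        print(f"Vdd={vdd} V: slowest corner in this model is {label}")
```

In this model the hot corner is slowest at the higher supply voltage, where the mobility loss dominates, while the cold corner becomes slowest at the lower supply voltage, where the threshold-voltage increase dominates.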

Thus the cells show higher delays at the lower-temperature corner than at the higher-temperature one. Since the timing corner for setup optimization is the one where cell delay is maximum, the worst corner for setup timing optimization should be WCS-COLD (worst case timing at minimum temperature) instead of WCS-HOT. A design optimized at WCS-HOT would therefore not necessarily be timing clean at WCS-COLD. The PVT conditions for the different corners are listed in Table 1 below.


Table 1 PVT conditions for different corners

We share the results of one block (Cortex A5 Core) in 40nm technology in several case studies. Details of the design are shown in Table 2 below:

Table 2 Cortex A5 Core design

Case 1: The design was synthesized with WCS-HOT libraries (the traditional corner), and the output netlist was timing clean at WCS-HOT. Loading the same netlist with WCS-COLD libraries showed significant timing violations, as shown in Figure 2(a) below.

Figure 2(a) Synthesis done at WCS-HOT corner

Case 2: The design was synthesized with WCS-COLD libraries (because of the temperature inversion effect), and the output netlist was timing clean at WCS-COLD. Loading the same netlist with WCS-HOT libraries showed no timing violations, as shown in Figure 2(b) below.

Figure 2(b) Synthesis done at WCS-COLD corner


Based on the cases discussed above, it can be concluded that WCS-COLD is the worst corner for setup timing optimization. But selecting the WCS-COLD corner for timing optimization imposes a new challenge for leakage power optimization. For leakage power, the worst corner is opposite to the worst corner for setup timing, as shown in Figure 3 below.

Figure 3 PVT conditions for worst timing and power

Since leakage power increases with supply voltage, it is imperative to consider Vmax (maximum voltage) for worst-case power calculation. Leakage power is generally very low at Tcold but very high at Thot. This is further illustrated in Table 3 below, which shows the leakage power characteristics of a buffer as a High-Vt (HVT) cell and as a Std-Vt (SVT) cell.

Table 3 HVT / SVT Cell characteristics

As can be seen from Table 3, the leakage power of the SVT buffer is comparable to that of the HVT buffer in the WCS-COLD condition, whereas in the BCS-HOT condition the SVT cell's leakage power is 5 times that of the HVT cell. During synthesis, if setup timing optimization is done at WCS-COLD, the tool will not be able to differentiate much between SVT and HVT cells when optimizing the design for leakage power. This can result in higher usage of SVT cells because of their better performance (timing) characteristics.
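A small sketch shows why a leakage optimizer that only sees the cold corner has little incentive to pick HVT cells. The leakage values below are hypothetical placeholders that merely mirror the trend described above (comparable when cold, about 5 times apart when hot); they are not the Table 3 data.

```python
# Hypothetical per-cell leakage (nW) for one buffer in two Vt flavours.
# Placeholder numbers only; they are NOT taken from Table 3.
LEAKAGE_NW = {
    "WCS-COLD": {"HVT": 1.0, "SVT": 1.2},
    "BCS-HOT": {"HVT": 40.0, "SVT": 200.0},
}


def leakage_ratio(corner):
    """SVT leakage divided by HVT leakage at the given corner."""
    cells = LEAKAGE_NW[corner]
    return cells["SVT"] / cells["HVT"]


def leakage_saving(corner):
    """Leakage saved (nW) by swapping one SVT cell for its HVT version."""
    cells = LEAKAGE_NW[corner]
    return cells["SVT"] - cells["HVT"]


if __name__ == "__main__":
    for corner in LEAKAGE_NW:
        print(
            f"{corner}: SVT/HVT ratio = {leakage_ratio(corner):.1f}, "
            f"saving per swap = {leakage_saving(corner):.1f} nW"
        )
```

With numbers of this shape, the saving the tool sees per SVT-to-HVT swap at the cold corner is a tiny fraction of the saving that actually matters at the hot corner.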

As the table above shows, we cannot rely on the power reported at the WCS-COLD corner, because it differs hugely from the power at BCS-HOT. A design that looks acceptable at WCS-COLD will show very high leakage power when measured at BCS-HOT conditions. This understanding was validated by the following results (on the Cortex A5 Core) when synthesis was done using WCS-COLD libraries.

Table 4 Results from synthesis with WCS-COLD libraries

As can be seen from Table 4 above, the SVT usage is very high, which impacts the leakage power in BCS-HOT condition.

Solution
To optimize leakage power and timing simultaneously, we use both sets of libraries during synthesis (WCS-COLD for timing optimization and BCS-HOT for leakage power optimization), as shown in Figure 4 below.

Figure 4 New Synthesis optimization flow

To achieve this, we enable the following steps during Synthesis:

  1. Create two library domains.
  2. Attach all the timing (WCS-COLD) libraries to one library domain and all the power (BCS-HOT) libraries to the other.
  3. Set the Timing library domain as the default library domain.
  4. Link the Power library domain as the power library to the default library domain.

The default library domain now has two sets of libraries: it picks the timing library for timing estimation and the power library for power calculation. Since the EDA tool has the correct information for both power and timing, it can meet timing while picking the cells with the least power.
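The idea of giving the optimizer two views of each cell can be sketched in a few lines. This is a deliberately simplified illustration, not the synthesis tool's algorithm: each cell variant carries a delay taken from the timing view (WCS-COLD) and a leakage taken from the power view (BCS-HOT), and a hypothetical `pick_variant` helper chooses the lowest-leakage variant that fits a per-cell delay budget. The class, function, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellVariant:
    vt: str
    delay_ps: float     # from the timing view (WCS-COLD library)
    leakage_nw: float   # from the power view (BCS-HOT library)


# Hypothetical characterization of one buffer in three Vt flavours.
VARIANTS = [
    CellVariant("HVT", delay_ps=60.0, leakage_nw=40.0),
    CellVariant("SVT", delay_ps=45.0, leakage_nw=200.0),
    CellVariant("LVT", delay_ps=35.0, leakage_nw=900.0),
]


def pick_variant(delay_budget_ps, variants=VARIANTS):
    """Return the lowest-leakage variant whose delay fits the budget."""
    feasible = [v for v in variants if v.delay_ps <= delay_budget_ps]
    if not feasible:
        raise ValueError("no variant meets the delay budget")
    return min(feasible, key=lambda v: v.leakage_nw)


if __name__ == "__main__":
    # Relaxed, moderate, and tight delay budgets (ps) for three cells.
    for budget in (70.0, 50.0, 36.0):
        chosen = pick_variant(budget)
        print(f"budget {budget} ps -> {chosen.vt} ({chosen.leakage_nw} nW)")
```

A real synthesis tool optimizes whole timing paths rather than isolated cells, but the selection principle is the same: timing feasibility is judged with cold-corner delays and the leakage cost with hot-corner power.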

Table 5 below compares the synthesis results of traditional (old) approach with the new approach.

Table 5 Comparison of the two approaches

The above results show that even though the area increases slightly (by about 3%), the leakage power decreases by approximately 50%, with a significant reduction in the Std-Vt cell count. Considering the critical importance of power conservation in today's SoCs, this approach can be extremely valuable.

Conclusion
As we move to smaller process nodes, the traditional flows may not remain valid and will need modification because cells behave differently at these nodes. Due to the temperature inversion effect, we have to move to the WCS-COLD corner for setup timing optimization. As the results demonstrate, doing so with the traditional single-corner flow led to very high leakage power. By using BCS-HOT libraries along with WCS-COLD libraries, we get the best result in terms of both power and timing.

Rajiv Mittal is a Staff Design Engineer at Freescale Semiconductor with more than 11 years of experience. He has worked in different domains of ASIC design, mainly physical design, on wireless and automotive architectures in technologies ranging from 130nm to 40nm.

Abhishek Mahajan (b13294@freescale.com) is a Senior Design Engineer at Freescale Semiconductor, Noida, India. He has four years of experience in various domains such as logical and physical synthesis, static timing analysis, place and route, and static low power verification.

Sorabh Sachdeva is a Senior Design Engineer at Freescale Semiconductor, Noida, India. He has 6 years of experience in logical and physical synthesis, static low power verification, and formal verification, and has also worked in the EDA industry.
