High-performance embedded computing - Power and energy consumption - Embedded.com


Editor's Note: Interest in embedded systems for the Internet of Things often focuses on physical size and power consumption. Yet, the need for tiny systems by no means precludes expectations for greater functionality and higher performance. At the same time, developers need to respond to growing interest in more powerful edge systems able to mind large stables of connected systems while running resource-intensive algorithms for sensor fusion, feedback control, and even machine learning. In this environment and embedded design in general, it's important for developers to understand the nature of embedded systems architectures and methods for extracting their full performance potential. In their book, Embedded Computing for High Performance, the authors offer a detailed look at the hardware and software used to meet growing performance requirements.


Adapted from Embedded Computing for High Performance, by João Cardoso, José Gabriel Coutinho, Pedro Diniz.


Because power dissipation and energy consumption are critical concerns in most embedded systems, it is important to know which techniques affect them and, above all, which techniques reduce them. Dynamic voltage and frequency scaling (DVFS) [30], dynamic frequency scaling (DFS), dynamic voltage scaling (DVS), and dynamic power management (DPM) are architecture- and hardware-level techniques for reducing power and energy consumption.

Power consumption is measured in watts (W) and directly affects system heat (temperature) and the possible need for cooling schemes. The total power consumption [Note: Although the term “power dissipation” is more appropriate, the term “power consumption” is widely used.] of a CMOS integrated circuit (IC) is the sum of the static power and the dynamic power, as represented by Eq. (2.6).

$$P_{total} = P_{static} + P_{dynamic} \qquad (2.6)$$

Short-circuit and leakage currents are responsible for power consumption even when transistors are not switching. The static power consumption ($P_{static}$) can be calculated by Eq. (2.7), where $V_{cc}$ represents the supply voltage (sometimes written $V_{dd}$) and $I_{cc}$ (sometimes written $I_{sc}$) represents the overall current flowing through the device, given by the sum of the leakage currents. The static power depends mainly on the area of the IC and can be decreased by disconnecting some parts of the circuit from the supply voltage and/or by reducing the supply voltage.

$$P_{static} = V_{cc} \times I_{cc} \qquad (2.7)$$

The dynamic power consumption ($P_{dynamic}$) can be calculated by Eq. (2.8), where $V_{cc}$ represents the supply voltage, $\beta$ the activity factor, $C_L$ the load capacitance, and $f$ the clock frequency at which the device is operating. $P_{dynamic}$ is proportional to the switching activity of the transistors in the IC. Thus, the dynamic power can be reduced by making regions of the IC inactive and/or by reducing $V_{cc}$ and/or $f$.

$$P_{dynamic} = \beta \times C_L \times V_{cc}^2 \times f \qquad (2.8)$$
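As a quick numerical check of the static, dynamic, and total power relations described above, the following Python sketch evaluates them for a single operating condition. The function names and all numeric values (leakage current, activity factor, load capacitance) are hypothetical and chosen only for illustration; they do not describe any real device.

```python
def static_power(v_cc: float, i_cc: float) -> float:
    """Static power in watts: supply voltage (V) times total leakage current (A)."""
    return v_cc * i_cc


def dynamic_power(beta: float, c_load: float, v_cc: float, f: float) -> float:
    """Dynamic power in watts: activity factor * load capacitance (F) * Vcc^2 * f (Hz)."""
    return beta * c_load * v_cc ** 2 * f


def total_power(v_cc: float, i_cc: float, beta: float, c_load: float, f: float) -> float:
    """Total power in watts: sum of the static and dynamic contributions."""
    return static_power(v_cc, i_cc) + dynamic_power(beta, c_load, v_cc, f)


if __name__ == "__main__":
    # Hypothetical operating condition (illustrative values only).
    p = total_power(v_cc=1.2, i_cc=0.010, beta=0.2, c_load=1e-9, f=600e6)
    print(f"total power: {p:.4f} W")
```

With these assumed values the dynamic term dominates, which is why the techniques discussed next concentrate on the supply voltage and the clock frequency.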

When reducing the dynamic power by lowering the frequency and/or the supply voltage, it is common to try to lower the supply voltage first, because it affects the dynamic power quadratically ($P_{dynamic} \propto V_{cc}^2$). Reducing the clock frequency clearly has a negative impact on execution time: components operating at a lower clock frequency take longer to execute, with increased latencies or lower throughput. Moreover, reducing the supply voltage may also require reducing the clock frequency. Typically, the system provides a table of discrete supply voltages at which it can operate, along with the corresponding maximum clock frequencies (known as a frequency-voltage table). Thus, the supply voltage ($V_{cc}$) can be seen as a linear function of the clock frequency, as depicted in Eq. (2.9), and the dynamic power consumption is directly proportional to the cube of the clock frequency, as shown in Eq. (2.10).

$$V_{cc} \propto f \qquad (2.9)$$

$$P_{dynamic} \propto f^3 \qquad (2.10)$$
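The effect of the quadratic voltage term can be illustrated with a small sketch that compares frequency-only scaling with combined voltage and frequency scaling. It assumes the simplified model described above, in which the supply voltage scales in proportion to the clock frequency and the activity factor and load capacitance stay unchanged; the helper name `relative_dynamic_power` is hypothetical.

```python
def relative_dynamic_power(v_ratio: float, f_ratio: float) -> float:
    """Dynamic power relative to a baseline.

    v_ratio and f_ratio are the new supply voltage and clock frequency
    divided by their baseline values. Follows P_dynamic ~ Vcc^2 * f.
    """
    return v_ratio ** 2 * f_ratio


if __name__ == "__main__":
    # Halving only the clock frequency halves the dynamic power.
    print(relative_dynamic_power(v_ratio=1.0, f_ratio=0.5))  # 0.5
    # Halving voltage and frequency together gives the cubic reduction.
    print(relative_dynamic_power(v_ratio=0.5, f_ratio=0.5))  # 0.125
```

Under this model, for a task with a fixed number of clock cycles, halving only the frequency halves the dynamic power but doubles the execution time, so the dynamic energy is roughly unchanged; it is the accompanying voltage reduction that lowers energy.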

The tuple consisting of the voltage and the corresponding maximum clock frequency is known as the operating performance point (OPP), OPP = (f, V).


The operating performance points (OPPs) depend on the components (e.g., CPUs) and on the support included in the system implemented in the IC.

For example, the Texas Instruments OMAP-L138 [a] IC includes an ARM9 RISC processor and a VLIW DSP. The following table presents the OPPs for three subsystems of the IC: the ARM processor, the DSP, and the RAM.

Intel also provides control over the processor’s OPPs through the Enhanced Intel SpeedStep Technology [b].

[a] Texas Instruments Inc., OMAP-L138 C6000 DSP+ARM Processor, SPRS586I, June 2009, Revised September 2014.

[b] Intel Corp., Enhanced Intel SpeedStep Technology for the Intel Pentium M Processor, White Paper, March 2004.

Table 2.1 presents an example of four OPPs of a hypothetical processor. In this example, the processor can operate with a supply voltage of 1 V at a maximum clock frequency of 300 MHz, 1.2 V at 600 MHz, 1.3 V at 800 MHz, or 1.4 V at 1 GHz. These OPPs can be managed by the operating system (OS) and/or by the application via system library functions.

To reduce dynamic power and static power consumption, two main mechanisms can be used, namely, dynamic voltage and frequency scaling (DVFS) and dynamic power management (DPM), respectively. These two techniques are described in the following subsections.

Table 2.1 Example of frequency-voltage table (representing OPPs)

Maximum clock frequency    Supply voltage
300 MHz                    1 V
600 MHz                    1.2 V
800 MHz                    1.3 V
1 GHz                      1.4 V
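To make the use of a frequency-voltage table concrete, the sketch below encodes the four OPPs of the hypothetical processor and picks the lowest one that still meets a required clock frequency. The names `OPP`, `OPP_TABLE`, `select_opp`, and `relative_dynamic_power` are illustrative assumptions and do not correspond to any particular OS or library interface; the power comparison assumes the activity factor and load capacitance are the same at every OPP.

```python
from typing import NamedTuple, Sequence


class OPP(NamedTuple):
    freq_mhz: float   # maximum clock frequency at this operating point
    voltage: float    # supply voltage in volts


# The four OPPs of the hypothetical processor described in the text.
OPP_TABLE = [OPP(300, 1.0), OPP(600, 1.2), OPP(800, 1.3), OPP(1000, 1.4)]


def select_opp(required_mhz: float, table: Sequence[OPP] = OPP_TABLE) -> OPP:
    """Return the slowest OPP whose frequency still meets the requirement."""
    candidates = [opp for opp in table if opp.freq_mhz >= required_mhz]
    if not candidates:
        raise ValueError(f"no OPP supports {required_mhz} MHz")
    return min(candidates, key=lambda opp: opp.freq_mhz)


def relative_dynamic_power(opp: OPP, reference: OPP) -> float:
    """Dynamic power of `opp` relative to `reference`, using P ~ Vcc^2 * f."""
    return (opp.voltage / reference.voltage) ** 2 * (opp.freq_mhz / reference.freq_mhz)


if __name__ == "__main__":
    chosen = select_opp(500)        # a workload that needs at least 500 MHz
    fastest = OPP_TABLE[-1]
    ratio = relative_dynamic_power(chosen, fastest)
    print(chosen, f"-> {ratio:.2f} of the dynamic power at the fastest OPP")
```

In this example a 500 MHz requirement selects the 600 MHz, 1.2 V point, which under the stated assumptions draws roughly 44% of the dynamic power of the 1 GHz, 1.4 V point.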

The energy consumed during a period of activity, measured in joules (J), is the integral of the total power dissipated (Eq. 2.6 for CMOS ICs) over that period. Eq. (2.11) expresses, in simplified form, the total energy consumed during a period $T$ (in seconds) as the product of the average power dissipated over that period ($P_{avg}$) and $T$. For a given application, one can save energy if the power consumption is reduced and the execution time increases by a smaller factor, or conversely if the execution time is reduced without a comparably large increase in power dissipation.

$$E = P_{avg} \times T \qquad (2.11)$$
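The relation between the integral of power and the average-power form can be checked numerically. The following sketch integrates a sampled power trace with the trapezoidal rule and then derives the average power; the function names and the sample trace are hypothetical, and a real measurement setup would supply its own timestamps and power readings.

```python
from typing import Sequence


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Approximate the integral of power over time with the trapezoidal rule."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need two or more matching time/power samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def average_power(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Average power over the trace, so that energy = average power * duration."""
    duration = times_s[-1] - times_s[0]
    return energy_joules(times_s, power_w) / duration


if __name__ == "__main__":
    # Hypothetical trace: 2 W for about a second, then dropping to 1 W.
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    p = [2.0, 2.0, 1.0, 1.0, 1.0]
    e = energy_joules(t, p)
    print(f"energy = {e} J, average power = {average_power(t, p)} W")
```

Multiplying the reported average power by the 4 s duration gives back the integrated energy, which is the simplification the average-power form relies on.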


The Advanced Configuration and Power Interface (ACPI) [a] is a standard interface specification supported by hardware and software drivers. Usually the operating system uses this API to manage power consumption, but the API can also be used by applications.

ACPI defines four global states (G0 to G3, with a subdivision into six sleep states, S0 to S5); performance states, known as P-states, which are based on the possible OPPs (from P0 up to at most P15, where P0 is the state operating at the maximum clock frequency and resulting in the highest power dissipation); and processor states, known as C-states (usually from C0 to C3, depending on the processor). The C-states represent the operating (C0), Halt (C1), Stop-Clock (C2), and Sleep (C3) states.

[a] Unified EFI Inc. Advanced configuration and power interface specification, ACPI overview. Revision 5.1 [July, 2014] http://www.uefi.org/sites/default/files/resources/ACPI_5_1release.pdf.

Energy consumption is not directly associated with heat, but it affects battery usage. Thus, by saving energy one also extends the battery life or the interval between battery recharges. Energy efficiency, on the other hand, refers to performing the required work with as little energy as possible, and is sometimes quantified by the work done per joule (e.g., samples/J, Gbit/J, and frames/J). Two common metrics used to represent the trade-off between energy consumption and performance are the energy delay product (EDP) and the energy delay squared product (ED²P). EDP is obtained by multiplying the average energy consumed by the computation time required [31]. Both EDP and ED²P give more importance to the execution time than to the energy consumed. Compared to EDP, the ED²P metric squares the computation time and therefore gives even more importance to execution time than to energy consumption. One can give even more importance to performance by raising the computation time to a power greater than 2.
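A short sketch can show how the two metrics may rank the same pair of design points differently. The function names and the two configurations below are hypothetical and serve only to illustrate the weighting of execution time.

```python
def edp(energy_j: float, time_s: float) -> float:
    """Energy delay product: energy multiplied by execution time."""
    return energy_j * time_s


def ed2p(energy_j: float, time_s: float) -> float:
    """Energy delay squared product: energy multiplied by execution time squared."""
    return energy_j * time_s ** 2


if __name__ == "__main__":
    # Two hypothetical configurations of the same application (lower is better).
    fast = {"energy_j": 3.0, "time_s": 1.0}    # quicker but uses more energy
    frugal = {"energy_j": 1.8, "time_s": 1.5}  # slower but uses less energy

    for name, cfg in (("fast", fast), ("frugal", frugal)):
        print(name, "EDP =", edp(**cfg), "ED2P =", ed2p(**cfg))
```

With these assumed numbers, EDP favors the slower, lower-energy configuration, while ED²P favors the faster one, which reflects the heavier weight that squaring places on execution time.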

The next installment in this series discusses dynamic voltage and frequency scaling (DVFS) and dynamic power management (DPM).   

Reprinted with permission from Elsevier/Morgan Kaufmann, Copyright © 2017

João Manuel Paiva Cardoso, Associate Professor, Department of Informatics Engineering (DEI), Faculty of Engineering, University of Porto, Portugal. Previously, he was Assistant Professor in the Department of Computer Science and Engineering, Instituto Superior Técnico (IST), Technical University of Lisbon (UTL), in Lisbon (April 4, 2006 to Sept. 3, 2008), Assistant Professor (2001-2006) in the Department of Electronics and Informatics Engineering (DEEI), Faculty of Sciences and Technology, at the University of Algarve, and Teaching Assistant at the same university (1993-2001). He has been a senior researcher at INESC-ID (Systems and Computer Engineering Institute) in Lisbon, of which he was a member from 1994 to 2009.

José Gabriel de Figueiredo Coutinho, Research Associate, Imperial College. He is involved in the EU FP7 HARNESS project to integrate heterogeneous hardware and network technologies into data centre platforms, to vastly increase performance, reduce energy consumption, and lower cost profiles for important and high-value cloud applications such as real-time business analytics and the geosciences. His research interests include database functionality on heterogeneous systems, cloud computing resource management, and performance-driven mapping strategies.

Pedro C. Diniz received his M.Sc. in Electrical and Computer Engineering from the Technical University in Lisbon, Portugal, and his Ph.D. in Computer Science from the University of California, Santa Barbara, in 1997. Since 1997 he has been a researcher with the University of Southern California’s Information Sciences Institute (USC/ISI) and an Assistant Professor of Computer Science at the University of Southern California in Los Angeles, California. He has led and participated in many research projects funded by the U.S. government and the European Union (EU) and has authored or co-authored many internationally recognized scientific journal papers and over 100 international conference papers. Over the years he has been heavily involved in the scientific community in the areas of high-performance computing and reconfigurable and field-programmable computing.
