High-performance embedded computing - Dynamic voltage and frequency scaling - Embedded.com

Editor's Note: Interest in embedded systems for the Internet of Things often focuses on physical size and power consumption. Yet the need for tiny systems by no means precludes expectations of greater functionality and higher performance. At the same time, developers need to respond to growing interest in more powerful edge systems able to mind large stables of connected systems while running resource-intensive algorithms for sensor fusion, feedback control, and even machine learning. In this environment, and in embedded design in general, it is important for developers to understand the nature of embedded system architectures and the methods for extracting their full performance potential. In their book, Embedded Computing for High Performance, the authors offer a detailed look at the hardware and software used to meet growing performance requirements.

Adapted from Embedded Computing for High Performance, by João Cardoso, José Gabriel Coutinho, and Pedro Diniz.

Dynamic voltage and frequency scaling (DVFS) is a technique that aims to reduce dynamic power consumption by dynamically adjusting the voltage and frequency of a CPU [33]. This technique exploits the fact that CPUs have discrete frequency and voltage settings, as previously described. These frequency/voltage settings depend on the CPU, and it is common to have ten or fewer clock frequencies available as operating points. Changing the CPU to a given frequency-voltage pair (also known as a CPU frequency/voltage state) is accomplished by stepping up or down sequentially through each adjacent pair. Processors do not commonly allow direct transitions between two nonadjacent frequency/voltage pairs.
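The adjacent-step rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the operating-point table is hypothetical (real tables are CPU-specific), and the ordering inside each step (raise voltage before frequency when scaling up, lower frequency before voltage when scaling down) is a common convention on real platforms rather than something this excerpt specifies.

```python
# Hypothetical operating points (frequency in MHz, voltage in V), ordered
# from lowest to highest. Real tables are CPU-specific.
OPP_TABLE = [(600, 0.90), (800, 0.95), (1000, 1.00), (1200, 1.05), (1400, 1.10)]


def transition_path(current: int, target: int) -> list[int]:
    """Indices visited when stepping one adjacent operating point at a time."""
    n = len(OPP_TABLE)
    if not (0 <= current < n and 0 <= target < n):
        raise IndexError("operating-point index out of range")
    step = 1 if target >= current else -1
    # Empty list when current == target: no transition is needed.
    return list(range(current + step, target + step, step))


def transition_actions(current: int, target: int) -> list[str]:
    """Ordered (hypothetical) voltage/frequency writes for the whole transition.

    Assumed convention: when scaling up, raise the voltage first so the new
    frequency is safe; when scaling down, lower the frequency first.
    """
    actions = []
    for nxt in transition_path(current, target):
        freq, volt = OPP_TABLE[nxt]
        if nxt > current:
            actions += [f"set V={volt:.2f}", f"set f={freq}"]
        else:
            actions += [f"set f={freq}", f"set V={volt:.2f}"]
        current = nxt
    return actions


if __name__ == "__main__":
    # Moving from 600 MHz to 1000 MHz passes through 800 MHz.
    print(transition_actions(0, 2))
```

Going from index 0 to index 2 never jumps straight to 1000 MHz; it first settles at 800 MHz, mirroring the sequential stepping described above.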


Power dissipation can be monitored by measuring the current drawn from the power supply by the system or by each device. There are specific boards that provide this kind of measurement, but this scheme requires access to the power rails in order to insert a shunt resistor between the Vcc supply and the device/system under measurement (note that P = Vcc × Icc). Such access is typically difficult to obtain, so this approach is only practical in certain conditions or environments. Another possibility is to use pass-through power meters, such as those available for USB interfaces.
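The shunt-based scheme boils down to Ohm's law followed by P = Vcc × Icc. The Python sketch below assumes the voltage drop across the shunt has already been measured; the shunt value, rail voltage, and drop are hypothetical example numbers.

```python
def shunt_current_a(v_shunt_v: float, r_shunt_ohm: float) -> float:
    """Ohm's law across the shunt resistor: Icc = V_shunt / R_shunt."""
    return v_shunt_v / r_shunt_ohm


def supply_power_w(vcc_v: float, v_shunt_v: float, r_shunt_ohm: float) -> float:
    """P = Vcc x Icc, with Icc derived from the measured shunt voltage drop."""
    return vcc_v * shunt_current_a(v_shunt_v, r_shunt_ohm)


if __name__ == "__main__":
    # Hypothetical reading: 10 mOhm shunt, 5 mV drop across it, 5 V rail.
    print(f"Icc = {shunt_current_a(0.005, 0.010):.2f} A")
    print(f"P   = {supply_power_w(5.0, 0.005, 0.010):.2f} W")
```

Strictly speaking, the device sees Vcc minus the shunt drop; with a small shunt (here 5 mV on a 5 V rail) the difference is negligible, which is why low-value shunt resistors are normally used.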

Some computing systems provide built-in current sensors that allow software to read the power being dissipated. Examples include the ODROID-XU3 [a], which includes four current/voltage sensors to measure, individually, the power dissipation of the ARM big.LITTLE Cortex-A15 and Cortex-A7 cores, the GPU, and the DRAM, and the NVIDIA Management Library (NVML) [b], which can report the current power draw of some NVIDIA GPU cards.

By measuring the average current and knowing the supply voltage, we can derive the average power dissipated and the energy consumed during a specific execution period.
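As a worked example of this derivation, the Python sketch below turns a trace of current samples into average power (P = Vcc × I_avg) and energy (E = P_avg × T). It assumes a constant supply voltage and a fixed sampling period; the trace itself is hypothetical.

```python
def average_power_w(vcc_v: float, current_samples_a: list[float]) -> float:
    """P_avg = Vcc * I_avg, assuming the supply voltage stays constant."""
    if not current_samples_a:
        raise ValueError("need at least one current sample")
    return vcc_v * sum(current_samples_a) / len(current_samples_a)


def energy_j(vcc_v: float, current_samples_a: list[float], sample_period_s: float) -> float:
    """E = P_avg * T, where T = number of samples * sampling period."""
    duration_s = len(current_samples_a) * sample_period_s
    return average_power_w(vcc_v, current_samples_a) * duration_s


if __name__ == "__main__":
    # Hypothetical trace: 5 V rail, current sampled every 100 ms for 0.4 s.
    samples = [0.40, 0.55, 0.60, 0.45]
    print(f"{average_power_w(5.0, samples):.3f} W, {energy_j(5.0, samples, 0.1):.3f} J")
```

This is equivalent to summing Vcc × I × Δt over the samples; its accuracy depends on the sampling rate being high enough to capture the workload's power variations.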

A software power model based on hardware sensing is used in the Running Average Power Limit (RAPL) [c] driver provided for Intel microprocessors since the Sandy Bridge microarchitecture [d].

The measurements are collected via a model-specific register (MSR) of the microprocessor. Recent versions of platform-independent libraries such as the Performance API (PAPI) [e] also support RAPL- and NVML-based power and energy readings, in addition to runtime performance measurements based on the microprocessor's hardware counters. On mobile devices, power can be monitored with dedicated support such as PowerTutor [f] for Android-based platforms. One important aspect of monitoring power dissipation is the power sampling rate (i.e., the maximum rate at which power can be measured), which may be too low in some contexts or systems. Finally, power and energy can also be obtained from power/energy models for a given platform and application, and/or from simulators capable of reporting estimates of the power dissipated.
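On Linux, RAPL energy counters are also exposed through the powercap sysfs interface, so they can be read without touching the MSRs directly. The Python sketch below assumes the `intel_rapl` powercap driver is loaded and that `/sys/class/powercap/intel-rapl:0` is the package-0 domain. On recent kernels, reading `energy_uj` typically requires root. Because the counter is cumulative and wraps around at `max_energy_range_uj`, the sketch takes two readings and handles one wraparound.

```python
import time
from pathlib import Path

# Assumption: Linux powercap interface exposing the package-0 RAPL domain.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before: int, after: int, max_range_uj: int) -> int:
    """Energy consumed between two counter reads, tolerating one wraparound."""
    if after >= before:
        return after - before
    return (max_range_uj - before) + after


def rapl_average_power_w(before: int, after: int, max_range_uj: int, seconds: float) -> float:
    """Average power in watts: microjoules -> joules, divided by elapsed time."""
    return energy_delta_uj(before, after, max_range_uj) / 1e6 / seconds


def read_uj(name: str) -> int:
    return int((RAPL_DIR / name).read_text())


def measure(seconds: float = 1.0) -> float:
    """Sample the package energy counter twice and return average power."""
    max_range = read_uj("max_energy_range_uj")
    e0, t0 = read_uj("energy_uj"), time.monotonic()
    time.sleep(seconds)
    e1, t1 = read_uj("energy_uj"), time.monotonic()
    return rapl_average_power_w(e0, e1, max_range, t1 - t0)


if __name__ == "__main__":
    try:
        print(f"package-0 average power: {measure(1.0):.2f} W")
    except OSError as err:  # missing powercap driver or insufficient permissions
        print(f"RAPL not readable: {err}")
```

The 1 s window is an arbitrary choice; shorter windows give finer-grained readings but run into the sampling-rate limits mentioned above.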

[a] ODROID-XU3. http://www.hardkernel.com/
[b] NVIDIA Management Library (NVML): Reference manual. NVIDIA Corporation, March 2014, TRM-06719-001 vR331. https://developer.nvidia.com/nvidia-management-library-nvml
[c] Intel Corp. Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 3B: System Programming Guide, Part 2. September 2016.
[d] Intel Corp. Intel Xeon processor. http://www.intel.com/xeon, 2012.
[e] Weaver VM, Terpstra D, McCraw H, Johnson M, Kasichayanula K, Ralph J, et al. PAPI 5: measuring power, energy, and the cloud. In: IEEE Int'l Symposium on Performance Analysis of Systems and Software; April 2013.
[f] PowerTutor: A power monitor for Android-based mobile platforms. http://ziyang.eecs.umich.edu/projects/powertutor/

Dynamic frequency scaling (DFS) and dynamic voltage scaling (DVS) are techniques for reducing power dissipation when voltage and frequency ranges are not fully interdependent, i.e., when changes in clock frequency do not imply (up to a certain point) changes in the supply voltage, and vice versa. Decreasing the clock frequency without changing the supply voltage (possibly keeping it at the level needed to operate at the maximum clock frequency) reduces power dissipation but may lead to insignificant changes in energy consumption. Theoretically, we would expect the same energy consumption, since the computation takes proportionally longer to complete. Decreasing the supply voltage without changing the operating frequency reduces both power and energy.
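The difference between the two cases follows from the classic first-order CMOS dynamic power model, P = C_eff × Vdd² × f. The excerpt does not state this model, so treat it as an assumption here; static (leakage) power is ignored. The Python sketch below computes the energy of a fixed amount of work (a fixed cycle count) under each scenario. The effective capacitance, cycle count, and voltage/frequency values are hypothetical.

```python
def dynamic_power_w(c_eff_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order CMOS dynamic power: P = C_eff * Vdd^2 * f (leakage ignored)."""
    return c_eff_f * vdd_v ** 2 * freq_hz


def dynamic_energy_j(c_eff_f: float, vdd_v: float, freq_hz: float, cycles: float) -> float:
    """Energy for a fixed amount of work: E = P * t, with t = cycles / f."""
    return dynamic_power_w(c_eff_f, vdd_v, freq_hz) * (cycles / freq_hz)


# Hypothetical chip and workload parameters.
C_EFF, CYCLES = 1e-9, 2e9
baseline = dynamic_energy_j(C_EFF, 1.2, 2e9, CYCLES)
dfs_only = dynamic_energy_j(C_EFF, 1.2, 1e9, CYCLES)  # half the frequency, same voltage
dvs_only = dynamic_energy_j(C_EFF, 1.0, 2e9, CYCLES)  # lower voltage, same frequency

if __name__ == "__main__":
    print(f"baseline {baseline:.2f} J, DFS {dfs_only:.2f} J, DVS {dvs_only:.2f} J")
```

Halving f halves the power but doubles the runtime, so E = C_eff × Vdd² × cycles is unchanged. Lowering Vdd shrinks the quadratic term and therefore cuts both power and energy. In real chips the longer runtime under DFS also accumulates more leakage energy, which this sketch deliberately leaves out.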

The DVFS technique can be seen as a combination of DFS and DVS in which the interdependence between supply voltage and operating frequency is managed globally. However, in CPUs where the voltage-frequency interdependence exists, the terms DFS, DVS, and DVFS are often used interchangeably to mean the dynamic scaling of the voltage-frequency pair.


The end of Dennard scaling [34], which held that one could keep decreasing transistor feature size and supply voltage while keeping power density constant, has posed a major challenge for IC designs with large transistor counts. At the core of the power-density issue is the fact that, with a growing number of ever smaller transistors, the aggregate leakage current, if unchecked, is large enough to create the threat of thermal runaway. This is particularly serious in many-core devices, where running all cores simultaneously at maximum (or even acceptable) speed is infeasible.
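For reference, the constant-power-density argument can be written out using the classic constant-field scaling rules, with every linear dimension and the supply voltage shrunk by a factor κ > 1. These rules are the standard textbook form of Dennard scaling; the excerpt itself does not list them.

```latex
% Constant-field (Dennard) scaling by a factor \kappa > 1
\begin{aligned}
L,\; W,\; t_{ox},\; V_{dd} &\;\to\; \tfrac{1}{\kappa}\,(\cdot) \\
C \propto \frac{W L}{t_{ox}} &\;\to\; \frac{C}{\kappa} \\
f \propto \frac{1}{\text{gate delay}} &\;\to\; \kappa f \\
P_{\text{tr}} \propto C\, V_{dd}^{2} f &\;\to\; \frac{C}{\kappa}\cdot\frac{V_{dd}^{2}}{\kappa^{2}}\cdot \kappa f = \frac{P_{\text{tr}}}{\kappa^{2}} \\
A_{\text{tr}} \propto W L &\;\to\; \frac{A_{\text{tr}}}{\kappa^{2}} \\
\frac{P_{\text{tr}}}{A_{\text{tr}}} &\;\to\; \frac{P_{\text{tr}}/\kappa^{2}}{A_{\text{tr}}/\kappa^{2}} = \frac{P_{\text{tr}}}{A_{\text{tr}}}
\end{aligned}
```

Roughly speaking, once V_dd (and with it the threshold voltage) can no longer be lowered without a steep rise in leakage current, P_tr stops falling as 1/κ² while A_tr still does. Power density then grows with each generation, which is the breakdown the paragraph above describes.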

To cope with this issue, IC designs may resort to “Dark Silicon” [35] techniques that under-power or under-clock regions of an IC whenever they are not being used. To support these techniques, ICs have to provide low-level mechanisms for monitoring the thermal conditions of specific regions of the IC (e.g., a coprocessor or hardware accelerator). They must also provide an interface through which a runtime environment or a scheduler can lower the associated clock rate, or even temporarily power down that unit, to reduce power dissipation. This substantially affects the ability of compilers to statically schedule the execution of selected computations on such devices. In this context, execution predictability, and hence guarantees on nonfunctional requirements such as latency and throughput, are harder to ensure. Another possibility is to map and schedule the computations at runtime using OS, middleware, or application-level support.
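The runtime side of this mechanism (monitor a region's temperature, under-clock it, or power it down when idle) can be illustrated with a minimal Python sketch of one decision step of a runtime policy. The `Unit` record, the 85 °C and 70 °C thresholds, and the one-level-at-a-time clock adjustment are all hypothetical. Real ICs expose such controls through vendor-specific interfaces.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical on-chip unit (core, coprocessor, or accelerator)."""
    name: str
    temp_c: float      # latest thermal-sensor reading for this region
    busy: bool         # whether work is currently mapped to the unit
    level: int         # current clock level, 0 = slowest
    max_level: int
    powered: bool = True


def throttle(unit: Unit, hot_c: float = 85.0, cool_c: float = 70.0) -> Unit:
    """One decision step of a hypothetical runtime thermal policy."""
    # Idle units are power-gated ("dark"); busy units are powered on.
    unit.powered = unit.busy
    if unit.busy:
        if unit.temp_c >= hot_c:
            # Too hot: under-clock by one level.
            unit.level = max(unit.level - 1, 0)
        elif unit.temp_c <= cool_c:
            # Comfortably cool: regain one level of speed.
            unit.level = min(unit.level + 1, unit.max_level)
        # Between the thresholds the level is left unchanged (hysteresis).
    return unit


if __name__ == "__main__":
    print(throttle(Unit("accel0", temp_c=91.0, busy=True, level=3, max_level=5)))
```

Because the clock level a unit ends up at depends on temperatures observed only at runtime, a compiler cannot know statically how fast that unit will run. This is why latency and throughput guarantees become harder to provide.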

The next installment in this series discusses factors that arise in comparing results of the methods described previously.   

Reprinted with permission from Elsevier/Morgan Kaufmann, Copyright © 2017

João Manuel Paiva Cardoso, Associate Professor, Department of Informatics Engineering (DEI), Faculty of Engineering, University of Porto, Portugal. Previously I was an Assistant Professor in the Department of Computer Science and Engineering, Instituto Superior Técnico (IST), Technical University of Lisbon (UTL), in Lisbon (April 4, 2006 to Sept. 3, 2008), and an Assistant Professor (2001-2006) in the Department of Electronics and Informatics Engineering (DEEI), Faculty of Sciences and Technology, at the University of Algarve, where I was also a Teaching Assistant (1993-2001). I was a senior researcher at INESC-ID (Systems and Computer Engineering Institute) in Lisbon, and a member of INESC-ID from 1994 to 2009.

José Gabriel de Figueiredo Coutinho, Research Associate, Imperial College. He is involved in the EU FP7 HARNESS project, which aims to integrate heterogeneous hardware and network technologies into data centre platforms in order to vastly increase performance, reduce energy consumption, and lower cost profiles for important, high-value cloud applications such as real-time business analytics and the geosciences. His research interests include database functionality on heterogeneous systems, cloud computing resource management, and performance-driven mapping strategies.

Pedro C. Diniz received his M.Sc. in Electrical and Computer Engineering from the Technical University of Lisbon, Portugal, and his Ph.D. in Computer Science from the University of California, Santa Barbara, in 1997. Since 1997 he has been a researcher with the University of Southern California's Information Sciences Institute (USC/ISI) and an Assistant Professor of Computer Science at the University of Southern California in Los Angeles, California. He has led and participated in many research projects funded by the U.S. government and the European Union (EU) and has authored or co-authored many internationally recognized scientific journal papers and over 100 international conference papers. Over the years he has been heavily involved in the scientific community in the areas of high-performance computing and reconfigurable and field-programmable computing.
