The definitive guide to ARM Cortex-M0/M0+: Benchmarking low-power devices -

The definitive guide to ARM Cortex-M0/M0+: Benchmarking low-power devices


Editor's Note: In designing deeply embedded systems, engineers face ongoing tradeoffs between power and performance. The rapid emergence of opportunities for personal electronics, wearables and Internet of Things (IoT) applications only exacerbates challenges in reducing power while enhancing performance. The ARM Cortex-M0 and Cortex-M0+ processors have emerged as a leading solution, providing the core for a broad range of microcontrollers designed to meet tough requirements for low-power, high-performance operation. In The Definitive Guide to ARM® Cortex®-M0 and Cortex-M0+ Processors, 2nd Edition, Joseph Yiu offers a comprehensive view of these processors. As Jack Gannsle wrote, these books will “…give you the insight you need to be productive on real projects.”

CHAPTER 19. Ultralow-Power Designs (Cont.)  

19.6 Benchmarking of Low-Power Devices

19.6.1 Background of ULPBench™

Currently, most microcontroller vendors describe the low-power characteristic of their products by quoting active current and idle current. However, as highlighted in the beginning of Section 19.2, this is no longer enough for designers. As there were no standardized rules of how active current should be measured, some of the quoted active current from microcontroller vendors could be controversial because:

  • The data can be obtained by running “while(1) ”—the instruction could be fetched from a prefetch buffer and therefore no memory access activity in the flash and SRAM.

  • The data can be obtained by running program code from SRAM, with the flash memory turned off.

  • The data can be obtained by running program with wait states for flash memory enabled. This reduces the signal toggling and therefore reduces power.

  • The test could be carried out with a voltage supply that is only suitable for labs environment and is not suitable for real-world applications.

As a result, there is a need to come up a standardize way to demonstrate energy efficiency in low-power microcontroller devices.

Although it is possible to use existing benchmark code like EEMBC® CoreMark® as a reference for measuring power, the data processing complexity of CoreMark is somewhat overkill for a lot of the ULP applications. On the other hand, Dhrystone is too small to illustrate processing requirements and therefore is not suitable either.

There is also the need to demonstrate the sleep mode current. If the program execution is too long, the active power will dominate the test result.

As a result, the EEMBC ULPBench workgroup was formed in 2012. The aim of the
work group is to create benchmark suites that are suitable for measuring energy efficiency of low-power and ULP microcontroller devices, with a consistent and well-defined method.

The ULPBench project is divided into multiple phases. The first phase focuses on the energy efficiency of the processors inside the microcontroller, and is named ULPBench- Core Profile (or ULPBench-CP). Currently, additional profiles are being discussed and investigated in the EEMBC ULPBench workgroup.

19.6.2 Overview of the ULPBench-CP

The score of the ULBench-CP is to measure the energy efficiency of ULP microcontroller devices, including 8-, 16-, and 32-bit devices. Unlike traditional benchmarks, the ULPBench needs a piece of hardware to measure the actual energy consumption by a device. Therefore the ULPBench-CP has defined:

  • A workload (in C language) that can be used on 8-, 16-, and 32-bit architectures,

  • A reference energy measurement hardware, called the EnergyMonitor,

  • A Windows-based GUI to access the measurement hardware and control the test process and to display and compute the results.

In order to reflect the work load pattern of real-world applications, the workload executes a workload once every second and enter sleep mode the rest of the time (Figure 19.5).

Figure 19.5. Processor activities in ULPBench-CP execution.

The measurement process spans 10 occurrences of the processing. In order to ensure the data is accurate, 12 occurrences of the processing are needed and the software controlling the test detects the middle 10 occurrences and uses them for calculation of benchmark result.

The workload contains data processing functions including:

  • Data processing of 8-, 16-, and 32-bit data types,

  • Control functions (7-segment LCD),

  • Sorting,

  • String functions,

  • Task scheduling.

A simple task scheduler is included as part of the workload, but no actual context switching takes place because such operation is not supported by a number of 8-bit microcontrollers targeting ULP applications.

On existing Cortex®-M0, Cortex-M0+, Cortex-M3, and Cortex-M4 processors, the execution time of the workload takes around 10 14 k clock cycles. So if you wish, you can execute the workload with an on-chip 32-KHz crystal provided it has the required accuracy (±50 ppm).

To support the measurement setup, EEMBC provides a reference hardware tool called EnergyMonitor that you can buy from EEMBC Web site, and a software running on a personal computer to collect the data from EnergyMonitor and compute the result. The EnergyMonitor hardware is shown in Figure 19.6.

Figure 19.6. EEMBC Energy Monitor.

The Energy Monitor receives the power from the USB connector, and supplies the power to the DUT (Device Under Test) using jumper connector (Figure 19.7).

Figure 19.7. ULPBench-CP test setup.

Continue to page 2 >>

Some software porting work is required to get the ULPBench-CP working on a microcontroller. ARM® has already contributed a template for the Cortex-M processors, but software developers need to add device-specific low-power feature support code, and might need to port the timer code to use device-specific low-power timer instead of the generic SysTick timer for best results. Also, some I/O control functions are defined in ULPbench-CP to indicate that the system is indeed running ULPBench-CP correctly (signal toggling can be observed with an oscilloscope). These functions also need to be ported.

After the software porting work is done, we can then test the ULPBench-CP with the ULPBench EnergyMonitor software. The measurement process is repeated a number of times before the score were computed. The result can then be optionally uploaded to the EEMBC Web site for display. Figure 19.8 shows the ULPBench-CP test result of a STM32L476, a microcontroller with Cortex-M4 with FPU processor, 1 MB on-chip flash and 12 KB of SRAM, which has an impressive official score of 123.5 ULPMark™-CP. Additional ULPBench-CP scores can be found on the EEMBC Web site.

Figure 19.8. ULPBench Energy Monitor GUI.

Unlike traditional power measurement tools, the EnergyMonitor essentially measures the charging time of a capacitor which supplies current to the device under test. Unlike ADC sampling, this method provides higher accuracy by avoiding any error stemming from current spikes between samples.

In order to make sure the test provides a fair and equitable comparison, the measurement setup has a number of requirements:

  • The supply voltage is 3 V.

  • The wake-up timer must be accurate (within ±50 ppm).

  • The program code must run from the microcontroller’s flash memory (or NVM).

The benchmark result is represented as ULPMark-CP = 1000/(median of 5 times average energy per second for 10 ULPBench cycles). The energy is measure in microjoules.

19.7 Example of Using Low-Power Features on Freescale KL25Z

19.7.1 Objective

The aim of this test example is to generate a 1-Hz- period interrupt to output a message via the UART interface, and have the processor put in low-power mode to reduce the overall current as much as possible.

In this example, we assume that the timing of the wake-up event needs to be very accurate. As a result, we use the external crystal for the clock source during operation.

19.7.2 Test Setup

The test is based on the Freescale Freedom board (FRDM-KL25Z). In this development board, you can do a small modification so that you can measure the electric supply current going into the microcontroller by connecting an ammeter across jumper J4 (Figure 19.9).

  • If you are using REV-D of the FRDM-KL25Z, there is a solder shorter right under jumper J4 that you need to cut out.

  • If you are using REV-E of the FRDM-KL25Z, there are two resistors connected across J4 and both are placed next to J4: a 0 Ω (R73) and a 10 Ω (R81). If you want to measure the current using an ammeter, you should desolder both of them. Alternatively you can remove just the 0-Ω resister and measure the current using a voltmeter.

Figure 19.9. Jumper J4 on the FRDM-KL25z Board.

After doing the modification, you can put the board back into normal operations again by putting a jumper header on jumper J4. In case you want to find out more about the differences between the REV D and REV E of the Freedom board, Erich Styger wrote a very good blog about this which can be found in frdm-kl25z-reve-board-arrived/.

19.7.3 Low-Power Modes on KL25Z

The KL25Z128VL microcontroller device supports a number of power modes, as shown in Figure 19.10.

Figure 19.10. Power modes in KL25Z microcontrollers.

In this example, we use the VLPS (Very Low Power Stop) mode. Alternatively LLS (Low Leakage Stop) could be used but the UART will be stopped during sleep. If the processor entered sleep before the UART transmission completed, the output UART data could be corrupted.

The selection of the operation mode is handled by a unit called System Mode Controller (SMC).

19.7.4 Clocking Arrangement

The clock generation involved several components, as illustrated in Figure 19.11.

Figure 19.11. Clocking diagram from Freescale KL25 Subfamily Reference Manual (KL25P80M48SF0RM, rev3).

This included:

  • System oscillator—This can be configured for high-speed crystal operation or low- power 32 KHz operation. On the Freescale Freedom Board, the system oscillator is connected to an external 8-MHz crystal.

  • Multipurpose Clock Generator (MCG)dThis unit contains the internal RC oscillators (4 MHz and 32 KHz), a Frequency Locked Loop (FLL) and a Phase Locked Loop (PLL). The FLL and PLL can utilize the clock generated from the System Oscillator.

  • System Integration Module (SIM)dThis unit provides various clock multiplexing/ routing/prescaling options, as well as controls the clocks to peripherals.

  • Power Management Controller (PMC)dThis unit contains the internal voltage regu- lator, power on reset (POR), and low-voltage detect system. (Not used in this example.)

  • Real Time Clock (RTC)dGenerate Timer interrupts and a 1-Hz clock. (Not used in this example.)

Instead of using RTC for the 1-Hz interrupt generation, we use the LPTMR (Low-Power Timer) because the external crystal is connected to an 8-MHz crystal. The RTC works best with an external 32-KHz crystal.

To make things slightly more challenging, software developers also need to understand the operation states of the MCG (Figure 19.12).

Figure 19.12. Multipurpose Clock Generator operating states.

In our example, the system starts-up in FEI state, then switches the FBE state, and then switches to BLPE state. The switching of the operation states are done inside the “SystemInit() ” function at start-up.

Stay tuned for the next installment: How to Setup Benchmarking of Low-Power Devices

About the author
Joseph Yiu is a Senior Embedded Technology Specialist at ARM Ltd. in Cambridge, UK. He joined ARM in 2001 and has been involved in a wide range of projects including development of ARM Cortex-M processors and various on-chip system level and debug components. In addition to in-depth knowledge of the processors and microcontroller system design, Yiu also has extensive knowledge in related areas including software development for the ARM Cortex-M microcontrollers, FPGA development and System-on-Chip design technologies. He received his BEng. in Electronics Engineering from City University of Hong Kong and an MSc. In Microelectronics Systems Design from University of Southampton.

Copyright © 2016 Elsevier, Inc. All rights reserved.
Printed with permission from Newnes, a division of Elsevier. Copyright 2016. For more information on this title and other similar books, please visit

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.