A new way to benchmark energy costs of embedded processor performance
The issues associated with energy requirements of devices used in the consumer and industrial markets have come to the forefront of system design. Handheld embedded systems strive to maximize performance and features while simultaneously consuming modest amounts of battery energy. And the problem isn't limited to portable electronics. Designers of high-performance systems must also grapple with the challenges of reducing power to address a different class of issues associated with space constraints, cooling, and the need to meet Energy Star specifications.
Many processor vendors offer their own energy consumption specifications on product data sheets that are difficult to compare with one another. When design engineers attempt to compare processor cores that include system-on-chip implementations, interpreting these values becomes even more difficult. Vendors also use typical power numbers to characterize their processors. But only rarely do they indicate the workload that was applied while making these measurements.
Setting Standards
The Embedded Processor Benchmark Consortium, EEMBC, is a non-profit
organization that has established itself as the recognized source for
standardized embedded processor
benchmarks.
| This article is excerpted from a paper of the same name presented at the Embedded Systems Conference Boston 2006. Used with permission of the Embedded Systems Conference. For more information, please visit www.embedded.com/esc/boston/ |
Traditionally, EEMBC focused on the performance aspect of processor behavior, developing benchmarks that represent the real-world aspects of embedded applications such as automotive, consumer, networking, office automation, and telecommunications. With the increasing importance of power and energy in embedded applications, however, the organization realized the need to establish energy consumption as a parallel metric that would accompany the performance values.
The challenge faced by this standards organization lies in the ability to derive methods that can be generically applied by all users. Furthermore, since it is important for EEMBC to be able to certify and verify the repeatability of all performance and power measurements, the methods used must comply with a common set of criteria. The ultimate goal is to help system designers to make informed tradeoffs between performance and power in portable and space-constrained applications.
The methodology developed by EEMBC to make this possible is EnergyBench, a benchmark software utility that provides practical data on the amount of energy a processor consumes when running a real application workload.
Designers can use EnergyBench in conjunction with EEMBC's performance benchmarks to determine how efficiently various processors use energy while carrying out a series of standardized, application-focused tasks. By using a standard metric for energy consumption that is directly tied to a standard set of performance tests, designers can compare processors and judge how well each fits their needs for a given application and energy budget.
Yet even when EnergyBench is used to look at the power consumption of a single device, it becomes apparent that there is no such thing as "typical power," since significant variations are seen in the average power when running each of the EEMBC benchmarks. EEMBC provides a wide range of performance benchmarks targeting different embedded segments to answer this issue. EnergyBench does not specify typical power, but typical energy consumption for a specific algorithm or application, at a specific performance level.
EEMBC has implemented EnergyBench using the LabVIEW platform and a data acquisition (DAQ) card, both from National Instruments. The DAQ card provides multiple differential measurement channels, which allows energy measurements on multiple power input rails simultaneously (each measurement requires the capture of voltage and current), plus a trigger channel.
EnergyBench uses the DAQ card to sample the voltage levels as well as a trigger channel and write all samples to a file. A flexible trigger mechanism accomplishes the synchronization between the performance benchmark run and the power measurements.
This ensures that the energy measurements are made within the timed portion of the benchmark code, without including energy consumption during the benchmark initialization or record-keeping phases. The EnergyBench sampling module (Figure 1, below) accepts a configuration file that defines the trigger mechanism by specifying voltage levels for trigger detection, as well as voltage levels for the voltage and current channels.
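The trigger-gated measurement described above can be sketched in a few lines of Python. This is an illustrative sketch, not EnergyBench code: it assumes that current is derived from the voltage across a series sense resistor, and that the trigger channel is held above a threshold during the timed portion of the benchmark. The names `Sample`, `energy_in_trigger_window`, `shunt_ohms`, and `trigger_threshold_v` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    v_supply: float   # volts measured on the power rail
    v_shunt: float    # volts measured across the series sense resistor
    v_trigger: float  # volts on the trigger channel


def energy_in_trigger_window(
    samples: Sequence[Sample],
    sample_rate_hz: float,
    shunt_ohms: float,
    trigger_threshold_v: float,
) -> float:
    """Return the energy in joules consumed while the trigger is asserted.

    Assumption: the trigger is active-high for the timed benchmark portion.
    """
    dt = 1.0 / sample_rate_hz  # time represented by each sample
    energy_j = 0.0
    for s in samples:
        if s.v_trigger >= trigger_threshold_v:
            current_a = s.v_shunt / shunt_ohms       # Ohm's law
            energy_j += s.v_supply * current_a * dt  # E = V * I * dt
    return energy_j


# Four samples at 1 kHz; only the middle two fall inside the trigger window.
example = [
    Sample(3.3, 0.1, 0.0),
    Sample(3.3, 0.1, 3.3),
    Sample(3.3, 0.1, 3.3),
    Sample(3.3, 0.1, 0.0),
]
window_energy_j = energy_in_trigger_window(example, 1000.0, 1.0, 1.5)
```

Dividing the result by the number of benchmark iterations executed inside the window would give the energy per iteration.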
The goals of EnergyBench
When running the benchmarks and acquiring energy samples, it's
important to ensure that the results are reliable, repeatable, and
consistent, especially in the context of an industry standard. There
are several methods EnergyBench utilizes to achieve these goals:
1. Reliability: Normally, to achieve statistically accurate results, samples must be taken at or above the Nyquist rate (twice the highest frequency present in the signal), or they can be taken at random points. The EnergyBench sampling module accepts the sampling frequency as an input. The module must then be called several times with different sampling frequencies.
Sampling multiple times during the benchmark run using unaliased frequencies yields sampling points that avoid any resonance with the benchmark execution. In other words, assuming that each benchmark iteration roughly occurs at periodic intervals, using a frequency which is not aliased to the period ensures samples at pseudo-random points in each iteration. This method is simple to implement and guarantees statistically accurate results.
Using this flexible method allows easy detection of a frequency which is aliased to the benchmark period, as that will cause a different result in one of the sampling frequencies. If such a case is detected, a new set of unaliased frequencies is chosen, and the process is repeated until valid results are achieved.
2. Consistency: Since we can repeat the process as many times as we need, and increase the sampling frequency, EnergyBench collects as many samples as needed until the average energy consumption can be determined with statistical accuracy. If the deviation of energy per iteration is too big, the sampling frequency is increased to improve accuracy and reduce the deviation.
3. Repeatability: For certification purposes, the process is repeated multiple times, and the standard deviation of the final result is calculated. Any deviation can easily be detected since each run of each benchmark produces one number for the average energy per iteration of the benchmark.
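The effect of an aliased sampling frequency, and the way it shows up as an outlier among several runs, can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the EnergyBench implementation: the power waveform (1.0 W for the first quarter of each iteration, 0.2 W for the rest, so a true mean of 0.4 W), the 1 kHz iteration rate, the candidate sampling rates, and the 5% outlier rule are all invented for illustration.

```python
from statistics import median
from typing import List, Sequence


def mean_power(sample_rate_hz: float, iteration_rate_hz: float,
               n_samples: int = 20000, phase0: float = 0.1) -> float:
    """Estimate mean power of a periodic workload from evenly spaced samples."""
    total = 0.0
    for n in range(n_samples):
        # Position of sample n within the benchmark iteration, in [0, 1).
        phase = (phase0 + n * iteration_rate_hz / sample_rate_hz) % 1.0
        # Hypothetical waveform: 1.0 W burst, then 0.2 W for the remainder.
        total += 1.0 if phase < 0.25 else 0.2
    return total / n_samples


def find_aliased(rates_hz: Sequence[float], iteration_rate_hz: float,
                 rel_tol: float = 0.05) -> List[float]:
    """Return the sampling rates whose estimate disagrees with the median."""
    means = {fs: mean_power(fs, iteration_rate_hz) for fs in rates_hz}
    mid = median(means.values())
    return [fs for fs, m in means.items() if abs(m - mid) > rel_tol * mid]


# 1000 Hz matches the iteration rate, so every sample hits the same phase.
suspect_rates = find_aliased([1000.0, 1370.0, 1730.0, 2110.0], 1000.0)
```

In this toy setup the 1000 Hz run samples only the high-power burst and reports 1.0 W, while the other rates land near 0.4 W, so the 1000 Hz result is flagged and would be replaced with a different frequency.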
| Figure 1. The EnergyBench sampling module can be configured via a friendly GUI or from a configuration file. All relevant parameters such as voltage levels, resistor values, and sampling frequency can be configured. An optional scope-like graphical display of captured signals shows current, voltage, and trigger channels. |
Of course, the ability to generalize on the basis of any test put to a given device assumes that the target device is representative of a vendor's product yield, and EEMBC has always had strict rules against cherry-picking the devices submitted for certification.
By the same token, process variation is a problem that all semiconductor manufacturers must deal with constantly, and one of the many potential applications for EnergyBench is to help manufacturers understand in more detail the specific components and effects of process variation as they relate to energy consumption.
| Figure 2. Once all the samples have been captured, the analysis module calculates the energy per iteration of the benchmark. All of the parameters are fed in automatically using the EEMBC test harness. |
Using EnergyBench
As shown in Figure 2, above,
after the benchmark finishes running multiple iterations and all
measurement samples have been captured, the analysis module calculates
the average energy that was consumed for every iteration of the
benchmark. The EEMBC Power Analysis Module analyzes the captured
samples, determines the average energy used per iteration of the
benchmark, and looks for the minimum and maximum power samples.
If the variation within a specific sampling frequency is too large, the user can increase the frequency and/or the number of benchmark iterations until there are enough samples, as described above, for the mean value to fall within the specified tolerance at a 95% confidence level.
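One way to express such a stopping rule is to compute a 95% confidence interval for the mean energy per iteration and compare its half-width with a tolerance. The sketch below assumes a normal approximation (z = 1.96) and a tolerance expressed as a fraction of the mean; the article does not state the exact statistic EnergyBench uses, and the function name `mean_within_tolerance` is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_within_tolerance(energies_j: Sequence[float],
                          rel_tolerance: float,
                          z: float = 1.96) -> Tuple[float, float, bool]:
    """Return (mean, CI half-width, ok) for per-iteration energy values.

    z = 1.96 corresponds to a 95% interval under a normal approximation.
    """
    n = len(energies_j)
    if n < 2:
        raise ValueError("need at least two measurements")
    m = mean(energies_j)
    half_width = z * stdev(energies_j) / math.sqrt(n)
    return m, half_width, half_width <= rel_tolerance * abs(m)


# Five hypothetical energy-per-iteration readings, in millijoules.
readings_mj = [1.00, 1.02, 0.98, 1.01, 0.99]
avg, half, ok = mean_within_tolerance(readings_mj, rel_tolerance=0.02)
```

If `ok` is false, the remedy described in the text applies: collect more samples by raising the sampling frequency or running more iterations, then recompute.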
The ultimate result of the EnergyBench test is the average energy consumed for one iteration of the workload represented by the benchmark running on the target device. An EEMBC-certified Energymark score is an optional metric that a device manufacturer may choose to supply in conjunction with certified scores for device performance as a way of indicating a processor's efficient use of energy.
A schematic of this process is shown in Figure 3, below. The results are displayed in the power analysis module in the energy/iteration chart. A display also shows the number of iterations that have been analyzed with respect to energy/iteration (Figure 2 earlier). Users can also use the EEMBC setup to examine minimum and maximum power while the benchmark is running, and the variance of the captured samples.
| Figure 3. The EnergyBench process ties typical energy to a specific benchmark and, more than that, to the specific workload of that benchmark. |
The EnergyBench specification indicates a device warm-up period of at least 30 minutes and an ambient temperature of 70°F +/- 5°F. These baseline conditions are very important to ensure consistent results. Furthermore, it has been demonstrated that the energy consumption can increase dramatically as the device temperature increases.
The DAQ card allows, and the EnergyBench specification requires, all power rails on the processor to be measured. EnergyBench's Test Harness includes executables for simultaneously measuring one, two, or three rails. With processors implemented with more than one power rail (e.g., core power and I/O power), there are two methods for calculating the energy per iteration of the benchmark.
Using the first method, EnergyBench uses the DAQ card to simultaneously measure up to three rails. However, because all channels are sampled at the same rate, the sampling rate of the DAQ card may need to be decreased so that the host machine can keep up with the incoming data. Alternatively, rails may be measured separately, with the sum of the average energies of the individual rails giving the total energy consumption.
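For the second method, the arithmetic is simply a sum of per-rail averages. The following sketch assumes that each rail was measured in its own set of runs under the same workload and conditions; the rail names and energy values are made up for illustration, and `total_energy_per_iteration` is a hypothetical helper.

```python
from statistics import mean
from typing import Mapping, Sequence


def total_energy_per_iteration(
    per_rail_runs_j: Mapping[str, Sequence[float]],
) -> float:
    """Sum the average energy per iteration over separately measured rails."""
    return sum(mean(runs) for runs in per_rail_runs_j.values())


# Hypothetical per-iteration energies (joules) from two runs on each rail.
rails = {
    "core": [2.0e-3, 2.2e-3],
    "io": [0.5e-3, 0.7e-3],
}
total_j = total_energy_per_iteration(rails)
```

Summing averages in this way is only meaningful if the benchmark behaves the same from run to run, which is what the repeatability checks are meant to establish.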
Which method to use?
How does one determine which method to use? First of all, some
processors have more than three power rails. Even if three rails were
being measured simultaneously, this would still require some rails to
be measured separately, or a DAQ card with more input channels.
In addition, the sampling rate should be relative to the processor's operating frequency to allow sufficient sampling during each benchmark iteration. To accommodate a multi-GHz processor, the sampling rate may need to be so high that the host PC can only keep up with one rail at a time.
To provide some insight into the methodology: we considered many alternatives, such as specifying junction temperature for energy measurements, using high-frequency scopes, and requiring a highly controlled environment.
However, since our aim is not to characterize parts but to determine typical energy consumption, we decided to use readily available hardware and to control the ambient temperature rather than the junction or case temperature.
Another issue was the need for a process that scales from 5 MHz microcontrollers to the fastest processors on the market today. Being able to replicate the process at multiple sites, so that results can be independently certified, was also a concern.
Using a programmable DAQ, we can easily specify parameters such as sampling frequency, and yet retain all captured data in permanent form. Figure 4, below, shows a sample of the code that operates behind the scenes to enable the methodology. This code was written in LabVIEW, and continuously writes collected samples to a file until a configurable signal is detected on the trigger channel.
The code can optionally display all captured signals, and in fact is part of the code driving the GUI shown in Figure 1 earlier. All relevant parameters such as voltage levels, resistor values, and sampling frequency can be configured. An optional scope-like graphical display of captured signals shows current, voltage, and trigger channels. In particular, Figure 1 shows the state of the GUI when this loop has detected a trigger signal and is about to quit.
| Figure 4. DAQ code for the sampling loop. |
Conclusion
To summarize, current figures of typical power do not rely on a
standard process or a standard set of workloads. EnergyBench is a
simple and flexible process that achieves the following goals:
1. A standard process for
measuring average energy consumption for a specific workload.
2. A standard set of embedded
workloads to measure typical energy on.
EnergyBench provides several tools that can be used in conjunction with
readily available and affordable hardware to measure typical energy
consumption, using the standard methodology developed by EEMBC.
Markus Levy is president and Shay Gal-On is Director of Software
Engineering at the Embedded Processor Benchmark Consortium (EEMBC).





