A new way to benchmark energy costs of embedded processor performance - Embedded.com

A new way to benchmark energy costs of embedded processor performance

The issues associated with energy requirements of devices used in the consumer and industrial markets have come to the forefront of system design. Handheld embedded systems strive to maximize performance and features while simultaneously consuming modest amounts of battery energy. And the problem isn't limited to portable electronics.

Designers of high-performance systems must also grapple with the challenges of reducing power to address a different class of issues associated with space constraints, cooling, and the need to meet Energy Star specifications.

Many processor vendors offer their own energy consumption specifications on product data sheets that are difficult to compare with one another. When design engineers attempt to compare processor cores that include system-on-chip implementations, interpreting these values becomes even more difficult. Vendors also use typical power numbers to characterize their processors. But only rarely do they indicate the workload that was applied while making these measurements.

Setting Standards
The Embedded Processor Benchmark Consortium, EEMBC, is a non-profit organization that has established itself as the recognized source for standardized embedded processor benchmarks.

This article is excerpted from a paper of the same name presented at the Embedded Systems Conference Boston 2006. Used with permission of the Embedded Systems Conference. For more information, please visit www.embedded.com/esc/boston/

Traditionally, EEMBC focused on the performance aspect of processor behavior, developing benchmarks that represent the real-world aspects of embedded applications such as automotive, consumer, networking, office automation, and telecommunications. With the increasing importance of power and energy in embedded applications, however, the organization realized the need to establish energy consumption as a parallel metric that would accompany the performance values.

The challenge faced by this standards organization lies in the ability to derive methods that can be generically applied by all users. Furthermore, since it is important for EEMBC to be able to certify and verify the repeatability of all performance and power measurements, the methods used must comply with a common set of criteria. The ultimate goal is to help system designers make informed tradeoffs between performance and power in portable and space-constrained applications.

The methodology developed by EEMBC to make this possible is EnergyBench, a benchmark software utility that provides practical data on the amount of energy a processor consumes when running a real application workload.

Designers can use EnergyBench in conjunction with EEMBC's performance benchmarks to determine how efficiently various processors use energy while carrying out a series of standardized, application-focused tasks. By using a standard metric for energy consumption that is directly tied to a standard set of performance tests, designers can compare processors and judge how well each fits their needs for a given application and energy budget.

Yet even when EnergyBench is used to look at the power consumption of a single device, it becomes apparent that there is no such thing as "typical power," since significant variations are seen in the average power when running each of the EEMBC benchmarks. EEMBC provides a wide range of performance benchmarks targeting different embedded segments to answer this issue. EnergyBench does not specify typical power, but typical energy consumption for a specific algorithm or application, at a specific performance level.

EEMBC has implemented EnergyBench using the LabVIEW platform and a data acquisition (DAQ) card, both from National Instruments. The DAQ card accommodates multiple differential measurement channels, allowing energy measurements on multiple power input rails simultaneously (each measurement requires the capture of voltage and current), plus a trigger channel.

EnergyBench uses the DAQ card to sample the voltage levels as well as a trigger channel and write all samples to a file. A flexible trigger mechanism accomplishes the synchronization between the performance benchmark run and the power measurements.

This ensures that the energy measurements are made within the timed portion of the benchmark code, without including energy consumption during the benchmark initialization or record keeping phases. The EnergyBench sampling module (Figure 1, below) accepts a configuration file that defines the trigger mechanism by specifying voltage levels for trigger detection, as well as voltage levels for the voltage and current channels.
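To make the idea of trigger-based synchronization concrete, here is a minimal Python sketch of how a trigger channel could be used to isolate the timed portion of a capture. It is not EnergyBench code. The function name, the threshold value, and the convention that the trigger line sits above the threshold during the timed portion are all assumptions made for illustration.

```python
def timed_window(trigger, threshold):
    """Return (start, stop) sample indices of the first span where the
    trigger channel is at or above `threshold`, or None if it never is.

    Assumed convention (hypothetical): the benchmark drives the trigger
    line high for the timed portion and low otherwise.
    """
    start = None
    for i, level in enumerate(trigger):
        if level >= threshold and start is None:
            start = i                      # rising edge: timed portion begins
        elif level < threshold and start is not None:
            return start, i                # falling edge: timed portion ends
    return (start, len(trigger)) if start is not None else None


# Hypothetical capture: trigger voltages for seven consecutive samples.
trigger_samples = [0.0, 0.1, 3.3, 3.2, 3.3, 0.2, 0.0]
window = timed_window(trigger_samples, threshold=1.65)
```

Only the voltage and current samples whose indices fall inside the returned window would then be passed on to the energy calculation, so that initialization and record keeping are excluded.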

The goals of EnergyBench
When running the benchmarks and acquiring energy samples, it's important to ensure that the results are reliable, repeatable, and consistent, especially in the context of an industry standard. There are several methods EnergyBench utilizes to achieve these goals:

1. Reliability: Normally, to achieve statistically accurate results, samples must be taken at the Nyquist rate (twice the highest frequency present in the signal) or higher, or they can be taken at random points. The EnergyBench sampling module accepts the sampling frequency as an input. The module must then be called several times with different sampling frequencies.

Sampling multiple times during the benchmark run using unaliased frequencies yields sampling points that avoid any resonance with the benchmark execution. In other words, assuming that each benchmark iteration roughly occurs at periodic intervals, using a frequency which is not aliased to the period ensures samples at pseudo-random points in each iteration. This method is simple to implement and guarantees statistically accurate results.

Using this flexible method allows easy detection of a frequency which is aliased to the benchmark period, as that will cause a different result in one of the sampling frequencies. If such a case is detected, a new set of unaliased frequencies is chosen, and the process is repeated until valid results are achieved.

2. Consistency: Since we can repeat the process as many times as we need, and increase the sampling frequency, EnergyBench collects as many samples as needed until the average energy consumption can be determined with statistical accuracy. If the deviation of energy per iteration is too large, the sampling frequency is increased to improve accuracy and reduce the deviation.

3. Repeatability: For certification purposes, the process is repeated multiple times, and the standard deviation of the final result is calculated. Any deviation can easily be detected since each run of each benchmark produces one number for the average energy per iteration of the benchmark.
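The aliasing check described under Reliability can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the EnergyBench implementation: the load waveform, the 1000 µs iteration period, the candidate sampling intervals, and the 5% outlier rule are all hypothetical values chosen so the effect is easy to see.

```python
def power_at(t_us, period_us=1000):
    """Hypothetical periodic load: 2.0 W for the first quarter of each
    benchmark iteration, 1.0 W for the rest (true mean: 1.25 W)."""
    phase = t_us % period_us
    return 2.0 if phase < period_us // 4 else 1.0


def mean_power(sample_interval_us, n_samples=10000):
    """Average power seen when sampling every `sample_interval_us` microseconds."""
    total = sum(power_at(i * sample_interval_us) for i in range(n_samples))
    return total / n_samples


def find_outliers(means, rel_tol=0.05):
    """Flag sampling intervals whose mean differs from the median of all
    means by more than `rel_tol` (an assumed, illustrative criterion)."""
    ordered = sorted(means.values())
    median = ordered[len(ordered) // 2]
    return [k for k, m in means.items() if abs(m - median) > rel_tol * median]


# An interval of 1000 us matches the iteration period exactly, so every
# sample lands on the same phase; the other intervals drift through it.
means = {interval: mean_power(interval) for interval in (1000, 730, 370, 130)}
aliased = find_outliers(means)
```

In this toy setup the interval equal to the iteration period reports a mean well above the others, which is the kind of disagreement between sampling frequencies that signals an aliased choice and prompts a new set of frequencies.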

Figure 1. The EnergyBench sampling module can be configured via a friendly GUI or from a configuration file. All relevant parameters such as voltage levels, resistor values and sampling frequency can be configured. An optional scope-like graphical display of captured signals shows current, voltage, and trigger channels.

Of course, the ability to generalize on the basis of any test put to a given device assumes that the target device is representative of a vendor's product yield, and EEMBC has always had strict rules against cherry-picking the devices submitted for certification.

By the same token, process variation is a problem that all semiconductor manufacturers must deal with constantly, and one of the many potential applications for EnergyBench is to help manufacturers understand in more detail the specific components and effects of process variation as they relate to energy consumption.

Figure 2. Once all the samples have been captured, the analysis module calculates the energy per iteration of the benchmark. All of the parameters are fed in automatically using the EEMBC test harness.

Using EnergyBench
As shown in Figure 2, above, after the benchmark finishes running multiple iterations and all measurement samples have been captured, the analysis module calculates the average energy that was consumed for every iteration of the benchmark. The EEMBC Power Analysis Module analyzes the captured samples, determines the average energy used per iteration of the benchmark, and looks for the minimum and maximum power samples.
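The arithmetic behind an energy-per-iteration figure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis module itself: it assumes current is derived from the voltage drop across a known sense resistor, that samples are evenly spaced, and that energy is approximated as the sum of instantaneous power times the sample interval. All names are hypothetical.

```python
def energy_per_iteration(v_rail, v_shunt, r_shunt_ohms, sample_rate_hz, iterations):
    """Return (joules per iteration, min power, max power) for one rail.

    v_rail:  rail voltage samples in volts
    v_shunt: voltage drop across the sense resistor, in volts (assumed
             measurement method; current = v_shunt / r_shunt_ohms)
    """
    dt = 1.0 / sample_rate_hz
    # Instantaneous power for each sample: P = V * I.
    powers = [v * (vs / r_shunt_ohms) for v, vs in zip(v_rail, v_shunt)]
    # Rectangle-rule approximation of the energy over the timed window.
    energy_j = sum(powers) * dt
    return energy_j / iterations, min(powers), max(powers)


# Tiny made-up capture: four samples at 2 Hz covering two iterations.
result = energy_per_iteration(
    v_rail=[1.0, 1.0, 1.0, 1.0],
    v_shunt=[0.5, 0.5, 1.0, 1.0],
    r_shunt_ohms=0.5,
    sample_rate_hz=2.0,
    iterations=2,
)
```

With these made-up inputs the powers are 1, 1, 2 and 2 W, the total energy is 3 J, and the energy per iteration is 1.5 J.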

If the variation within a specific sampling frequency is too large, the user can increase the frequency and/or the number of benchmark iterations until there are enough samples as described above so that the confidence interval of the mean value is within the specified tolerance of 95%.
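One way to decide whether more samples are needed is to compute a confidence interval for the mean and compare its half-width with a tolerance. The sketch below assumes that the 95% figure refers to the confidence level, uses a normal approximation (z = 1.96), and introduces a hypothetical relative tolerance of 1% of the mean. None of these choices is taken from the EnergyBench specification.

```python
import math
import statistics


def ci_half_width(samples, z=1.96):
    """Half-width of an approximate 95% confidence interval for the mean
    (normal approximation; z = 1.96 is an assumption of this sketch)."""
    return z * statistics.stdev(samples) / math.sqrt(len(samples))


def needs_more_samples(samples, rel_tolerance=0.01):
    """True if the interval is wider than `rel_tolerance` times the mean.
    The 1% default is a hypothetical value chosen for illustration."""
    mean = statistics.mean(samples)
    return ci_half_width(samples) > rel_tolerance * mean


# Made-up energy-per-iteration readings, in arbitrary units.
steady = [10.0, 10.1, 9.9, 10.0]
noisy = [8.0, 12.0, 9.0, 11.0]
```

Here `needs_more_samples(steady)` is false and `needs_more_samples(noisy)` is true, so only the noisy run would call for a higher sampling frequency or more iterations.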

The ultimate result of the EnergyBench test is the average energy consumed for one iteration of the workload represented by the benchmark running on the target device. An EEMBC-certified Energymark score is an optional metric that a device manufacturer may choose to supply in conjunction with certified scores for device performance as a way of indicating a processor's efficient use of energy.

A schematic of this process is shown in Figure 3, below. The results are displayed in the power analysis module in the energy/iteration chart. A display also shows the number of iterations that have been analyzed with respect to energy/iteration (Figure 2, earlier). Users can also use the EEMBC setup to examine minimum and maximum power while the benchmark is running, and the variance of the captured samples.

Figure 3. The EnergyBench process ties typical energy to a specific benchmark and, more than that, to a specific workload of that benchmark.

The EnergyBench specification indicates a device warm-up period of at least 30 minutes and an ambient temperature of 70°F +/- 5°F. These baseline conditions are very important to ensure consistent results. Furthermore, it has been demonstrated that the energy consumption can increase dramatically as the device temperature increases.

The DAQ card allows, and the EnergyBench specification requires, all power rails on the processor to be measured. EnergyBench's Test Harness includes executables for simultaneously measuring one, two, or three rails. With processors implemented with more than one power rail (e.g., core power and I/O power), there are two methods for calculating the energy per iteration of the benchmark.

Using the first method, EnergyBench uses the DAQ card to simultaneously measure up to three rails. However, because all channels are sampled at the same rate with this method, the sampling rate of the DAQ card may need to be decreased to match the host machine's ability to keep up with the sampling (too much data coming in at once). Alternatively, rails may be measured separately, with the sum of the average energies of each individual rail equaling the total cumulative energy consumption.
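The tradeoff between the two methods can be put into rough numbers. The sketch below assumes a hypothetical aggregate sample budget that is shared evenly across channels, with two channels per rail (voltage and current) plus one trigger channel. The budget figure and the even split are illustrative assumptions, not properties of any particular DAQ card. The second function simply adds per-rail results, as in the separate-measurement method.

```python
def max_per_channel_rate(aggregate_rate_hz, rails):
    """Per-channel sampling rate if an aggregate budget is shared evenly
    across 2 channels per rail plus 1 trigger channel (assumed model)."""
    channels = 2 * rails + 1
    return aggregate_rate_hz / channels


def total_energy(per_rail_energy_j):
    """Total energy per iteration as the sum of per-rail average energies."""
    return sum(per_rail_energy_j.values())


# Hypothetical 1 MS/s aggregate budget.
rate_three_rails = max_per_channel_rate(1_000_000, rails=3)   # 7 channels
rate_one_rail = max_per_channel_rate(1_000_000, rails=1)      # 3 channels

# Made-up per-rail results from separate runs, in joules per iteration.
combined = total_energy({"core": 1.5e-3, "io": 0.4e-3})
```

Under this model, measuring one rail at a time leaves more than twice the per-channel rate available compared with measuring three rails at once, at the cost of additional benchmark runs.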

Which method to use?
How does one determine which method to use? First of all, some processors have more than three power rails. Even if three rails were being measured simultaneously, this would still require some rails to be measured separately, or the use of a DAQ card with more input channels.

In addition, the sampling rate should be relative to the processor's operating frequency to allow sufficient sampling during each benchmark iteration. To accommodate a multi-GHz processor, the sampling rate may need to be so high that the host PC can only keep up with one rail at a time.

To provide some insight into the methodology: we considered many alternatives, such as specifying junction temperature for energy measurements, using high-frequency scopes, and requiring a highly controlled environment.

However, since we are not trying to characterize parts but truly to find out typical energy consumption, we have decided on readily available hardware and controlling the ambient temperature, rather than junction or case temperature.

Another issue was the need for a process that scales from 5 MHz microcontrollers to the fastest processors on the market today. Being able to replicate the process at multiple sites, so that results can be independently certified, was also a concern.

Using a programmable DAQ, we can easily specify parameters such as sampling frequency, and yet retain all captured data in permanent form. In Figure 4, below, you can find a sample of the code that operates behind the scenes to enable the methodology. This code was written in LabVIEW, and continuously writes collected samples to a file until a configurable signal is detected on the trigger channel.

The code can optionally display all captured signals, and in fact is part of the code driving the GUI in Figure 1, earlier. All relevant parameters such as voltage levels, resistor values and sampling frequency can be configured. An optional scope-like graphical display of captured signals shows current, voltage, and trigger channels. In particular, Figure 1 shows the state of the GUI when this loop has detected a trigger signal and is about to quit.

Figure 4. DAQ code for the sampling loop.
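For readers without access to the LabVIEW diagram, the general shape of such a sampling loop can be approximated in Python. The following is a loose sketch and not a translation of the EEMBC code: the sample source is a plain iterable standing in for the DAQ driver, the CSV output format is an arbitrary choice, and the rule that the loop stops when the trigger reading reaches a configured level is an assumption.

```python
import csv
import io


def capture_until_trigger(sample_source, out_file, trigger_level):
    """Write (voltage, current, trigger) samples to `out_file` as CSV rows
    until the trigger reading reaches `trigger_level`.

    `sample_source` is any iterable of 3-tuples; in a real setup it would
    be fed by the acquisition hardware (hypothetical here).
    Returns the number of rows written, including the triggering sample.
    """
    writer = csv.writer(out_file)
    written = 0
    for voltage, current, trigger in sample_source:
        writer.writerow([voltage, current, trigger])
        written += 1
        if trigger >= trigger_level:
            break                          # stop signal seen: end the capture
    return written


# Made-up samples; the third one carries the stop signal.
fake_samples = [(1.2, 0.10, 0.0), (1.2, 0.20, 0.0), (1.2, 0.10, 3.3), (1.2, 0.10, 0.0)]
buffer = io.StringIO()
rows = capture_until_trigger(iter(fake_samples), buffer, trigger_level=2.0)
```

Writing every sample to the file before checking the trigger keeps the full capture in permanent form, which mirrors the intent described above.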

To summarize, current figures of typical power do not rely on a standard process or a standard set of workloads. EnergyBench is a simple and flexible process that achieves the following goals:

1. A standard process for measuring average energy consumption for a specific workload.
2. A standard set of embedded workloads on which to measure typical energy.

EnergyBench provides several tools that can be used in conjunction with readily available and affordable hardware to measure typical energy consumption, using the standard methodology developed by EEMBC.

Markus Levy is president and Shay Gal-On is Director of Software Engineering at the Embedded Processor Benchmark Consortium (EEMBC).
