How to choose the benchmarks appropriate to the system you’re designing

Would you judge a car's fuel efficiency by how much hay it eats? Would you evaluate a CD player by its ability to play cassette tapes? Or would you test a PC using benchmarks from five years ago?

If you answered “yes” to any of these questions, you should probably rethink your answer, and definitely update your tests.

Computer hardware continues to improve in leaps and bounds. Today, there are multicore processors and multiple graphics engines that deliver capabilities barely imaginable a few years ago.

New software and usage models evolve in fits and starts. Likewise, the method of testing computers in this changing environment needs to evolve.

In other words, to get the most effective understanding of your computer's performance and capabilities, you need to regularly update your understanding of how you are using the computer and of the available tools or benchmarks.

Are you keeping up?
Benchmarks are the primary tool used for evaluating computer performance. In general terms, a benchmark is a method by which something can be measured or judged.

In the case of computers, the system under test (SUT) is being judged. Examples of benchmarks include measuring a computer's ability to do a specific amount of work, where faster execution time is rewarded with a higher score.
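As a minimal sketch of that idea, the Python below times a fixed amount of work and converts the elapsed time into a score, so that a faster run scores higher. The workload, the reference time and the scale of 100 are illustrative assumptions, not part of any published benchmark.

```python
import time


def run_workload(n: int = 200_000) -> int:
    """A fixed amount of work: a sum of squares standing in for a real workload."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(workload, repeats: int = 5) -> float:
    """Return the best (minimum) wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def score_from_time(elapsed_s: float, reference_s: float, scale: float = 100.0) -> float:
    """Map execution time to a score; matching the reference time yields `scale`."""
    if elapsed_s <= 0 or reference_s <= 0:
        raise ValueError("times must be positive")
    return scale * reference_s / elapsed_s


if __name__ == "__main__":
    elapsed = measure(run_workload)
    # reference_s is a hypothetical time recorded on a baseline machine.
    score = score_from_time(elapsed, reference_s=0.05)
    print(f"elapsed: {elapsed:.4f} s, score: {score:.1f}")
```

With this mapping, a system that finishes the same work in half the reference time scores 200, which makes results easy to compare against the baseline machine.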

Beyond computers, examples include fuel efficiency, measured in km/liter, sports statistics such as baseball's batting average, or even a student's grade point average.

Results can be relative to known criteria, where a certain performance is assigned a score. In another example, a computer plays a video acceptably or not, depending on whether the playback is smooth or frames are dropped.
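A pass/fail criterion of this kind might look like the sketch below. It assumes the test harness records a timestamp for every frame actually presented. The 40 ms frame interval (25 frames per second) and the 1 percent threshold are illustrative choices, not values from a particular benchmark.

```python
def count_dropped_frames(timestamps_ms, frame_interval_ms):
    """Estimate dropped frames from the gaps between presented frames."""
    dropped = 0
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        # A gap of about N frame intervals means N - 1 frames were skipped.
        missed = round((cur - prev) / frame_interval_ms) - 1
        if missed > 0:
            dropped += missed
    return dropped


def playback_acceptable(timestamps_ms, frame_interval_ms, max_drop_ratio=0.01):
    """Return True if the share of dropped frames is within the threshold."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two frame timestamps")
    dropped = count_dropped_frames(timestamps_ms, frame_interval_ms)
    expected = len(timestamps_ms) + dropped
    return dropped / expected <= max_drop_ratio


smooth = [0, 40, 80, 120, 160]
stutter = [0, 40, 120, 160, 200]  # one frame missing between 40 ms and 120 ms
print(playback_acceptable(smooth, 40))
print(playback_acceptable(stutter, 40))
```

Unlike a timing score, the result here is a yes-or-no verdict against a stated criterion, so the threshold itself should be reported alongside the result.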

The scores take on meaning when the results of different computers are compared. With this type of information, you can make informed decisions and selections.

Thus, computer evaluations frequently use benchmarks. You would ideally want to test your own applications on all systems of interest.

This approach, however, can be prohibitive. It requires time and money to acquire all the hardware of interest, and to research and create the workloads for evaluation use. Researching and creating the workloads can be particularly difficult, as it may be hard to know in advance how a particular computer will be used.

Benchmarks provide a means of bridging this gap. Many are available from third parties that have taken steps to make installation and execution easy. In some cases, there may also be databases of results available, potentially removing the need to actually run the tests yourself. By using these resources, you can save a considerable amount of time and money.

Pick and choose
But how do you choose a benchmark? How do you know if it is good or not? You should consider the benchmark's relevance first. This requires understanding two things: how are you planning on using the computer, and how does the benchmark you are considering relate to that?

The ideal benchmark uses the applications and performs the operations you need. If this is not the case, you have to assess how representative the benchmark is of your needs.

If you want to evaluate a system's capability as a multimedia machine for a consumer's home, a benchmark calculating the digits of pi isn't going to provide as much insight as a benchmark with some measure of video stream recording and playback.

In addition to general relevance to your usage model of interest, here are several other considerations to factor into your test choice:

Is the workload recognized and accepted? This provides an additional indication that others find the benchmark relevant. It also increases the probability that a larger body of data is available on the benchmark and on machines running the benchmark.

Is the workload portable? This indicates whether the workload can run in the environments of interest. Portability across operating systems is often one of the more challenging properties to achieve while maintaining relevance and comparability.

Is the workload scalable? And if so, in what manner? An ideal benchmark demonstrates a difference across a wide range of platforms, yielding a broad base of results.

It is also important to understand which hardware improvements actually raise performance on the workload. For example, database tests tend to involve large amounts of I/O.

Thus, memory and disk performance can be as important as processor performance when evaluating configurations for that environment.

Note that the popularity of a benchmark doesn't necessarily mean that all of these attributes have been met.

Other considerations
There may be occasions when the systems you are evaluating will have multiple uses or be deployed in a general-purpose environment.

In this case, it may not be possible or appropriate to summarize performance as a single number with a single benchmark.

When this happens, you need to consider benchmarks as tools and use the appropriate ones. No single tool can be used for everything, and each tool has its special function.

For example, in the home client environment, a more complete performance picture needs to encompass office-productivity performance, media-creation performance and gaming performance. A set of diverse usages needs multiple benchmarks.
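One way to keep that fuller picture is to report each usage category relative to a baseline system, and to compute a combined figure only when the usage mix is known. The sketch below assumes hypothetical scores and weights. The category names follow the home-client example, and the weighted geometric mean is one common way to combine ratios, not a method prescribed here.

```python
import math


def relative_scores(sut, baseline):
    """Per-category ratio of the system under test to the baseline system."""
    return {name: sut[name] / baseline[name] for name in baseline}


def weighted_geomean(ratios, weights):
    """Combine ratios with a weighted geometric mean (weights reflect usage mix)."""
    total_weight = sum(weights[name] for name in ratios)
    log_sum = sum(weights[name] * math.log(ratio) for name, ratio in ratios.items())
    return math.exp(log_sum / total_weight)


# Hypothetical scores from three separate benchmarks, one per usage category.
baseline = {"office": 100.0, "media": 100.0, "gaming": 100.0}
sut = {"office": 120.0, "media": 200.0, "gaming": 50.0}

# Hypothetical usage mix for one particular household.
weights = {"office": 1.0, "media": 1.0, "gaming": 1.0}

ratios = relative_scores(sut, baseline)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x baseline")
print(f"combined: {weighted_geomean(ratios, weights):.2f}x baseline")
```

In this made-up case the combined figure comes out close to 1.0 even though gaming performance is half that of the baseline, which is why the per-category lines are worth keeping next to any summary number.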

You should also be aware that yesterday's benchmark of choice may not be appropriate today. Technology may render some usage models all but irrelevant.

For example, JPEG compression and decompression once required noticeable compute power, but this is no longer the case today. The flip side is that technology may enable new workloads that you had not considered before.

Technology evolves, and so must the tests used to measure it. PCs and consumer electronics devices are getting faster, more energy-efficient and more capable, which, in turn, enables newer usage models.

To capture all that these new devices can deliver, a meaningful platform evaluation needs to align with the system's intended usage model. Hence, be mindful of how you use your computers now and in the future, and test accordingly.

Benchmark standards organizations
There are numerous benchmarking Web sites on the Internet. The Business Applications Performance Corp. (BAPCo) is a nonprofit consortium that distributes a set of objective performance benchmarks for PCs based on popular software applications and operating systems. SYSmark and MobileMark are two of the benchmarks produced by BAPCo.

Meanwhile, the Embedded Microprocessor Benchmark Consortium (EEMBC) develops meaningful performance benchmarks for the hardware and software used in embedded systems. EEMBC focuses on many different usage models, such as automotive, networking and telecom.

Futuremark Corp. aims to support the development of the computer and handheld device industry. It creates and maintains objective standards of computer performance measurement. This is done in cooperation with leading hardware and technology companies. 3DMark, PCMark and SPMark are some of the benchmarks produced by Futuremark.

The Standard Performance Evaluation Corp. (SPEC) is a nonprofit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers. SPEC produced the SPEC CPU, SPECviewperf and SPEC HPC benchmarks.

Another nonprofit corporation is the Transaction Processing Performance Council (TPC). It was founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry. TPC produced the TPC-C and TPC-H benchmarks.

Jeff Reilly is principal engineer, and David Salvator is evangelist, worldwide client capability in the Performance Benchmarks & Competitive Analysis Group at Intel Corp.
