Evaluating the performance of multi-core processors
Determining the specific multi-core processor that will suit your embedded application needs is a challenge. Relying upon the marketing collateral from a given company is not sufficient because, in many cases, the quoted results are specific to a particular platform and an unspecified application; there is no guarantee your application will exhibit the same performance.
Therefore, it is important to understand the strengths and weaknesses of common benchmark suites that are used to gauge single processor and multi-core processor performance. In addition, the increased focus on energy efficiency has led to the development of benchmark suites that gauge power performance; a description of these benchmark suites is included.
Finally, in Part 2, two practical examples of using benchmark results to estimate system and application behavior are discussed:
1) How to use benchmark data to characterize application performance on specific systems.
2) How to apply performance benchmark suites to assist in estimating performance of your application.
Single-core Performance Benchmark Suites
A number of single-core benchmark suites are available to help embedded development engineers assess the performance of single-core processors. Before considering multi-core processor performance on parallel applications, scalar application performance should be reviewed. Some popular benchmark suites commonly used to evaluate single-core embedded processor performance are:
EEMBC Benchmark Suites
BDTI Benchmark Suites
SPEC CPU2000 and CPU2006
The Embedded Microprocessor Benchmark Consortium (EEMBC), a non-profit, industry-standard organization, develops benchmark suites comprised of algorithms and applications common to embedded market segments and categorized by application area.
EEMBC benchmark suites covering the embedded market segments include:
Digital Entertainment 1.0
Office Automation 2.0
The EEMBC benchmark suites are well suited to estimating performance of a broad range of embedded processors and flexible in the sense that results can be obtained early in the design cycle using functional simulators to gauge performance.
The benchmarks can easily be adapted to execute on bare-metal embedded processors as well as on systems running COTS OSes. The classification of each application or algorithm into market-segment-specific benchmark suites makes it easy to obtain market-specific views of performance information.
For example, a processor vendor targeting a processor for the automotive market segment can choose to measure and report Automotive 1.1 benchmark suite performance numbers. Executing the benchmark suites results in the calculation of a metric that gauges the execution time performance of the embedded system.
For example, the aggregate performance result from an execution of the networking benchmark suites is termed NetMark* and can be compared with the NetMark value obtained from an execution of the benchmark on different processors.
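Aggregate scores such as NetMark are derived from the individual benchmark results in a suite. The sketch below illustrates one common way such a composite metric can be computed, a geometric mean of per-benchmark scores normalized to a reference platform; this is the approach SPEC uses for its aggregate metrics, and it is shown here only as an illustration, not as the exact NetMark formula (consult EEMBC for that).

```python
from math import prod

def aggregate_mark(normalized_scores):
    """Geometric mean of per-benchmark scores (e.g., iterations/sec
    divided by a reference platform's result). Illustrative formula
    only; the exact EEMBC aggregate definition may differ."""
    return prod(normalized_scores) ** (1.0 / len(normalized_scores))

# Hypothetical normalized scores for four networking kernels.
scores = [2.0, 2.0, 8.0, 2.0]
print(aggregate_mark(scores))
```

A geometric mean is the usual choice for composite benchmark scores because it treats a 2x gain on any one benchmark equally, regardless of that benchmark's absolute running time.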
Additionally, the suites provide code size information for each benchmark, which is useful in comparing tradeoffs made between code optimization and size. Publicly disclosed benchmark suite results require certification by EEMBC, which involves inspection and reproduction of the performance results and lends credibility to the measurements.
This is especially critical when the benchmark code is optimized to provide an implementation in hardware, software, or both to maximize the potential of the processor subsystem. Access to the benchmark suite requires formal membership in the consortium, an academic license, or a commercial license. For further information on EEMBC, visit www.eembc.org .
The BDTI Benchmark Suites focus on digital signal processing applications, such as video processing and physical-layer communications. One valuable feature of these suites is that they are applicable to an extremely broad range of processor architectures, and therefore enable comparisons between different classes of processors.
The BDTI benchmarks define the functionality and workload required to execute the benchmark, but do not dictate a particular implementation approach. The benchmark customer has the flexibility of implementing the benchmark on any type of processor, in whatever way is natural for implementing that functionality on that processor.
The benchmark results developed by the customer are then independently verified and certified by BDTI. The rationale for this approach is that it is closer to the approach used by embedded developers.
Embedded system developers obtain source code for key functional portions and typically modify the code for best performance either by optimizing the software (e.g., using intrinsics) or offloading some work to a coprocessor. For background on BDTI and their benchmark offerings, please consult the website, www.BDTI.com .
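When a developer optimizes a kernel this way, the payoff is usually reported as a speedup over the baseline implementation. A minimal sketch of that measurement follows; the two dot-product kernels are purely hypothetical stand-ins for a baseline and an optimized version of the same functionality, and best-of-N timing is used to reduce noise from OS scheduling.

```python
import time

def measure(fn, *args, repeats=5):
    """Return the best-of-N wall-clock time for fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical baseline and optimized kernels with identical results.
def dot_baseline(a, b):
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_optimized(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [float(i) for i in range(100_000)]
b = [float(i) for i in range(100_000)]
speedup = measure(dot_baseline, a, b) / measure(dot_optimized, a, b)
print(f"speedup: {speedup:.2f}x")
```

The same harness applies whether the optimized path uses intrinsics, a tuned library, or a coprocessor offload: measure both versions under identical inputs and verify the outputs match before comparing times.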
Standard Performance Evaluation Corporation (SPEC) CPU2006 is comprised of two components, CINT2006 and CFP2006, which focus on integer and floating point code application areas, respectively.
For embedded developers, CINT2006 is more relevant than CFP2006; however, portions of CFP2006 may be applicable to embedded developers focused on C and C++ image and speech processing. CINT2006 is comprised of 9 C benchmarks and 3 C++ benchmarks, covering application areas such as compression, optimization, artificial intelligence, and software tools.
System requirements are somewhat steep for an embedded multi-core processor: at least 1 GB of main memory and at least 8 GB of disk space. Overall, SPEC CPU2006 and the recently retired SPEC CPU2000 provide good coverage of different application types.
Due to the longevity of the benchmarks and the wealth of publicly available performance data, it is possible to create a model that estimates your application's performance on new processor cores before you have access to the new processor. An example of this technique is detailed later in this chapter. For background on SPEC and their benchmark offerings, please consult the website, www.spec.org .
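One simple form such a model can take is an ordinary least-squares fit between published benchmark scores and your application's measured throughput on processors you already have; the fitted line then extrapolates to a processor for which only the published score is known. The sketch below uses entirely hypothetical numbers, and a single-predictor linear fit is only one of several modeling choices.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return m, my - m * mx

# Hypothetical data: published CINT2006-style scores for three
# processors you own, and your application's measured throughput
# (transactions/sec) on each of them.
published_scores = [10.0, 15.0, 22.0]
app_throughput = [120.0, 175.0, 260.0]

m, c = fit_line(published_scores, app_throughput)

# Published score of a new processor you cannot yet test:
new_score = 30.0
print(f"estimated throughput: {m * new_score + c:.1f} transactions/sec")
```

The quality of the estimate depends on how closely the chosen benchmark's behavior (instruction mix, cache footprint) tracks your application's, so it is worth fitting against several benchmarks and keeping the one with the smallest residuals.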