DSP floating point benchmarks get easier to measure - Embedded.com


Embedded developers who struggle to come up with good floating-point DSP benchmarks for applications including graphics, audio, motor control, wireless base stations, medical, and mil-aero can breathe a sigh of relief.

Now, with the active participation of most of the major DSP, MCU, and MPU vendors and developers, the Embedded Microprocessor Benchmark Consortium (EEMBC) has just released its new FPMark benchmark suite.

“Until now, the industry has lacked a reliable, useful, and consistent floating-point benchmark,” said EEMBC president Markus Levy. “Just as our CoreMark benchmark was aimed at providing a ‘better Dhrystone,’ we think FPMark provides an extreme improvement over the easily manipulated, and misused, Whetstone and Linpack.”

According to Levy, FPMark (Figure 1 below) incorporates algorithms for measuring single-precision (32-bit) and double-precision (64-bit) workloads, as well as a mixture of small to large data sets to support everything from microcontrollers to high-end processors.

Figure 1: FPMark is calculated by taking the geometric mean (geomean) of all the individual scores and multiplying the result by 100 for scaling.
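
The scoring rule in the caption can be sketched in a few lines of Python. This is only an illustration: the function name `fpmark_style_score` and the sample scores are hypothetical, and the article does not say how EEMBC normalizes individual workload results before combining them.

```python
import math

def fpmark_style_score(scores):
    """Geometric mean of the individual scores, multiplied by 100 for scaling.

    Hypothetical helper: it mirrors the rule in the Figure 1 caption, not
    EEMBC's actual scoring code.
    """
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive")
    # Averaging logarithms avoids overflow when many scores are multiplied.
    log_mean = sum(math.log(s) for s in scores) / len(scores)
    return 100.0 * math.exp(log_mean)

# Made-up scores for three workloads (illustrative values only).
example_scores = [1.0, 4.0, 16.0]
example_result = fpmark_style_score(example_scores)
print(f"combined score: {example_result:.1f}")
```

One property of a geometric mean is that doubling any single workload score raises the combined score by the same factor, whichever workload it is, so no single workload dominates the total.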

Uniquely, he said, FPMark allows users to evaluate FPU performance on the basis of consistent and controlled data, delivering honest, reliable, and unbiased metrics to serve the needs of processor vendors, compiler vendors, and system developers.

“Basically, the DSP and floating-point benchmarks currently available are a mess,” he said, “a mix of synthetic and natural measures, the accuracy of which can be distorted by the type of compiler used, how it is used, what is measured, and how accurately it reflects real developers’ needs.

“In addition, the ways in which this important feedback to the developer was displayed ranged from the bare minimum or worse, to a surfeit of information, much of it irrelevant.”

The goal of the working group was to come up with a measurement framework that addresses all these issues, said Levy.

Previously, fixed-point DSP was widely used where speed and cost were important, and floating point was relegated to the high end, where accuracy mattered more. Now, with sophisticated multicore architectures incorporating floating point at a cost that matches even single-core fixed point, the need for better measurement capabilities has become critical.

According to Luther Johnson, principal compiler engineer in Microchip Technology’s Development Systems Division and a member of the FPMark working group, using floating-point (FP) representation enables more accurate calculations of fractional values than fixed-point numbers (integers) because exponents allow the decimal point to shift.

Moreover, floating-point math makes numerical computation much easier, and many algorithms implemented in floating point take fewer cycles to execute than fixed-point code (assuming similar precision). To take advantage of this efficiency, many embedded processors include hardware floating-point units (FPUs) to support these higher levels of precision.
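
The precision argument can be illustrated with a small Python sketch. It assumes Q15 (16-bit signed, 15 fractional bits) as the fixed-point format, which the article does not name, and compares it with IEEE 754 single precision. The helper names are hypothetical.

```python
import struct

Q15_SCALE = 1 << 15  # assumed format: 16-bit signed fixed point, 15 fractional bits

def to_q15(x):
    """Quantize x in [-1, 1) to the nearest Q15 value, returned as a float."""
    raw = round(x * Q15_SCALE)
    raw = max(-Q15_SCALE, min(Q15_SCALE - 1, raw))  # saturate to the 16-bit range
    return raw / Q15_SCALE

def to_float32(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def relative_error(approx, exact):
    return abs(approx - exact) / abs(exact)

for value in (0.5, 0.001, 0.00001):
    q15_err = relative_error(to_q15(value), value)
    f32_err = relative_error(to_float32(value), value)
    print(f"{value:g}: Q15 rel. error {q15_err:.2e}, float32 rel. error {f32_err:.2e}")
```

Because the fixed-point step size is constant, the relative error grows as values shrink, and the smallest input here rounds to zero. The floating-point exponent rescales the value, so the relative error stays roughly constant across the same range.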

“But it is a nightmare to come up with a set of floating-point benchmarks that are really useful,” he said.

The EEMBC FPMark Suite uses 10 diverse kernels to generate 53 workloads, each of which self-verifies to ensure correct execution of the benchmark. “That alone would make FPMark unique,” said Johnson.
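
The article does not describe how FPMark workloads verify themselves. As a rough sketch of the general idea only, a self-verifying workload can run its kernel and compare the result with a stored reference value within a tolerance. Every name below is hypothetical, and the tolerance-based check is an assumption.

```python
import math

def run_self_verifying_workload(kernel, inputs, expected, rel_tol=1e-6):
    """Run a kernel and check its result against a stored reference value.

    Returns (result, passed). The relative-tolerance check is an assumed
    verification strategy, not FPMark's documented behavior.
    """
    result = kernel(inputs)
    passed = math.isclose(result, expected, rel_tol=rel_tol)
    return result, passed

def sum_of_squares(values):
    """Toy floating-point kernel standing in for a real workload."""
    return math.fsum(v * v for v in values)

# Reference worked out by hand: 1 + 4 + 9 + 16 = 30
result, passed = run_self_verifying_workload(
    sum_of_squares, [1.0, 2.0, 3.0, 4.0], 30.0
)
print(f"result={result}, verified={passed}")
```

A check like this means a timing figure is only reported alongside evidence that the kernel computed the right answer, so a miscompiled or skipped computation shows up as a failed run instead of an unusually fast one.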

According to Levy, the FPMark workloads are built on the same infrastructure as EEMBC’s MultiBench, which will allow a developer to run multiple contexts and demonstrate multicore scalability, as well as greatly simplifying the effort required to port the benchmarks to bare-metal or Linux implementations.

To make sure it accurately represents the real-world applications of embedded developers, he said, the kernels in FPMark include a mixture of general-purpose algorithms (such as Fast Fourier Transform, linear algebra, ArcTan, Fourier coefficients, Horner’s method, and Black-Scholes) and complex algorithms (such as a neural network routine, a ray tracer, and an enhanced version of Livermore Loops).
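
Of the kernels listed, Horner’s method is the simplest to show. The following is a textbook version in Python, not the FPMark implementation; the function name and the sample polynomial are chosen for illustration.

```python
def horner(coefficients, x):
    """Evaluate a polynomial at x using Horner's method.

    Coefficients run from the highest-degree term down to the constant,
    so [2, -6, 2, -1] represents 2x^3 - 6x^2 + 2x - 1.
    """
    result = 0.0
    for c in coefficients:
        # One multiply and one add per coefficient; no explicit powers of x.
        result = result * x + c
    return result

# 2x^3 - 6x^2 + 2x - 1 at x = 3: 54 - 54 + 6 - 1 = 5
value = horner([2.0, -6.0, 2.0, -1.0], 3.0)
print(value)
```

The loop is a chain of dependent multiply-add steps, which makes this kind of kernel a compact way to exercise the floating-point multiply and add paths of a processor.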

According to Linley Gwennap, president and principal analyst of The Linley Group, while many people have attempted to create a floating-point benchmark, “most do not comprehend the extra effort required to ensure that the workload executes comparably regardless of compiler or hardware used.”

“For example, it’s important that the FPMark was constructed in such a way as to support advanced compiler optimizations, but not at the expense of optimizing away work that must be done during the execution of the benchmark.”
