Benchmarking an ARM-based SoC using Dhrystone: A VFT perspective
Calculating the SoC tester benchmarks
Finally, to determine the Dhrystone measurements it is necessary to perform the following calculations:
1. Measure the time on the tester (signaled through specific pads for both cores), in seconds, for the last n iterations.
2. Dhrystones/sec = 1/(time for last n loops/n).
This Dhrystones/sec metric is now typically converted into a DMIPS/MHz metric by dividing by two factors: the MHz speed of the processor and a constant that represents the performance of a DEC VAX 11/780 machine, which was widely viewed as a "1 MIPS" processor in the 1980s when the Dhrystone benchmark first appeared.
The widely quoted constant describing the VAX 11/780's performance is 1757 Dhrystones per second. Dhrystone MIPS are typically described as simply "DMIPS", and a DMIPS per MHz performance metric is often quoted, as below:
3. Dhrystone MIPS = Dhrystones/sec * (1/1757)
4. DMIPS per MHz = Dhrystone MIPS * (1/ freq of the CPU in MHz)
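The four steps above can be sketched as a few small C helpers. This is only an illustration of the arithmetic: the function names and the parameters (`elapsed_sec`, `n_loops`, `cpu_mhz`) are hypothetical and do not come from the Dhrystone sources or from any tester software. The measured time is assumed to be already available in seconds.

```c
#include <assert.h>

/* Widely quoted VAX 11/780 score, in Dhrystones per second (see text). */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Step 2: Dhrystones/sec = 1 / (time for last n loops / n). */
double dhrystones_per_sec(double elapsed_sec, unsigned long n_loops)
{
    /* Hypothetical sanity checks: a zero time or loop count is a bad measurement. */
    assert(elapsed_sec > 0.0 && n_loops > 0);
    return (double)n_loops / elapsed_sec;
}

/* Step 3: Dhrystone MIPS = Dhrystones/sec * (1/1757). */
double dmips(double dhry_per_sec)
{
    return dhry_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* Step 4: DMIPS per MHz = Dhrystone MIPS * (1 / CPU frequency in MHz). */
double dmips_per_mhz(double elapsed_sec, unsigned long n_loops, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return dmips(dhrystones_per_sec(elapsed_sec, n_loops)) / cpu_mhz;
}
```

As a purely illustrative input, 1000 loops measured in 2 ms on a core clocked at 100 MHz, that is `dmips_per_mhz(0.002, 1000, 100.0)`, works out to roughly 2.85 DMIPS/MHz.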
So the Dhrystone number for any MCU comes out as:
DMIPS/MHz = (1 / Execution Time for last n loops) * n * (1/1757) * (1 / MCU Frequency in MHz)
Some additional points to consider
The inherent structure of the Dhrystone inner loop, and the disproportionate share of its execution time spent in ASCII string functions (strcmp, strcpy), make the benchmark very sensitive to compiler optimizations: different compiler options can produce wildly varying performance metrics. Beyond this obvious sensitivity to compiler options, there are a number of other optimizations and considerations for best performance, including:
- Definite design port states. When running the tester pattern in an RTL simulation environment, take care that no design ports (especially those signaling the different simulation stages or those configured as input ports) are in an X (unknown) state at any time during the run; otherwise, unpredictable behavior will occur.
- Caches for both cores. For best performance, enable all the processor architectural features, such as the caches. Depending on the SoC micro-architecture, the performance with caches disabled can be significantly lower than the Dhrystone metrics quoted by suppliers.
- Semaphore in the cache-enabled mode. For a dual core configuration, a semaphore is needed for processor-to-processor communication to signal "execution complete" from the secondary core. With caches enabled and no special handling for the communication from the secondary core, the result indicator may not be visible to the primary processor. The SoC would then enter an infinite loop waiting for the appropriate signal. Using a non-cacheable semaphore for this processor-to-processor signaling solves the problem.
- Copy-back mode. Performance is usually maximized by operating the data cache(s) in copyback (also known as writeback) mode. In copyback mode, processor writes generate a transaction only to the data cache, so line writes into the next level memory system occur only when cache lines are naturally displaced or when cache maintenance operations force them. This typically generates considerably less system bus traffic and improves performance versus a write-through configuration, where every processor write also generates a system bus write to update the next level memory system.
In a dual core configuration with caches enabled, it may be necessary to add code that explicitly performs cache maintenance operations to push selected variables into the next level memory system so that they are visible to the other processor core.
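The "execution complete" handshake described in the semaphore item above might look like the following minimal sketch in C11. Everything named here is an assumption made for illustration: the function names, the `CORE_DONE` marker value and the `max_polls` bound are hypothetical. The code also does not make the semaphore non-cacheable by itself. It assumes the word that `sem` points to has been placed in a region that the MPU/MMU configuration marks as non-cacheable, which is device-specific and not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary marker value chosen for this sketch. */
#define CORE_DONE 0xD0E5D0E5u

/* Secondary core: publish "execution complete".
 * The release store orders earlier writes before the flag update. */
void signal_execution_complete(_Atomic uint32_t *sem)
{
    assert(sem != NULL);
    atomic_store_explicit(sem, CORE_DONE, memory_order_release);
}

/* Primary core: poll the semaphore a bounded number of times.
 * Returns 1 if the secondary core signaled completion, 0 on timeout. */
int wait_for_secondary(_Atomic uint32_t *sem, unsigned long max_polls)
{
    assert(sem != NULL);
    while (max_polls-- > 0) {
        if (atomic_load_explicit(sem, memory_order_acquire) == CORE_DONE) {
            return 1;
        }
    }
    return 0;
}
```

Bounding the poll count is a design choice of this sketch rather than something the text requires. If the signal never becomes visible, the pattern then fails with a clear timeout instead of hanging in an infinite loop. Any result variables that stay in cacheable memory would still need the explicit cache maintenance operations mentioned above.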

