Software performance engineering for embedded systems: Part 2 – The importance of performance measurements

Robert Oshana, Freescale Semiconductor

September 30, 2012

Performance measurement is another important area of SPE.  This includes planning measurement experiments to ensure that results are both representative and reproducible.  Software also needs to be instrumented to facilitate SPE data collection.  Finally, once the performance critical components of the software are identified, they are measured early and often to validate the models that have been built and also to verify earlier predictions.  See Figures 7 and 8 for an example of the outcome of this process for the Femto basestation project.



Figure 7: Key parameters influencing performance scenarios based on cycle counts



Figure 8: Output from a “Performance Calculator” used to identify and track key performance scenarios

Step 1: Determine where you need to be
Reject nonspecific requirements or demands such as “the system should be as fast as possible”. Instead, use quantitative terms such as “packet throughput must be 600K packets per second for IP forwarding”.

Understand potential future use cases of the system and design in the scalability needed to handle them. Figure 9 shows an example of how to define these performance goals. The first step is to identify the system dimension; this establishes the context, the “what”. Next, the key attributes are identified; these describe how good the system “shall be”. Finally, the metrics are identified that determine “how we’ll know”. These metrics should include both a “should” value and a “must” value.

In the example in Figure 9, IP forwarding is the system dimension; for a networking application, IP forwarding is a key measurement focus. The key attribute is “fast” - the system will be measured by how many packets it can forward. The key metric is thousands of packets per second (Kpps). The system should achieve 600 Kpps and must reach at least 550 Kpps to meet the minimum system requirements.



Figure 9: Defining quantitative performance goals
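The dimension/attribute/metric pattern above can be captured directly in code so a test harness can flag regressions against the “should” and “must” thresholds. The following is a minimal sketch; the type and function names are illustrative, and the 600/550 Kpps values are the IP-forwarding figures from the example.

```c
/* Hypothetical performance-goal record following the
   dimension / attribute / metric pattern of Figure 9. */
typedef struct {
    const char *dimension;   /* e.g. "IP forwarding" */
    const char *attribute;   /* e.g. "fast" */
    unsigned    should_kpps; /* target value */
    unsigned    must_kpps;   /* minimum acceptable value */
} perf_goal;

typedef enum { GOAL_FAILED, GOAL_MET_MINIMUM, GOAL_MET_TARGET } goal_status;

/* Compare a measured throughput against the goal's two thresholds. */
goal_status evaluate_goal(const perf_goal *g, unsigned measured_kpps)
{
    if (measured_kpps >= g->should_kpps) return GOAL_MET_TARGET;
    if (measured_kpps >= g->must_kpps)   return GOAL_MET_MINIMUM;
    return GOAL_FAILED;
}
```

A nightly performance regression run can then report any scenario whose measurement falls below its “must” value as a hard failure, and anything between “must” and “should” as a warning.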

Step 2: Determine where you are now
Understand which system use cases are causing performance problems. Quantify these problems using available tools and measurements. Figure 10 shows a debug architecture for a Multicore SoC that can provide the visibility “hooks” into the device for performance analysis and tuning. Figure 11 shows a strategy for using embedded profiling and analysis tools to provide visibility into a SoC in order to collect the necessary information to quantify performance problems in an embedded system.

Perform the appropriate assessment of the system to determine if the software architecture can support performance objectives. Can the performance issues be solved with standard software tuning and optimization methods? This is important because it's not desirable to spend many months tuning the application only to determine later that the goals cannot be met using these tuning approaches and more fundamental changes are required. Ultimately, this phase needs to determine whether performance improvement requires re-design or if tuning is sufficient.



Figure 10: A debug architecture for a Multicore SoC that can provide the visibility “hooks” into the device for performance analysis and tuning



Figure 11: A tools strategy for using embedded profiling and analysis tools to provide visibility into a SoC in order to collect the necessary information to quantify performance problems in an embedded system.
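The kind of visibility described above often starts with simple in-code instrumentation around suspect regions. The sketch below shows one common shape for a cycle-count probe; `read_cycles()` is a placeholder for whatever counter the target actually exposes (a core timebase or performance-monitor register, which varies by SoC), and here it is stubbed so the logic can run anywhere.

```c
#include <stdint.h>

/* Placeholder for a hardware cycle counter. On a real target this
   would read a timebase or PMU register; the stub below just advances
   by a fixed amount per read so the probe logic is testable. */
static inline uint64_t read_cycles(void)
{
    static uint64_t fake = 0;
    return fake += 100;
}

typedef struct {
    uint64_t start;    /* counter value at region entry */
    uint64_t total;    /* accumulated cycles across all samples */
    uint32_t samples;  /* number of enter/exit pairs recorded */
} cycle_probe;

static inline void probe_enter(cycle_probe *p) { p->start = read_cycles(); }

static inline void probe_exit(cycle_probe *p)
{
    p->total += read_cycles() - p->start;
    p->samples++;
}

/* Average cycles per invocation of the instrumented region. */
static inline uint64_t probe_average(const cycle_probe *p)
{
    return p->samples ? p->total / p->samples : 0;
}
```

Wrapping a candidate hot spot in `probe_enter`/`probe_exit` gives a per-invocation cycle figure that can be compared directly against the cycle budgets of Figures 7 and 8.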

Step 3: Decide if you can achieve the objectives
There are several categories of performance optimization, ranging from the simple to the more complex:

  • Low-cost/low-ROI techniques - usually automatic optimization options. A common approach in embedded systems is using compiler options to enable more aggressive optimizations of the embedded software.
  • Intermediate-cost/intermediate-ROI techniques - optimizing algorithms and data structures (for example, using an FFT instead of a DFT), as well as modifying software to use more efficient constructs.
  • High-cost/high-ROI techniques - re-designing or re-factoring the embedded software architecture.
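As a small illustration of the “more efficient constructs” category, consider division inside a loop. Division is typically far more expensive than multiplication on embedded cores, so a loop-invariant divisor can be replaced by one reciprocal computed outside the loop. The function names below are illustrative, and the rewrite assumes the small floating-point rounding difference is acceptable for the application.

```c
/* Naive version: one floating-point division per iteration. */
void scale_naive(float *out, const float *in, int n, float divisor)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] / divisor;
}

/* Tuned version: the reciprocal is loop-invariant, so it is computed
   once, trading n divisions for one division plus n multiplies. */
void scale_tuned(float *out, const float *in, int n, float divisor)
{
    const float recip = 1.0f / divisor;
    for (int i = 0; i < n; i++)
        out[i] = in[i] * recip;
}
```

Some compilers perform this transformation automatically at higher optimization levels (or under relaxed floating-point settings), which is exactly why the low-cost compiler-option category is worth trying first.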


Step 4: Develop a plan for achieving the objectives
The first step is to Pareto-rank the proposed solutions based on return on investment (ROI). There are various ways to estimate resource requirements, including modeling and benchmarking. Once the performance targets have been determined, the tuning phase becomes iterative until the targets are met. Figure 12 shows an example of a process used in optimizing DSP embedded software. As the figure shows, the application is optimized through a defined, iterative set of steps:

  • Understand key performance scenarios for the application
  • Set goals for key optimizations for performance, memory, and power
  • Select processor architecture to match the DSP application and performance requirements
  • Analyze key algorithms in the system and perform algorithmic transformation if necessary
  • Analyze compiler performance and output for key benchmarks
  • Write “out of box” code in a high-level language (e.g., C)
  • Debug to achieve correctness, and develop a regression test
  • Profile the application and Pareto-rank the “hot spots”
  • Turn on low level optimizations with the compiler
  • Run test regression, profile application, and re-rank
  • Tune C/C++ code to map to the hardware architecture
  • Run test regression, profile application, and re-rank
  • Instrument code to get data as close as possible to the CPU using DMA and other techniques
  • Run test regression, profile application, and re-rank
  • Instrument code to provide links to compiler with intrinsics, pragmas, keywords
  • Run test regression, profile application, and re-rank
  • Turn on higher level of optimizations using compiler directives
  • Run test regression, profile application, and re-rank
  • Re-write key inner loops in assembly language
  • Run test regression, profile application, and re-rank
  • If goals are not met, re-partition the application between hardware and software and start over again. At each phase, if the goals are met, document and save the code build settings and compiler switch settings


Figure 12: A process for managing the performance of an embedded DSP application
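The “tune C/C++ code to map to the hardware architecture” and “provide links to the compiler with intrinsics, pragmas, keywords” steps above often come down to giving the compiler information it cannot prove on its own. A common example is the C99 `restrict` qualifier: asserting that buffers do not alias lets many embedded compilers software-pipeline or vectorize a loop. The function below is an illustrative sketch, not code from the article.

```c
#include <stddef.h>

/* Multiply-accumulate over two input buffers. The restrict
   qualifiers promise the compiler that acc, a, and b never overlap,
   removing the aliasing assumption that would otherwise force it to
   reload data and serialize the loop. The caller must uphold the
   no-overlap promise, or behavior is undefined. */
void vec_mac(float * restrict acc,
             const float * restrict a,
             const float * restrict b,
             size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += a[i] * b[i];
}
```

Following the process in Figure 12, a change like this is made in one step, then the regression test and profiler are re-run to confirm both correctness and the expected cycle-count improvement before moving to the next hot spot.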

To assess return on investment, the first step is to gather data to support the analysis. This data includes, but is not limited to, the time and cost to complete the performance analysis, the software changes required, hardware costs if necessary, and software build and distribution costs.

The next step is to gather data on the effect of the improvements, such as hardware upgrades that can be deferred and staff cost savings.

Performance engineering can be applied to each phase of the embedded software development process. For example, the Rational Unified Process (RUP) has four key phases: Inception, Elaboration, Construction, and Transition (Figure 13).

RUP is an iterative software development process framework created by Rational Software Corporation (now part of IBM). Rather than a single concrete, prescriptive process, RUP is an adaptable framework, intended to be tailored by software development teams, which select the elements of the process appropriate to their project.

Figure 13: Rational Unified Process
