Three techniques for measuring application performance improvement

In a previous article, I wrote about how developers can speed up their application code by executing time-sensitive functions in RAM rather than from flash. You might be wondering: if you were to make such an adjustment, what would the performance change be? The answer will vary based on the microcontroller’s fabrication technology, but there are three techniques that developers can use to measure their application’s or a function’s performance:

  • Toggle an I/O pin
  • Set up a timer
  • Use the ITM

Let’s examine each of these techniques in detail.

Technique #1 – Toggle an I/O pin

The first and oldest trick in the book is to use a spare I/O pin and toggle it before and after a function is executed. The width of the resulting pulse, measured with an oscilloscope or logic analyzer, is the function’s execution time. We want to take this measurement twice: first, while the function is still executing from flash, and second, once we have moved the function to RAM or made whatever other optimization we are interested in. The code to do this is extremely simple and can be written either by directly manipulating the port bit or by going through a hardware abstraction layer, as shown in Listing 1 and Listing 2. (Note: this code assumes that the port bit was initialized as an output and set high.)

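PORTA &= ~0x1;         // Drive the pin low: measurement starts
FunctionUnderTest();   // Hypothetical name for the function being measured
PORTA |= 0x1;          // Drive the pin high: measurement ends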

Listing 1 – Directly accessing the port register to toggle the pin low while the function is executing.

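Dio_ChannelWrite(PORTA_0, LOW);    // Drive the pin low: measurement starts
FunctionUnderTest();               // Hypothetical name for the function being measured
Dio_ChannelWrite(PORTA_0, HIGH);   // Drive the pin high: measurement ends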

Listing 2 – Indirectly accessing the port register to toggle the pin low while the function is executing through a hardware abstraction layer.
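
Once the pulse width has been captured for both runs, comparing them is simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes both pulse widths were read off the same instrument in microseconds.

```c
/* Percentage by which the execution time shrank after the optimization.
   before_us: pulse width with the original code (e.g. running from flash)
   after_us:  pulse width with the optimized code (e.g. running from RAM) */
double time_reduction_percent(double before_us, double after_us)
{
    return 100.0 * (before_us - after_us) / before_us;
}

/* Speedup factor: how many times faster the optimized version runs. */
double speedup_factor(double before_us, double after_us)
{
    return before_us / after_us;
}
```

For example, a pulse that shrinks from 100 microseconds to 80 microseconds is a 20 percent reduction, or a speedup factor of 1.25.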

Technique #2 – Set up a Timer

A second technique that can be used to measure time is to set up a hardware timer. There are two ways that the hardware timer could be used. First, it could be used as a one-shot timer that is started right before calling the function and read right after the function returns. Second, the timer could be set to run constantly and be read before and after the function call. In this case, a developer would have to add extra code to calculate the difference between the start and stop values in the timer register. An important tip is to make sure that the timer ticks at a high enough resolution to capture the difference. For example, a timer tick of 1 millisecond might be too coarse. A tick of 10 microseconds would probably be a good starting point. Listing 3 and Listing 4 show some pseudocode for how the timer might be used to measure the time differences.

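Timer_Start(TIMER_1);              // Start clears the timer register count
FunctionUnderTest();               // Hypothetical name for the function being measured
TimerCount = Timer_Read(TIMER_1);  // Ticks elapsed since Timer_Start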

Listing 3 – Example using driver APIs to start and read a timer to measure directly how long the function took to execute.

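CountStart = Timer_Read(TIMER_1);                        // Timer value before the call
FunctionUnderTest();                                     // Hypothetical name for the function being measured
CountStop = Timer_Read(TIMER_1);                         // Timer value after the call
ElapsedTime = (CountStop - CountStart) * TimerTickUnit;  // Tick difference scaled to time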

Listing 4 – Example using a running timer to measure how long a function took to execute. Care must be taken to make sure the timer reads are atomic.
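
One detail the pseudocode glosses over is what happens when the free-running timer rolls over between the two reads. A minimal sketch of one way to handle it, assuming a 32-bit up-counting timer (the helper names and the tick frequency parameter are hypothetical, not part of any particular driver):

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running, up-counting 32-bit timer.
   Unsigned subtraction wraps modulo 2^32, so the result is still correct if
   the counter rolled over once between the two reads. */
uint32_t elapsed_ticks(uint32_t count_start, uint32_t count_stop)
{
    return count_stop - count_start;
}

/* Convert a tick count to microseconds for a timer that ticks at tick_hz.
   The multiplication is done in 64 bits to avoid overflow. */
uint64_t ticks_to_us(uint32_t ticks, uint32_t tick_hz)
{
    return ((uint64_t)ticks * 1000000u) / tick_hz;
}
```

With the 10 microsecond tick suggested above (a 100 kHz timer), 100 elapsed ticks correspond to 1000 microseconds. For a narrower timer, such as a 16-bit one, the subtraction result would need to be masked to the counter width.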

Technique #3 – Use the ITM

A third technique, which depends on the microcontroller architecture and the hardware that is available, is to use the Instrumentation Trace Macrocell (ITM). The ITM is available on many Arm Cortex-M processors, such as the Cortex-M3, Cortex-M4 and Cortex-M7, and is designed to allow developers to quickly pass trace information to the debugger without a lot of software overhead: the hardware does the heavy lifting. The software is really simple. First, a developer needs to make sure that they include the core header file for their microcontroller. For example, if I were working on a Cortex-M4, I would include core_cm4.h. The header file includes an important function for accessing the ITM called ITM_SendChar. We can use ITM_SendChar to send a character through the ITM before and after the function executes, as shown in Listing 5.
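
A minimal sketch of that pattern might look like the following. ITM_SendChar is the real CMSIS-Core function (it writes to ITM stimulus port 0 and returns the character it was given), but the version defined here is only a stand-in so the sketch builds off-target. FunctionUnderTest, MeasureWithItm and the 'S' and 'E' marker characters are hypothetical choices for illustration.

```c
#include <stdint.h>

/* Stand-in for off-target builds only. On hardware, delete this stub and the
   log variables, and include the device header instead (it pulls in
   core_cm4.h, which provides the real ITM_SendChar). */
static char     itm_log[8];
static uint32_t itm_log_len;

static uint32_t ITM_SendChar(uint32_t ch)
{
    if (itm_log_len < sizeof itm_log) {
        itm_log[itm_log_len++] = (char)ch;   /* record instead of tracing */
    }
    return ch;
}

/* Hypothetical function whose execution time is being measured. */
static uint32_t FunctionUnderTest(uint32_t x)
{
    return x * 2u + 1u;
}

/* Send one marker before and one after the call. The trace tool timestamps
   each marker, so the gap between them is the function's execution time. */
uint32_t MeasureWithItm(uint32_t x)
{
    uint32_t result;

    ITM_SendChar('S');               /* start marker */
    result = FunctionUnderTest(x);
    ITM_SendChar('E');               /* end marker */

    return result;
}
```

On real hardware, the markers only appear if the debugger has enabled ITM tracing and stimulus port 0; otherwise ITM_SendChar returns without sending anything.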


Listing 5 – The Arm function ITM_SendChar can be used to send a data byte over the ITM before and after the function executes to get timing information about the function.

Each ITM packet contains not only the character but also the packet’s cycle count. The difference between the cycle counts before and after the function gives the number of CPU cycles that have elapsed. This can be seen in Figure 1, where ITM Port 1 is used to show the start of the function of interest and ITM Port 2 is used to show its end. In this case, we can see a difference of 16 cycles (my test function was trivial in this example).
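
To turn a cycle count into wall-clock time, divide by the core clock frequency. A small sketch of that conversion, where the helper name is hypothetical and the clock frequency has to come from your own system configuration:

```c
#include <stdint.h>

/* Convert a cycle-count difference to nanoseconds for a core clocked at
   cpu_hz. The multiplication is done in 64 bits to avoid overflow. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    return ((uint64_t)cycles * 1000000000ull) / cpu_hz;
}
```

As an illustration only, on a core assumed to run at 80 MHz, the 16 cycles above would work out to 200 nanoseconds.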

Figure 1 – Example screenshot showing the ITM being used to monitor the cycle count between events. In this screenshot, ITM Port 1 is being used to show the start of the function and then ITM Port 2 is being used to show the end of the function.


In this post, we have examined three techniques that developers can use to measure performance increases in their software. These techniques all work whether you are writing bare-metal or RTOS-based applications. Each technique requires that a developer instrument their software, so keep in mind that the instrumentation itself adds some overhead to the measurements. Using the same technique for both the before and after measurements, though, will provide an apples-to-apples comparison.

What additional techniques can you think of that can help developers measure their application’s execution?

Jacob Beningo is an embedded software consultant, advisor and educator who currently works with clients in more than a dozen countries to dramatically transform their software, systems and processes. Feel free to contact him through his website and sign up for his monthly Embedded Bytes Newsletter.
