How to calculate CPU utilization

Is your chip fast enough? Is it too fast? Systems engineers might be paying for more chip than they need, or they may be dangerously close to over-taxing their current processor. Take the guesswork out of measuring processor utilization levels.
Many theories and guidelines dictate how burdened a processor should be at its most loaded state, but which guideline is best for you? This article presents several ways to discern how much CPU throughput an embedded application is really consuming. You can use this information to verify the system software design against a maximum processor load.
Sizing a project
Selecting a processor is one of the most critical decisions you make when designing an embedded system. Your selection is based on the features required to satisfy the control functionality of the final product and the raw computing power needed to fulfill those system requirements. Computing power can be formally specified with benchmarks such as MIPS, FLOPS, Whetstones, Dhrystones, EEMBC marks, and locally contrived benchmarks. Many times, however, you won't know precisely how much raw throughput is needed when you select the processor. Instead you'll have only experience and experiential data to work with (from the microprocessor vendor or the systems engineer).
In any case, once the system development has progressed, it's in the team's best interest to examine the CPU utilization so you can make changes if the system is likely to run out of capacity. If a system is undersized, several options are available: upgrade the processor (if possible), reduce available functionality, or optimize, optimize, optimize.
This article doesn't focus on any of those solutions but illustrates some tools and techniques I've used to track actual CPU utilization. You can use these methods to determine how close to the "edge" a specific project is performing.
Equations 1 through 4

Equation 1: %CPU utilization (U) = 100% − %idle

Equation 2: %idle = (average background period, unloaded / average background period, loaded) × 100%
Defining CPU utilization
For our purposes, I define CPU utilization, U, as the percentage of time the processor spends outside the idle task, as shown in Equation 1.
The idle task is the task with the absolute lowest priority in a multitasking system. This task is also sometimes called the background task or background loop, shown in Listing 1. This logic traditionally has a while(1) type of loop. In other words, an infinite loop spins the CPU waiting for an indication that critical work needs to be done.
Listing 1: Simple example of a background loop
int main( void )
{
    while(1) {      /* endless loop - spin in the background */
        CheckCRC(); /* non-time-critical background work */
        /* ... do other non-time-critical logic here. */
    }
}
This depiction is actually an oversimplification, as some "real" work is often done in the background task. However, the logic coded for execution during the idle task must have no hard real-time requirements because there's no guarantee when this logic will complete. In fact, one technique you can use in an overloaded system is to move some of the logic with less strict timing requirements out of the hard real-time tasks and into the idle task.
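One common pattern for that kind of split is a flag set in the interrupt context and serviced from the background loop. Here is a minimal C sketch of the idea; `TimerIsr`, `BackgroundStep`, and the pending flag are illustrative names of my own, not code from any particular system.

```c
#include <stdbool.h>

/* Hypothetical sketch: defer non-time-critical work from a hard
   real-time context to the idle task via a flag. */

static volatile bool log_update_pending = false;

/* Hard real-time context: do only the time-critical part, then
   flag the slow part for the background loop to pick up. */
void TimerIsr(void)
{
    log_update_pending = true;
}

/* One pass through the background-loop body. In a real system this
   would sit inside the while(1) loop in main. */
void BackgroundStep(void)
{
    if (log_update_pending)
    {
        log_update_pending = false;
        /* UpdateDiagnosticLog();  -- hypothetical, no hard deadline */
    }
}
```

Because the flag is serviced from the idle task, there is no guarantee when the deferred work completes, which is exactly why only logic without hard real-time requirements belongs there.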
Using a logic state analyzer
There are several ways to measure the time spent in the background task, and some of them require no additional code. Let's look at three techniques.
The first is an external technique and requires a logic state analyzer (LSA). The LSA watches the address and data buses and captures data, which you can interpret. In this test, you should configure the LSA to trigger on an instruction fetch from a specific address and measure the time between each occurrence of an observation of this specific address.
The address to watch for could be any address within the while(1) loop from Listing 1. The task of identifying an appropriate address is tricky but not inordinately difficult. You can use the map file output by the linker to get close to a good address. Peruse the map file for the address of the main function, and then set up the LSA to look for the occurrence of any address within a limited range beyond the entry to main. This range is justified because, unless there's a large amount of logic between the entry to main and the start of the while(1) loop, the beginning of the loop should be easy to spot with a little iteration and some intelligent tweaking of the address range to inspect.
If your LSA can correlate the disassembled machine code back to C source, this step is even more straightforward because you only have to capture the addresses within the range known to hold the main function (again, see the map file output from the linker) and then watch for the while(1) instruction. If the while(1) loop is moved to its own function, perhaps something like Background(), then the location is much easier to find via the linker map file.
If the previous approach isn't appealing, you have other options. By inspecting Listing 1, you'll notice that the CheckCRC function is called every time through the background loop. If you could ensure that this is the only place where CheckCRC is called, you could use the entry to this function as the marker for taking time measurements.
Finally, you could set a dummy variable to a value every time through the background loop. The LSA could trigger on writing to this "special" variable as shown in Listing 2. Of course, I'm supposed to be showing you how using the LSA means you don't have to modify code. However, the code change in Listing 2 is so minor that it should have a negligible effect on the system.
Listing 2: Background loop with an "observation" variable
extern INT8U ping;

while(1) {     /* endless loop - spin in the background */
    ping = 42; /* look for any write to ping */
    /* ... do other non-time-critical logic here. */
}
Regardless of the method you use to trigger the LSA, the next step is to collect time measured from instance to instance. Obviously, the LSA must be able to time stamp each datum collected. Some of the more sophisticated modern logic analysis tools also have the ability to carry out some software performance analysis on the data collected. One such function that could help would be one that mathematically averages the instance-to-instance timing variation. Even more helpful is a histogram distribution of the variation since this shows the extent to which the background-loop execution time varies.
If the LSA doesn't perform any kind of data analysis, you have to export the data and manipulate it using more labor-intensive tools, such as a spreadsheet. The spreadsheet is a good alternative to an LSA-based performance analysis tool as most spreadsheet applications have many statistical tools built in.
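As a sketch of the kind of analysis involved, the following C fragment bins exported period measurements into a coarse histogram; the bin width and bin count are arbitrary choices for illustration, not values from the article's data set.

```c
#define BIN_WIDTH_US 20   /* illustrative bin width in microseconds */
#define NUM_BINS     32

/* Count how many period samples (in microseconds) fall into each
   BIN_WIDTH_US-wide bin; outliers are clamped into the last bin. */
void BuildHistogram(const double *samples, int n, int *bins)
{
    for (int i = 0; i < NUM_BINS; i++)
        bins[i] = 0;

    for (int i = 0; i < n; i++)
    {
        int b = (int)(samples[i] / BIN_WIDTH_US);
        if (b >= NUM_BINS)
            b = NUM_BINS - 1;
        bins[b]++;
    }
}
```

Plotting the bin counts reveals the tight cluster of uninterrupted loop periods and the long tail of periods stretched by interrupts.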
To accurately measure CPU utilization, the measurement of the average time to execute the background task must also be as accurate as possible. To get an accurate measurement of the background task using the LSA method, you must ensure that the background task gets interrupted as little as possible (no interruptions at all is ideal, of course). Essentially two classes of interrupts can disrupt the background loop: event-based triggers and time-based triggers. Event-based triggers are usually instigated by devices, modules, and signals external to the microprocessor. When measuring the average background time, you should take all possible steps to remove the chance that these items can cause an interrupt that would artificially elongate the time attributed to the background task.
It may be possible to disable the timing interrupt using configuration options. If it's possible, the background measurement should be extremely accurate and the load test can proceed. However, if it's impossible to disable the time-based interrupts, you'll need to conduct a statistical analysis of the timing data. Specifically, the histogram analysis of the time variation can be used to help the tester discern which data represent the measured background period that has executed uninterrupted and those that have been artificially extended through context switching.
Figure 1: Sample Histogram
Figure 1 shows a histogram of an example data set. This data set contains a time variation of the measured idle-task period. Analysis of idle-task period histogram data requires that you know how background loops become interrupted. This knowledge can help you isolate which histogram data to discard and which to keep. Looking at the sample histogram, you might estimate that any data above the threshold of 280μs represents instances where the background task was interrupted. Using this threshold, you would discard all data above 280μs for the purpose of calculating an average idle-task period. For the sake of this example, let's assume that the average of the histogram data below the threshold of 280μs is 180μs. Therefore, in all of the subsequent calculations, we'll use a value of 180μs to represent the average execution time for one cycle through the background loop in an "unloaded" system.
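The thresholding step can be sketched in C as follows; the 280μs cutoff is the example's value, and in practice you would read the threshold off your own histogram.

```c
/* Average only the samples below the chosen interruption threshold,
   discarding periods artificially elongated by context switches.
   Returns 0.0 if no sample qualifies. */
double AverageBelowThreshold(const double *samples, int n,
                             double threshold_us)
{
    double sum = 0.0;
    int count = 0;

    for (int i = 0; i < n; i++)
    {
        if (samples[i] < threshold_us)
        {
            sum += samples[i];
            count++;
        }
    }
    return (count > 0) ? (sum / count) : 0.0;
}
```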
Once you know the average background-task execution time, you can measure the CPU utilization while the system is under various states of loading. Obviously there's no way (yet) to measure CPU utilization directly. You'll have to derive the CPU utilization from measured changes in the period of the background loop. You should measure the average background-loop period under various system loads and graph the CPU utilization.
For example, if you're measuring the CPU utilization of an engine management system under different system loads, you might plot engine speed (revolutions per minute, or RPM) versus CPU utilization. Assume the average background-loop period is measured as given in Table 1. Note that the background-loop period should only be collected after the system has been allowed to stabilize at each new load point.
Table 1: System load (RPM) vs. average background loop period (T)
Now you've collected all the information you'll need to calculate CPU utilization under specific system loading. Recall from Equation 1 that the CPU utilization is defined as the time not spent executing the idle task. The amount of time spent executing the idle task can be represented as a ratio of the period of the idle task in an unloaded CPU to the period of the idle task under some known load, as shown in Equations 1 and 2.
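Equations 1 and 2 translate directly into code. This minimal sketch (the function names are mine) computes percent idle and percent CPU from the two measured background-loop periods:

```c
/* Equation 2: percent idle is the ratio of the unloaded background
   period to the background period under the known load. */
double PercentIdle(double t_unloaded_us, double t_loaded_us)
{
    return (t_unloaded_us / t_loaded_us) * 100.0;
}

/* Equation 1: CPU utilization is whatever time isn't idle. */
double PercentCpu(double t_unloaded_us, double t_loaded_us)
{
    return 100.0 - PercentIdle(t_unloaded_us, t_loaded_us);
}
```

With the example's 180μs unloaded average, a loaded background period of 200μs works out to 90% idle, or 10% CPU utilization.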
Table 2 shows the results of applying Equations 1 and 2 to the data in Table 1. Figure 2 shows the salient data in graphical form. Of course you'll want to reduce the amount of manual work to be done in this process. With a little up-front work instrumenting the code, you can significantly reduce the labor necessary to derive CPU utilization.
Figure 2: CPU utilization vs. system load (RPM)
Table 2: System load data and calculated utilization
| RPM | T (μs) | % Idle | % CPU |