Where testing fails, part 2 - Embedded.com

Where testing fails, part 2

Testing, alone, cannot find all potential failure points for real-time embedded software, particularly in multithreading environments. In part 2 of Sean Beatty's article on errors that escape detection, the focus is timing and reentrancy problems.

Most software development projects rely on some combination of code inspection, structural testing, and functional testing to identify software defects. While these conventional techniques are invaluable and uncover most software problems, they can't detect many common types of errors in today's complex systems. In part 1 of “Where Testing Fails,” I discussed stack overflows, race conditions, and deadlocks. This time, I'll show you how to avoid timing and reentrancy problems that create elusive errors.

Timing problems
Numerous types of problems can arise when the software misses its timing constraints. Before proceeding further, it's important to distinguish between hard timing requirements and soft timing requirements. A hard timing requirement denotes that if a constraint is missed, the program is in error; a late result is just as bad as a wrong result. A soft timing requirement indicates a constraint that the program should obtain on average, or one in which the value of the result degrades with time; occasionally missing a deadline is not considered an error. The following discussion is most relevant to hard real-time errors. Soft timing errors tend to be less critical.

The following are all timing-related problems:

  • Priority inversion
  • Schedulability and processor utilization
  • Jitter in outputs and release of periodic threads
  • Interrupt and event response times
  • Watchdog resets

Such problems generally go undetected by testing, since they only occur under specific timing conditions. However, you can uncover these problems by a thorough analysis of the software. A good example is priority inversion. Priority inversion occurs when a lower-priority thread keeps a higher-priority thread from running for an unbounded amount of time. It's most easily avoided by augmenting the operating system with code that adjusts thread priorities as they acquire and release resources. For more background on the subject, see the article “Introduction to Priority Inversion” by David Kalinsky and Michael Barr.

Figure 1: Resource allocation spreadsheet

Since testing is unlikely to uncover a latent priority inheritance problem, it's important to understand how threads share resources. If a system has threads of three or more priorities, and a resource is shared by both a high- and low-priority thread, priority inversion must be prevented by proper design. A resource allocation spreadsheet like that shown in Figure 1 can make it easier to identify threads that are susceptible to priority inversion.

Processor utilization
Another timing problem occurs when the processor becomes so overloaded executing threads and interrupts, that it misses deadlines. Usually, this problem only appears under worst-case conditions or infrequent combinations of events. Performing a schedulability analysis guarantees that the processor will be able to meet all the demands for processing placed on it.

Table 1: Cyclic executive

Task Schedule
in ms
in ms
per 200ms
worst 10ms
Timer Tick 1 0.1 20 1.0
Minor Loop 10 5.0 100 5.0
Keypad Process 40 2.0 10 2.0
Update LCD 100 1.0 2 1.0
Async Serial Comm 10 0.1 2 0.1
Async Interrupt 20 0.2 2 .02
Totals 136 9.3

The analysis varies greatly depending on the type of real-time software architecture. A cyclic executive is one of the simplest designs to analyze. The code is organized with a loop (often called “the minor loop”) that's executed at a base periodic rate. Within the minor loop, other groups of functions (threads) are called at integral multiples of the base period. In the example schedule shown in Table 1, the processor time required for all the threads and interrupts is listed. The major loop for this system is 200ms, which is the least common multiple of the periodic schedules of all the threads including the 10ms minor loop. Processor utilization is calculated by summing the processor time used by each thread and interrupt over one full major loop. As long as the calculated utilization is less than 100%, the processor can handle the load. In this example, the utilization is 136ms/200ms = 68%, which is well within limits.

The schedulability analysis for a cyclic executive depends on the deadline for each thread. In most designs, all threads must finish before the next minor loop begins. Periodic interrupts, such as Timer Tick, are treated like any other periodic thread. The asynchronous interrupts, Async Serial Comm and Async Interrupt, are given a “schedule” equal to their shortest interarrival rate. The worst-case situation occurs when all the threads and interrupts occur within the same minor loop. All of them must then complete within 10ms. In our example, they will all complete within 9.3ms, leaving 0.7ms to spare.

Schedulability bound
Designs that use priority-based preemptive scheduling algorithms are more complex to analyze. Even if the calculated processor utilization is less than 100%, it may not be possible to schedule the threads because of the specific mix of threads and deadlines. For example, a multithreading system with threads assigned priority optimally (according to RMA) can generally only use n (21/n -1)x100% of the processor, where n is the number of threads. For a system of three threads with periods of arbitrary length, only about 78% of the total processor utilization can be considered to be available; of course, three specific threads with well-matched deadlines and CPU usage may be able to use more than that. When threads can be blocked trying to access shared resources, or relinquish the processor voluntarily, the schedulability algorithms grow more complicated.

It's beyond the scope of this article to discuss all the various scheduling considerations, which are well covered in texts on real-time design. However, it's important to note that all the algorithms depend on knowing the worst-case execution time for each thread, in order to determine if the set of threads can be scheduled. For an overview of RMA, see the Beginner's Corner article “Introduction to Rate Monotonic Scheduling.”

Worst-case execution time
Although many RTOS development environments and third party tools will report the amount of time a thread or function has run to some resolution, they do not report the worst-case execution time. Simply measuring how much processor time a thread has consumed on a given run (or runs) is not adequate, especially if compensation is not made for the effects of preemption and interrupts.

For safety- or mission-critical applications, each thread must be carefully analyzed to determine that path through the thread that will result in the worst-case execution time. Scrutinize the code to determine which combination of inputs will then yield that worst-case path. For anything but the simplest of threads, determining this combination of inputs may not be feasible.

Figure 2: Portion of a stack analysis spreadsheet

A more practical approach identifies the functions called by each thread, measures the worst-case execution time of each function, and builds a value for the execution time of the thread based on the functions it calls. A spreadsheet similar to the stack analysis of Figure 2 can be used for this job. Instead of recording the stack used by each routine, record the worst-case execution time used by just that routine, excluding the time used by any functions it calls. The spreadsheet is programmed to recursively walk through the call-tree to calculate the execution time required for every listed routine, adding in the time for every function beneath it in the tree. This provides a usable worst-case execution time needed by the scheduling algorithms.

This spreadsheet approach assumes that in the worst-case path, a given function will call every function referenced in its body. This may not be the case for all structures. For example:

if (flag == TRUE) then

To obtain a more accurate result, examine the spreadsheet to see which function consumes more processor time, function_A() or function_B() . Then modify the spreadsheet, deleting the function that takes the shorter amount of time from the list of called routines (for that calling function only). The inaccuracy can sometimes be ignored, understanding that the worst-case time reported will be longer than that ever encountered. If the system of threads is schedulable for the longer time, it will be schedulable for the shorter time.

Once the worst-case execution time path has been identified for a function, the software is run with its inputs set to the values that force it down that worst-case path, and the execution time is measured. An in-circuit emulator (ICE) is useful in making these measurements.

Some development environments may have other tools that could also be used, like a high-resolution clock that can be read before and after a function's execution. It's important to disable interrupts and preemption while making the measurement, to prevent them from being erroneously included in the execution time for the function. Any calls the function makes must also be excluded from the time measurement. If using an ICE, this is sometimes easily done in the assembly language window by either replacing the calls with nops, or replacing them with a call to a dummy routine that just executes a return.

As in the stack analysis, you must examine the assembly language source to identify library function calls, which can consume significant amounts of processor time. Many compilers targeted for real-time embedded systems specify the execution time of these functions, so no measurement is necessary. You can enter their values directly into the spreadsheet.

When the compiler vendor doesn't provide the execution time, you must examine the source code of the functions, identify the worst-case path, and measure the function's execution time, just like any other function in the software. In some cases, a run-time or C library function will be called with constant, not variable, data. In these situations, their execution time can be included in the measurement of the calling function, since the execution time can be assumed to remain constant. If an RTOS is used, the vendor should supply the execution time and preemptibility of all kernel calls.

Be aware: some kernel calls can be preempted and will restart (not resume) when they regain the processor, which increases their worst-case execution time.

If all this seems like a lot of work—it is. There are some approaches, however, that can minimize the effort. The first is to accurately specify all timing behavior. Soft real-time constraints may not demand a high level of accuracy. They can often be sufficiently verified through good load or stress testing. Minimizing the number of hard real-time requirements saves a lot of effort. Second, partition the system into threads that have hard real-time requirements and those that don't. Only those threads with the hard requirements may need to be analyzed in detail.

Third, incorporate the timing measurements into the existing testing process, instead of making it a separate activity at the end of the development cycle. Since each function of the code must be analyzed to determine the worst-case execution path, it's convenient to do this when the function is white-box tested. During white-box testing, the function is already being examined to determine the expected outputs for a range of inputs and the inputs required to force certain paths through the function, among other things. It's often a simple matter to determine which path will lead to the maximum execution time while the function is being so thoroughly examined.

Many times, white-box testing is performed using an ICE, because it makes modifying inputs and monitoring outputs simple. In this testing environment, it's quick to add a run through the code down the worst-case execution path and measure the elapsed time.

Last of all, it's critical to remove all nondeterministic timing constructs from code that has hard real-time constraints. Failure to do so results in code that's difficult to analyze, at best, or impossible to analyze, at worst. Avoid these:

  • Waiting for acknowledgement from hardware
  • Recursive algorithms
  • Dynamic memory allocation
  • Access to some types of I/O devices (disks, for example)
  • Calling third-party library routines that don't specify a maximum execution time

Another timing problem that testing alone may not uncover is excessive jitter in the threads' outputs. Applications such as real-time communications, streaming media, and control systems often depend on outputs being generated at a fixed periodic rate. If an output has excessive jitter, the quality of the result is degraded. Some systems, such as aircraft controls, have hard real-time constraints on allowable jitter. Output jitter is affected by:

  • The variation in the execution time of the thread
  • Preemption by other threads
  • Interrupts

To ensure that the worst-case output jitter will be acceptable, the maximum time of all three of these contributing factors must be considered.

The variation in the execution time of the thread is simply the maximum execution time minus the minimum execution time. The minimum execution time can be measured in similar fashion as the maximum execution time, by analyzing the code to determine the path that leads to the shortest execution time and then measuring the time duration with an ICE.

The effect of preemption by other threads is dependent on the scheduling algorithm used. If all the threads can become ready at the same instant, the jitter will be at least the maximum execution time of any higher-priority threads. If protected resources are shared among the threads, additional resource-blocking delays will add to the jitter. The maximum execution time of the longest interrupt service routine (ISR) for each priority level of interrupts must also be added to the worst-case jitter. Adding all three of these components gives the maximum output jitter for a given thread.

Another way to reduce output jitter is to update all the outputs at the beginning of the periodic thread, instead of at the end, using the last calculated values. This eliminates the first component of jitter, but increases the latency between the sensor input used to calculate the new outputs and the actual updating of those outputs. Care must be taken, since this may reduce the effectiveness of a control system even more than the jitter does.

Jitter in the time a thread begins can also be a problem. Many digital control systems read their sensor inputs at a fixed periodic rate. The algorithms which process the inputs assume that there is a specific, fixed period of time between each new set of input data. Variations in the sample time from one period to the next will reduce the accuracy of the calculations, causing problems such as a decrease in the signal to noise ratio. Input jitter is typically the same as output jitter, without the contribution of the variation in execution time of the thread.

Interrupts and events
Interrupts are an important component of most computer-based systems. Their operation affects the timing of every thread in the system. That effect must be measured and taken into consideration in order to properly understand the system's timing behavior. The frequency of the interrupts and the execution times of their service routines are critical inputs to the schedulability analysis of the system.

Events that require a fast response are often allocated to interrupts. In these situations, the maximum response time for the event is the worst-case execution time of the ISR up to the point the event response is generated, plus the interrupt latency. The execution time of an ISR must be determined just like any other function.

The interrupt latency for a given interrupt is the maximum time from the point the interrupt is asserted to the time the processor begins executing the interrupt's ISR. Many systems have hard real-time constraints on the response to one or more interrupts. In these systems, testing and measurement alone generally will not detect the maximum interrupt latency, since it's not feasible to generate all the conditions at the exact moment necessary to produce the worst-case timing. So, you must calculate the latency, adding all the following:

  • The longest interrupt execution time of any other single interrupt at that interrupt priority level,
  • plus the longest interrupt execution time of every interrupt at any higher-priority levels
  • The longest time interrupts at that priority level are disabled
  • The longest instruction time (usually a “divide” instruction)
  • The time needed to switch context and start the ISR (save the stack pointer and processor state and jump to the starting address of the ISR)
  • If the processor is in a low-power mode, the time to resume operation (from HALT, SLEEP, and so on)
  • Any additional delays related to the specific processor, such as an interrupt request hold

Events are not always completely handled within an ISR. In some designs events are detected by an interrupt, which then sets a global event flag, sends a signal, or puts the event into a queue for later processing. Determining the event response time is more complicated in these scenarios, since the thread that reads the flag, receives the signal, or pulls the event from the queue has to be considered. The event response time in these situations is:

  • The interrupt latency
  • The worst-case execution time of the ISR
  • The worst-case response time of the system once the event has been signaled or queued, to the point the thread that responds to the event begins running (this depends greatly on the system architecture)
  • The worst-case execution time of the thread to the point that the event is handled

Other systems detect events by polling. For these designs, the response time is the polling interval plus the time between detection and response. Polled events tend not to have hard response times.

Watchdog timers
Many embedded systems employ a watchdog timer to reboot the system automatically in the event that some or all of the software hangs or wanders off into the woods. This watchdog timer is restarted (“kicked”) frequently, when the code is running properly. However, if the code gets stuck in an infinite loop, jumps to a random memory location and begins executing instructions, or fails in some other dramatic fashion, the code that kicks the watchdog timer will not be executed. The watchdog times out and (generally) resets the processor. In some systems, instead of a timer, a monitoring thread does the job of determining that all is well. There are many variations of watchdog timer designs, but they all rely on the watchdog being kicked under all correctly operating scenarios—thus keeping it from timing out unless the software is not operating properly.

Measuring the execution time between watchdog timer kicks is unlikely to indicate the longest actual time that could occur between restarts. As with other timing parameters, this worst-case time must be determined by analysis. Everything that consumes processor time between watchdog timer kicks must be considered: threads, interrupts, events, context switches, system calls. The worst-case execution time of all these contributors will affect how frequently the watchdog is actually kicked. You can use the data from all the timing analyses discussed earlier to determine the maximum time between any two consecutive watchdog timer kicks. It's this maximum time that's important—the watchdog cannot be set to expire any sooner, or it may reset the system when it shouldn't, while the code is operating properly. If this maximum time is not determined accurately, the watchdog timeout period may be set too aggressively.

Many functions must complete their execution and return before they're called again, in order to avoid problems. Other functions have been designed so that if they are preempted and called again before the previous invocation returns, everything works fine; these functions are reentrant. It's critical to understand which functions in a program must be reentrant and ensure that, indeed, they are. Any function that's called from threads or interrupts of differing priority levels, whether the function is a normal source code function or from the C or run-time library, must be designed for reentrancy. If the function is not, when it's preempted and reinvoked, this second invocation will corrupt the processing of the initial invocation.

A function is reentrant if doesn't modify anything other than it's parameters and nonstatic local variables, and it doesn't call any nonreentrant functions. In addition, avoid reads that trigger an action in a peripheral register, such as clearing a flag or changing a state.

In systems that employ a cyclic executive architecture, all the code other than the ISRs is considered one “thread” for the purposes of analyzing reentrancy. Since none of the code operates concurrently, every function will always return before it's called again. Only those functions that are called by an ISR and the main thread need to be reentrant.

For more complex systems that contain multiple threads of different priorities, you must develop a call tree for each thread (and ISR, if needed). The call trees are searched for functions called by more than one thread. For large call trees, it's convenient to capture this information in a spreadsheet similar to Figure 2, with additional information identifying the function forming each thread's root. The spreadsheet can be programmed to find the functions common to more than one thread. Or it can be dumped as a text file and a script written to search for the functions that must be reentrant. As with the Stack Analysis, it's important to capture the C-library and run-time library calls from the assembly language—some of them may not be reentrant. Any preemptable RTOS kernel calls that are invoked from more than one thread must also be reentrant.

Beyond testing
Testing alone is unlikely to find stack overflows, race conditions, deadlocks, many types of timing problems, and reentrancy clashes. These types of problems are difficult to find in white-box unit testing or inspection since they are caused by interactions with other routines outside the unit. They're difficult to find with black-box testing since the conditions required to produce the problem may be nearly impossible to generate, even if they are known. Most of the time, the system behaves as expected. It's only when certain paths through the software are followed or certain timing conditions are present that a problem is manifested. Since the conditions may be very transient, the error can be difficult to reproduce and correct.

The five classes of software defects discussed in this pair of articles can produce critical failures of the system. Especially in safety- and mission-critical applications, testing and inspection must be augmented with appropriate analysis in order to verify correct software performance under worst-case situations. By understanding the need for this analysis at the beginning of the development cycle, you can design the software to minimize the required analysis. By gathering the critical information throughout the development and testing phases, you can reduce the additional effort required to analyze the code. By using some simple spreadsheets and scripts, you can update and process the analysis quickly. Analyzing the code for these difficult to find, yet critical defects, will produce more reliable, higher-quality software. esp

Sean M. Beatty teaches, consults, develops, tests, and analyzes software. He holds a BSEE from the University of Wisconsin at Milwaukee and has 18 years of embedded systems experience. He can be reached at .


  1. Beatty, Sean, “Where Testing Fails,” Embedded Systems Programming, August 2003, p.36.
  2. Kalinsky, David and Michael Barr, “Introduction to Priority Inversion,” Embedded Systems Programming, April 2002, p. 55.
  3. Stewart, David and Michael Barr, “Introduction to Rate Monotonic Scheduling,” Embedded Systems Programming, March 2002, p. 79.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.