Introduction to Interrupt Debugging

Stuart Ball

May 31, 2002

Interrupt-related problems are among the hardest to debug. Here's a primer on some common pitfalls to avoid.

Interrupts are, in many cases, the key to real-time embedded systems. There's often no other way to make sure that a particular piece of code executes in a timely manner. Unfortunately, interrupts can increase a system's complexity and make overall operation less predictable.

An interrupt signals an event to the microprocessor. It could indicate that a particular amount of time has elapsed, that a user has pressed a button, or that a motor has moved a certain distance.

When an interrupt occurs, the microprocessor hardware saves the return address on the stack and transfers control to the interrupt service routine (ISR).[1]

The ISR saves the CPU context (unless the hardware does so automatically) and any registers that it will use. The context includes the contents of special registers, such as CPU status registers, and any other information needed to return the CPU to the state it was in just before the interrupt occurred.

After saving the context, the ISR does whatever the interrupt has prompted it to do, restores the CPU context, and returns to normal processing. The microprocessor resumes executing where it left off before the interrupt.

Because interrupts arrive asynchronously, and each ISR briefly suspends the mainline software, we must always remember that an interrupt can occur at any time, between any pair of instructions. In my experience, most of the really difficult interrupt problems arise when that reality interacts with the rest of the software. In this article, we will look at two of the more common ones, race conditions and stack overflow, and discuss ways to avoid them.

Race conditions

A race condition is probably the most common interrupt-related problem. Take a look at Figure 1 along with the two pseudo-code fragments below:


Figure 1: A race condition

Mainline code:

1. Read variable X into register
2. Decrement register contents
3. Store result back at variable X

ISR code:

A. Read variable X into register
B. Increment register contents
C. Store result back at variable X
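To make the interleaving concrete, here is a minimal C sketch of the counter described below. It assumes a `uint8_t` count named `x` and writes the mainline decrement out as explicit steps. The names `rx_isr` and `mainline_remove_byte` are hypothetical, and the `interrupt_here` hook exists only so the bad interleaving can be reproduced deterministically on a host instead of waiting for a real interrupt.

```c
#include <stddef.h>
#include <stdint.h>

/* Shared byte count, starting at 4. volatile forces the compiler to
   perform every access, but it does NOT make x++ or x-- atomic. */
static volatile uint8_t x = 4;

/* ISR side: a byte has been put into the buffer. */
void rx_isr(void)
{
    x++;                          /* steps A-C: read, increment, store */
}

/* Mainline side, with x-- written out as three separate steps.
   interrupt_here is a hypothetical test hook that simulates an interrupt
   arriving after step 1 and before step 3; pass NULL for no interrupt. */
void mainline_remove_byte(void (*interrupt_here)(void))
{
    uint8_t reg = x;              /* 1. read X into a register      */
    reg--;                        /* 2. decrement the register      */
    if (interrupt_here != NULL)
        interrupt_here();         /*    the interrupt lands here    */
    x = reg;                      /* 3. store back the stale result */
}
```

Starting from 4, calling `mainline_remove_byte(rx_isr)` leaves `x` at 3 instead of 4, because the ISR's increment is written over. On real hardware a plain `x--;` can fail the same way, since compilers often emit the same load, decrement, store sequence for it.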

Suppose that the shared variable X tracks the number of bytes in a buffer. The ISR puts a byte into the buffer and increments X. The mainline code reads a byte from the buffer and decrements X. Say that X starts out with a value of 4. The ISR puts a byte into the buffer and increments X to 5. The mainline code then reads a byte and decrements the count back to 4.

But if an interrupt occurs after step 1 and before step 3 of the mainline code, X will be corrupted. First, the mainline code reads X, which is 4, into a register. Then the interrupt occurs; the ISR also reads 4 and increments X to 5. After the ISR completes, the mainline code decrements its stale register copy and stores 3 in X, overwriting the ISR's update. X should be 4 (one byte added, one byte removed), but it is now 3. This sequence is illustrated in Figure 1.

Any shared resource can be involved in a race condition. The issue also arises with shared hardware registers. It even applies to shared subroutines and functions, unless they are reentrant.[2]

This problem has several solutions. Some processors have atomic read-modify-write instructions that can read the memory location, modify the value, and write the new value back to memory without interruption. If you are using a high-level language such as C, it may be difficult to force the compiler to generate these special instructions. Some assembly may be required.
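As one illustration, C11 (which postdates this article) added `<stdatomic.h>`, which lets a C compiler emit an atomic read-modify-write without hand-written assembly. The sketch below assumes a C11 toolchain, and the function names are hypothetical. Whether `atomic_fetch_add` and `atomic_fetch_sub` become a single uninterruptible instruction depends on the target: if the type is not lock-free on your part (check `ATOMIC_INT_LOCK_FREE`), the library may fall back on a lock, which is generally not safe to take inside an ISR.

```c
#include <stdatomic.h>

/* Byte count shared by the ISR and the mainline code. */
static atomic_uint byte_count = 4;

/* ISR side: one byte added. The read-modify-write is a single atomic
   operation (assuming atomic_uint is lock-free on the target). */
void isr_byte_added(void)
{
    atomic_fetch_add(&byte_count, 1u);
}

/* Mainline side: one byte removed. There is no window between the read
   and the write for the ISR to slip into. */
void mainline_byte_removed(void)
{
    atomic_fetch_sub(&byte_count, 1u);
}

/* Read the current count. */
unsigned int bytes_in_buffer(void)
{
    return atomic_load(&byte_count);
}
```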

A second way to prevent race conditions is to disable interrupts around the read/decrement/write sequence in the mainline code, as illustrated below:

Protected mainline code:

0. Save interrupt state and disable
1. Read variable X into register
2. Decrement register contents
3. Store result back at variable X
4. Restore prior interrupt state
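In C, the protected sequence might look like the sketch below. `irq_save_and_disable()` and `irq_restore()` are hypothetical helpers; on an ARM Cortex-M part they might wrap the CMSIS intrinsics `__get_PRIMASK()`, `__disable_irq()`, and `__set_PRIMASK()`. Here a plain flag stands in for the CPU's interrupt-enable bit so the code runs on any host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CPU's global interrupt-enable bit. On a real
   part, the two helpers below would wrap target intrinsics instead. */
static volatile bool irq_enabled = true;

static bool irq_save_and_disable(void)
{
    bool was_enabled = irq_enabled;   /* 0. save the current state ... */
    irq_enabled = false;              /*    ... and disable interrupts */
    return was_enabled;
}

static void irq_restore(bool was_enabled)
{
    irq_enabled = was_enabled;        /* 4. restore the prior state */
}

/* Byte count shared with the ISR. */
static volatile uint8_t x = 4;

void mainline_remove_byte(void)
{
    bool state = irq_save_and_disable();
    x--;                              /* 1-3: read, decrement, store */
    irq_restore(state);
}
```

Restoring the saved state, rather than unconditionally re-enabling interrupts, matters when the function is called with interrupts already disabled (for example, from inside another critical section): the caller's disabled state is preserved. Keep the protected region short, since it adds directly to interrupt latency.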

By far the best solution is to avoid sharing variables and hardware registers between ISR and mainline code altogether. In the counter example, this can be done with two counters: the ISR increments one each time it puts a byte into the buffer, and the mainline code increments the other each time it removes a byte. The number of bytes in the buffer is the difference between the two counts. Ideally, a variable written only by the ISR code is read only by the mainline code, and vice versa.
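A minimal sketch of the two-counter idea, written as a small receive buffer in C. It assumes a single-core system with one ISR producer and one mainline consumer; `rx_isr_put`, `mainline_get`, and `bytes_available` are hypothetical names. The counters are `uint8_t` on the assumption that a byte is read or written in a single access (true on 8-bit parts; wider CPUs can use wider counters), and the buffer size is a power of two smaller than 256 so the indexes stay correct when the counters wrap.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUF_SIZE 16u                 /* power of two, smaller than 256 */

static volatile uint8_t buf[BUF_SIZE];
static volatile uint8_t put_count;   /* written only by the ISR      */
static volatile uint8_t get_count;   /* written only by the mainline */

/* Number of bytes waiting. Unsigned wraparound keeps the difference
   correct even after the counters roll over from 255 to 0. */
uint8_t bytes_available(void)
{
    return (uint8_t)(put_count - get_count);
}

/* ISR side: store a received byte if there is room. */
void rx_isr_put(uint8_t byte)
{
    if (bytes_available() < BUF_SIZE) {
        buf[put_count % BUF_SIZE] = byte;
        put_count++;                 /* only the ISR writes put_count */
    }
    /* else: buffer full; this sketch simply drops the byte */
}

/* Mainline side: fetch one byte; returns false if the buffer is empty. */
bool mainline_get(uint8_t *out)
{
    if (bytes_available() == 0)
        return false;
    *out = buf[get_count % BUF_SIZE];
    get_count++;                     /* only the mainline writes get_count */
    return true;
}
```

Because each counter has exactly one writer, neither side performs a read-modify-write on a variable the other side also writes. At worst, the mainline sees a count that is momentarily one too low, or the ISR sees the buffer as slightly fuller than it is; both are harmless.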

Hardware complications

Some peripherals have more internal registers than externally addressable locations. Registers in such devices are manipulated by first writing a value to an address register, and then reading or writing data at a different address to access the selected register's contents.

The sequence to access a register in these devices is something like this:

  • Write identifier for desired internal register to "address register."
  • Read or write the "data register" to access the selected internal register.

A problem occurs if an interrupt fires between the two operations and the ISR must also access that peripheral's registers. The mainline code writes the address register to select the internal register it needs. Then the ISR gets control, writes a different value to the address register, and accesses some other register. When the ISR returns and the mainline code completes its access, the address register has changed, so the mainline code reads (or writes) the wrong register.
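The failure can be reproduced with a small model. In this C sketch the device, its 16 internal registers, and the names `dev_select`, `device_isr`, and `mainline_read_reg` are all hypothetical; a RAM variable plays the address register, and a test hook fires the "interrupt" between the two steps of the access sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indexed peripheral, modeled in RAM so the sketch runs on a
   host. On real hardware these would be volatile accesses to the device's
   address and data registers. */
static uint8_t dev_addr;              /* the "address register"         */
static uint8_t dev_internal[16];      /* registers behind the data port */

static void    dev_select(uint8_t reg) { dev_addr = (uint8_t)(reg & 0x0Fu); }
static uint8_t dev_data_read(void)     { return dev_internal[dev_addr]; }

/* An ISR that also uses the device: it reads internal register 7. */
uint8_t isr_seen;
void device_isr(void)
{
    dev_select(7);
    isr_seen = dev_data_read();
}

/* Mainline read of an internal register. interrupt_here is a test hook
   that simulates an interrupt between the two steps; pass NULL for none. */
uint8_t mainline_read_reg(uint8_t reg, void (*interrupt_here)(void))
{
    dev_select(reg);                  /* step 1: write the address register */
    if (interrupt_here != NULL)
        interrupt_here();             /* the ISR re-selects register 7 here */
    return dev_data_read();           /* step 2: reads whatever is selected */
}
```

With internal register 2 holding 0x22 and register 7 holding 0x77, `mainline_read_reg(2, device_isr)` returns 0x77: the mainline asked for register 2 but got the one the ISR selected.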

Devices that have this characteristic are often high-integration parts with a number of functions, and there may be no way to avoid having both ISR and mainline code access the device.

One way to prevent such a race condition is to have the ISR read the peripheral's address register on entry, save it as part of its context, and write it back before returning. If the address register is write-only, the only viable solution is to bracket the mainline access with an interrupt disable/enable pair.
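A sketch of the save-and-restore approach, using a RAM model of a hypothetical indexed peripheral with 16 internal registers and a readable address register. The helper names are illustrative only. The ISR records the address register on entry and writes it back before returning, so the mainline's two-step access needs no change, even if the interrupt lands between the steps.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indexed peripheral modeled in RAM; the address register
   is assumed to be readable. */
static uint8_t dev_addr;              /* readable "address register"    */
static uint8_t dev_internal[16];      /* registers behind the data port */

static void    dev_select(uint8_t reg) { dev_addr = (uint8_t)(reg & 0x0Fu); }
static uint8_t dev_addr_read(void)     { return dev_addr; }
static uint8_t dev_data_read(void)     { return dev_internal[dev_addr]; }

uint8_t isr_seen;

/* ISR that treats the address register as part of its saved context. */
void device_isr(void)
{
    uint8_t saved_addr = dev_addr_read();  /* save on entry            */
    dev_select(7);
    isr_seen = dev_data_read();
    dev_select(saved_addr);                /* restore before returning */
}

/* Mainline access; interrupt_here simulates an interrupt between the
   two steps (pass NULL for none). */
uint8_t mainline_read_reg(uint8_t reg, void (*interrupt_here)(void))
{
    dev_select(reg);
    if (interrupt_here != NULL)
        interrupt_here();
    return dev_data_read();                /* still the register we selected */
}
```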

Stack overflow

Another potential problem with interrupts is stack overflow. Since the return address and any additional context information are always pushed onto the stack, every interrupt uses stack space. If you push more information onto the stack than it can hold, the stack overflows.

When the stack overflows, one of three things may happen: the new (overflow) data may overwrite another memory area; the stack pointer may wrap around, causing another part of the stack to be overwritten; or, in a system with hardware memory management, an exception may occur.

Stack overflow is more likely on microcontrollers and other systems with limited memory. A larger stack is the best way to prevent an overflow. Some microcontrollers have a fixed-depth hardware stack that requires careful attention from the programmer to prevent overflow. In some designs, you may even have to save some information in fixed memory locations, rather than on the stack, to prevent overflow.
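Choosing a stack size requires knowing how much stack the system actually uses, including worst-case interrupt activity. One common way to measure it, sketched below in C, is to "paint" the stack area with a known pattern at startup, run the system under heavy interrupt load, and then count how much of the pattern survived. The array standing in for the stack, the fill value, and the function names are assumptions for illustration; on a real target the region would come from the linker, and painting is normally done by startup code before the stack is in use. The sketch assumes a stack that grows downward, toward index 0.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64u
#define STACK_FILL  0xDEADBEEFu       /* assumed "paint" pattern */

/* Hypothetical stack region; an array stands in so the sketch runs on a
   host. The stack grows downward, from the end of the array toward 0. */
static uint32_t stack_area[STACK_WORDS];

/* Fill the whole region with the pattern before it is in use. */
void stack_paint(void)
{
    for (size_t i = 0; i < STACK_WORDS; i++)
        stack_area[i] = STACK_FILL;
}

/* Count words at the bottom (lowest addresses) that still hold the
   pattern. Everything above them has been used at some point. */
size_t stack_words_never_used(void)
{
    size_t n = 0;
    while (n < STACK_WORDS && stack_area[n] == STACK_FILL)
        n++;
    return n;
}
```

If `stack_words_never_used()` returns 0, the stack has been used right down to its limit and may well have overflowed. The count can slightly overstate the remaining margin if a value pushed at the deepest point happens to equal the fill pattern.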

The addition of interrupts to a system opens the window to dangers such as race conditions and stack overflow. Careful attention to design in these areas can save enormous amounts of debug time when you perform integration.

Stuart Ball is an electrical engineer with twenty years of experience in the area of embedded systems. He is the author of three books on the subject, all published by Butterworth-Heinemann. He holds a BSEE degree from the University of Missouri-Columbia. E-mail him at stuart@stuartball.com.

1. Massey, Russel. "Understanding Interrupts," Embedded Systems Programming, June 2001, p. 95.

2. Ganssle, Jack. "Reentrancy," Embedded Systems Programming, April 2001, p. 183.

Return to June 2002 Table of Contents
