DebugTip: Circular History Buffer - Embedded.com

DebugTip: Circular History Buffer

It is often useful to examine the sequence of events leading up to an observed software anomaly. Here's a handy way to capture events of interest to RAM.

In most software projects, the debugging phase lasts much longer than anyone ever expects. Pressures to keep on schedule demand that we find and fix defects quickly. Initially, software engineers use source-code debuggers to go through the code to eliminate the first set of bugs. However, source-code debuggers become less useful after the initial bugs have been found; the bugs that remain are less apparent.

During the second debugging phase, the software may start up and work for a while, but fail when running regression tests. This happens for any number of reasons. For example, linked lists could become corrupt; assert() s placed in the code could suddenly evaluate to false; I/O transactions may not be completed; or software state machines could stall. To find and fix many of these bugs, engineers need to understand what happened in the software just prior to the failure. At this point, some engineers begin placing printf() s in the code in an attempt to log software events prior to an error condition. The main problem with the printf() debugging technique is that it consumes a lot of CPU cycles. This affects the system timing and may even make the bug non-reproducible. Other problems with printf() debugging include the following:

  • In many systems, one cannot call printf() from an interrupt service routine (ISR). This is because all hardware interrupts are typically disabled while in an ISR, and the printf() cannot proceed unless interrupts are enabled for the serial device used within the printf() .
  • printf() can tie up a serial port resource, making it unavailable for other activities.
  • When debugging the software initialization process or boot-up, printf() may not be available yet.

An alternative debugging technique is to use a circular history buffer to log critical software events prior to an error condition. This technique has minimal impact on system timing, and requires only one resource: a configurable amount of RAM. It can be applied to any software project, from simple systems with a single main loop, to systems with an operating system and multiple threads. It can even be used in a remote location in the field.

Events

The circular history buffer debugging technique uses a fixed amount of memory to hold the most recent software events leading up to the failure. Older software events are overwritten. Information logged with each event can be tailored to specific needs, but typically consists of a time stamp, a pointer to a character string, and an optional 32-bit value. A typedef for a history buffer event is shown below:


typedef struct {
unsigned int timeStamp;
char *message;
unsigned int associatedData;
} HistItem;

The timeStamp field associates an event with a specific point in time. Most systems have either a hardware counter or a software counter incremented by a periodic interrupt. When an event is logged, the value of the counter is read and placed into the HistItem.timeStamp field. Here, timeStamp is shown as an unsigned int, but it can be whatever is appropriate for the particular system. An added benefit to a timeStamp is that it can provide insight into system timing, such as how long it takes to execute a particular ISR or function call.

The message field is a pointer to a null-terminated string, which can provide information about location. For example, it could be “function foo(),top ” or “isr XYZ “. When an event is logged, only the pointer to the string is passed, and only the pointer becomes part of the event. Not copying characters or using a more general printf() format field reduces the impact on system timing. Later, when the history buffer is printed, the message pointer is de-referenced and the complete character string is printed out.

The 32-bit associatedData field may be used to associate the value of a local or global variable with the event. For example, interrupt service routines often make use of a local variable to contain the value read from an interrupt event register. Logging this value into the history buffer memory can provide insight into the timing of hardware interrupts. Another example comes from software state machines, frequently implemented with a global variable to hold the current state of the state machine. If the software state machine is operating incorrectly, logging the state of the state machine when it is modified can provide insight into the problem. As the event is logged, the associatedData value is copied into history buffer memory.

For systems with abundant memory, HistItem can include more fields, such as a thread identification field, a filename (__FILE__), a line number (__LINE__) , and so on. Tuning the HistItem data structure to log the most relevant information is easy.

Circular buffer

The circular history buffer is implemented as a simple array of HistItem s. A global integer variable tracks the start of the history buffer, while another global integer variable tracks the next slot to be used when logging an event.

Sample C code for the circular history buffer is available online at http://www.eetimes.com/design/embedded/source-code/4200245/purcell07-zip. The history.h file provides the function prototypes associated with using a circular history buffer. Insert a #include at the beginning of any file that will be modified to call history buffer functions. To compile history buffer code, use a -DHISTORY compile flag. If that flag is not defined, all of the history buffer code and hist_add() calls can remain present-they are simply #defined to be nothing or #ifdefed out.

Calling hist_init() () initializes the history buffer. The subsystemsEnabled parameter may be used to enable one or more subsystems. For example, when debugging only software module XYZ, enable the history buffer for the XYZ subsystem only. This will screen out all events that are not applicable, so you can log more of the events you are interested in.

Use hist_addx() () to log an event if there is a variable value to be associated with the event. The subsystem parameter indicates the software module from which the event is coming and the msg parameter specifies a pointer to a character string message. The data parameter is copied into the HistItem associatedData field. The history.c file shows that this function first disables interrupts before modifying key variables and data structures so that logging an event is atomic and is multi-thread safe. Another way to make logging atomic is to lock a mutex. However, mutex and semaphore operations typically are not allowed if executing in an interrupt service routine, and it is frequently advantageous to be able to log events from ISRs. Next, hist_addx() checks to see if the subsystem is currently enabled for logging. If so, it places the current time, the msg pointer, and data into histbuf[histidx] memory. Global variable histidx is then incremented, and if the history buffer has filled, histidx is reset to 0 and variable histstart is incremented. The next time an event is logged, it will overwrite the original event placed in histbuf[0] .

Use hist_add() to log an event if there is no variable value to be associated with the event. The subsystem parameter and the msg parameter are as explained previously. The hist_add() function simply calls hist_addx() with a data value = NOVAL .

Macros HIST_XYZ() and HIST_XYZx() , as shown in history.h , may be used for calling hist_add() and hist_addx() . The macros eliminate some typing because the subsystem field is made part of the macro.Use the hist_print() function to print the history buffer if a problem occurs. The doReturn parameter, if set true, causes the print routine to spin after printing is completed. This allows a source code debugger to be attached to examine the system for further clues about the error. Mechanisms for getting hist_print() to execute include:

  • A call to hist_print() may be placed in the code where an error condition is detected.
  • An assert() macro may be defined to evaluate an expression, and, if the evaluation fails, call hist_print() .
  • A debugger may be attached and then used to change the CPU program counter so that hist_print() will execute.
  • In some development environments, a new task may be spawned to execute hist_print() .
  • Special commands may be defined and sent over the I/O ports, causing the firmware to call hist_print() .

An example of output produced by hist_print() follows:


3012:read
3015:read,waiting for data
3367:isr,RxReq
3378:isr,src:0x00bec14c
3380:isr,getRxParms:0x001b4588
3393:isr,dest:0x001b4590
3401:isr,signal semaphore:
0x0010a020
3543:TxResp,context:0xf00429a0
3547:asyTx:desc:0x00bf5c60
3572:isr,TxRespDone
3607:read,done
...

Experiences

Recently, the circular history buffer debugging technique helped solve an I/O problem between a lab prototype embedded system and a PC. After running a particular regression test for about a half-hour, the test failed when a command transaction was sent from the PC to the embedded system, and the embedded system firmware failed to send a response back to the PC.

The problem was solved by recompiling the code with the -DHISTORY compile flag and then downloading new firmware to locate the problem. The code had been previously instrumented with hist_add() calls in strategic places. After the regression test started and the problem occurred, a command was sent over an unused I/O port causing the firmware to call hist_print() . Upon examining the history, it became clear that there was a hardware problem. The firmware was subsequently modified to work around the hardware problem, allowing the regression test to run successfully.

Another example involved a remote customer. The problem could not be duplicated locally, and it was impractical to duplicate the remote customer's environment exactly. The product had firmware download capability, so the code was recompiled with history buffer capability turned on. The code was then sent to the remote customer and downloaded. Over time, more hist_add() calls were added until the root cause of the problem became evident. Once the problem was identified, the firmware was fixed and the incident with the remote customer was resolved.

The circular history buffer debugging technique has been shown to work well. Software engineers involved in creating code for embedded systems should consider this an additional tool for debugging, and people involved in specifying or purchasing software would be well-served by asking for this capability.

The code referenced in this article can be found online at http://www.eetimes.com/design/embedded/source-code/4200245/purcell07-zip.

Andy Purcell graduated with a BSEE from the University of Kansas and has been with Hewlett-Packard/Agilent for 22 years. He first did integrated circuit and analog design, then transitioned to software design 12 years ago. Andy has been involved in software designs for spectrum analyzers, storage devices, and most recently, 1394 I/O software. You can e-mail him at .

Return to July 2001 Table of Contents

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.