How to debug elusive software code problems without a debugger
Many software bugs in deeply embedded systems can be found with a hardware debugger, but sometimes a problem may be so elusive that it appears only after prolonged code execution, complex interactions with other nodes in the system, or both. Depending on the system design, it may not be practical to attach a hardware debugger, so alternative debugging methods must be employed.
This article discusses such problems and presents a software technique that captures the call stack in real time by dumping the stack contents of the embedded system at the point of failure. It also describes a Python-based tool that matches the contents of the stack against the disassembler output to recover the full function call stack and, eventually, the point of failure.
For this article we will focus on debugging wireless sensor networks, but the same techniques can be applied to any system with a large number of distributed devices, or even a single device, that cannot be run under the debugger.
One of the major problems with debugging networks of devices is that the behavior of each device depends on the behavior of the surrounding nodes and on the amount of traffic being exchanged. As a result, a problem that appears only in a large network often cannot be reproduced, and therefore cannot be debugged, at a small scale.
The method presented here is based on the analysis of the stack contents of the failed node and matching the contents to the disassembly of the application binary in order to recover the function call stack. A watchdog timer is used to detect an infinite loop condition and to trigger saving of the stack contents.
While the idea behind this method is generic, different MCUs and compilers handle the stack in slightly different ways, so the tools shown here will have to be modified for a specific target system. This article focuses on Atmel’s AVR MCUs, in particular the ATmega128RFA1 and ATmega256RFR2, and the GCC compiler.
To use the proposed method effectively, we need to understand how compilers use the stack to store return addresses for called functions. Fundamentally, two types of data have to be stored temporarily on the stack: the local variables of the called function and the return address into the calling function. Some compilers use two separate stacks, which makes it easier to recover the call stack because the return addresses are stored in one contiguous region of memory.
The GCC compiler uses the same stack for both return addresses and local variables. This makes it harder to parse the stack without a deep analysis of how each function operates. Fortunately, most of the useful information can be recovered with a brute-force method. Instead of trying to recover return addresses from known locations, we examine every run of consecutive bytes of the appropriate length on the stack and check whether it could be a return address. This method will occasionally give false positives by interpreting data on the stack as a valid return address, but these are usually easy to spot in the full output.
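The brute-force scan can be sketched in a few lines of Python. This is a minimal illustration, not the tool itself: the function name `find_return_candidates`, the `return_sites` table, and the sample dump bytes are all hypothetical. It assumes the dump is ordered from the lowest stack address upward and that each return address is stored as a word address with its most significant byte at the lowest address, which matches the AVR layout described later in this article.

```python
def find_return_candidates(stack_dump, return_sites, pc_size=2):
    """Slide a pc_size-byte window over the dump and report every match.

    stack_dump   -- bytes object, lowest stack address first (assumption)
    return_sites -- dict mapping the byte address that follows a call
                    instruction to a human-readable description
    pc_size      -- bytes stored per return address (2 or 3 on AVR)
    """
    hits = []
    for offset in range(len(stack_dump) - pc_size + 1):
        window = stack_dump[offset:offset + pc_size]
        # Stored value is a word address, most significant byte first.
        word_addr = int.from_bytes(window, "big")
        byte_addr = word_addr * 2
        if byte_addr in return_sites:
            hits.append((offset, byte_addr, return_sites[byte_addr]))
    return hits


# Hypothetical input: one known return site and a short dump fragment.
return_sites = {0x40E: "PHY_TaskHandler -> PHY_DataInd"}
dump = bytes([0x11, 0x22, 0x02, 0x07, 0x33])

for offset, addr, description in find_return_candidates(dump, return_sites):
    print(f"dump+{offset}: return to 0x{addr:x} ({description})")
```

Because every offset is tested, ordinary data that happens to look like a return address is reported as well. Because the stack grows toward lower addresses, hits at lower offsets in the dump correspond to more recent calls.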
To identify all possible function calls stored on the stack, we need to find all call instructions in the disassembly listing. The disassembly listing can be obtained from the ELF file using the avr-objdump utility. The example output will have the format shown below.
Here, the first line shows the address and the name of the function. The following lines show detailed information about the instructions that make up the function. The first column contains the address of the instruction (in bytes), the second column contains the opcode of the instruction, and the last column shows the instruction mnemonic with an optional comment. In this output we look for all variants of the call instruction. For each call instruction, we note the instruction address and the size of the opcode; their sum is the return address. Alternatively, we can simply note the address of the next instruction.
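A minimal sketch of this step in Python follows. It assumes the tab-separated layout that `avr-objdump -d` normally prints (address, opcode bytes, mnemonic, operands, optional comment) and the four AVR call mnemonics `call`, `rcall`, `icall` and `eicall`. The function name `find_call_sites` and the sample listing are illustrative: the listing reuses the call at 0x40a discussed in this article, but the other addresses and instructions are made up.

```python
import re

# Function header line, e.g. "00000400 <PHY_TaskHandler>:"
FUNC_RE = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:\s*$")
CALL_MNEMONICS = {"call", "rcall", "icall", "eicall"}


def find_call_sites(listing):
    """Map each return address (in bytes) to a (caller, callee) pair."""
    sites = {}
    caller = None
    for line in listing.splitlines():
        header = FUNC_RE.match(line)
        if header:
            caller = header.group(2)
            continue
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue  # not an instruction line
        try:
            address = int(fields[0].strip()[:-1], 16)
        except ValueError:
            continue
        opcode_size = len(fields[1].split())  # one token per opcode byte
        if fields[2].strip() in CALL_MNEMONICS:
            # The callee name, if any, appears in the trailing comment.
            target = re.search(r"<([^>]+)>", fields[-1])
            callee = target.group(1) if target else "(indirect)"
            sites[address + opcode_size] = (caller, callee)
    return sites


# Illustrative listing; only the call at 0x40a comes from the article.
SAMPLE = "\n".join([
    "00000400 <PHY_TaskHandler>:",
    "     400:\tcf 93       \tpush\tr28",
    "     40a:\t0e 94 10 02 \tcall\t0x420\t; 0x420 <PHY_DataInd>",
    "     40e:\tcf 91       \tpop\tr28",
])

for ret, (caller, callee) in find_call_sites(SAMPLE).items():
    print(f"0x{ret:x}: {caller} calls {callee}")
```

Keying the result by return address rather than by call address is a design choice: the return address is the value that actually ends up on the stack, so the table can be used directly when scanning a dump.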
A full set of call instructions for the AVR core is listed in the table below.
Each call instruction stores the program counter value that should be restored on return from the subroutine call. The standard AVR core has a 2-byte program counter and can address 64K words (or 128K bytes) of program space. For larger devices, with 256K bytes of flash or more, the program counter was extended to 3 bytes. This means that the two cores store a different number of bytes on the stack for each call, which has to be taken into account when examining the contents of the stack.
In this example, the return address for the call instruction at address 0x40a will be 0x40e. It is important to keep in mind that the AVR stores the return address as a word address, since all instructions are naturally word-aligned. The least significant byte is pushed to the stack first and the stack grows towards lower addresses, so in a dump read from lower to higher addresses the most significant byte appears first. In this example the word address of the next instruction is 0x207 (0x40e / 2), and this call will appear on the stack as the byte sequence [0x02, 0x07] for the ATmega128RFA1 or [0x00, 0x02, 0x07] for the ATmega256RFR2. If this sequence is present in the stack dump, we can assume that the function PHY_DataInd() was called from the function PHY_TaskHandler().
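The arithmetic above can be checked with a short Python sketch. The helper name `stacked_bytes` and the dump fragment are hypothetical; the sketch only encodes the rule described in the text (halve the byte address, then write the word address with the most significant byte first, padded to the program counter size).

```python
def stacked_bytes(return_byte_addr, pc_size):
    """Return the byte pattern a call leaves on the stack.

    return_byte_addr -- byte address of the instruction after the call
    pc_size          -- 2 for a 2-byte program counter, 3 for a 3-byte one
    """
    if return_byte_addr % 2:
        raise ValueError("AVR instructions are word-aligned")
    word_addr = return_byte_addr // 2
    # Most significant byte sits at the lowest stack address.
    return word_addr.to_bytes(pc_size, "big")


print(stacked_bytes(0x40E, 2).hex(" "))  # pattern for a 2-byte program counter
print(stacked_bytes(0x40E, 3).hex(" "))  # pattern for a 3-byte program counter

# Hypothetical dump fragment: checking for the pattern is a substring test.
dump = bytes([0xAA, 0x02, 0x07, 0xBB])
print(stacked_bytes(0x40E, 2) in dump)
```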
The process described above is laborious and not practical for everyday use. Fortunately, most of the steps can be automated and such automation tools are presented here.