Many software bugs in deeply embedded systems can be debugged using a hardware debugger, but sometimes a problem is so elusive that it appears only after prolonged code execution or complex interactions with other nodes in the system. Depending on the system design, attaching a hardware debugger may not be practical, so alternative debugging methods must be used.
This article discusses such problems and presents a software technique that captures the stack contents of the embedded system at the point of failure. It also presents a Python-based tool that matches the contents of the stack against the disassembler output to recover the full function call stack and, eventually, find the point of failure.
In this article we focus on debugging wireless sensor networks, but the same techniques can be applied to any system with a large number of distributed devices, or even to a single device, that cannot be run under a debugger.
One of the major problems with debugging networks of devices is that the behavior of each device depends on the behavior of the surrounding nodes and on the amount of traffic being exchanged. This makes it impossible to debug such systems on a small scale.
The method presented here is based on the analysis of the stack contents of the failed node and matching the contents to the disassembly of the application binary in order to recover the function call stack. A watchdog timer is used to detect an infinite loop condition and to trigger saving of the stack contents.
While the idea behind this method is generic, different MCUs and compilers handle stacks in slightly different ways, so the tools shown here will have to be modified for a specific target system. In this article we focus on Atmel’s AVR MCUs, in particular the ATmega128RFA1 and ATmega256RFR2, and the GCC compiler.
To use the proposed method effectively, we need to understand how compilers use stacks to store the return addresses of called functions. Fundamentally, two types of data have to be stored temporarily on the stack: the local variables of the called function and the return address to the calling function. Some compilers use two separate stacks, one for each type of data. This makes it easier to recover the call stack, since the return addresses are stored sequentially in one memory region.
The GCC compiler uses the same stack to store both return addresses and local variables. This makes it harder to parse the stack without a detailed analysis of how each function operates. Fortunately, most of the useful information can be recovered using a brute-force method. Instead of trying to read return addresses from known locations, we examine every sequence of consecutive bytes of the appropriate length on the stack and check whether it can be a return address. This method will occasionally give false positives by interpreting data on the stack as a valid return address, but these are usually easy to spot in the full output.
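The brute-force search can be sketched in a few lines of Python. This is a minimal illustration, not the code of the article's stack reversal script. It assumes the dump is a bytes object in ascending address order, so a return address pushed low byte first onto the downward-growing stack appears most significant byte first. It also assumes a hypothetical dictionary `known_returns` that maps every valid return word address to a description of the call.

```python
from typing import Dict, List, Tuple


def scan_stack(dump: bytes, known_returns: Dict[int, str],
               pc_bytes: int = 2) -> List[Tuple[int, int, str]]:
    """Try every window of pc_bytes consecutive bytes as a return address.

    Returns (offset in dump, word address, description) for each match.
    """
    hits = []
    for offset in range(len(dump) - pc_bytes + 1):
        window = dump[offset:offset + pc_bytes]
        # Most significant byte first in an ascending-address dump.
        address = int.from_bytes(window, "big")
        if address in known_returns:
            hits.append((offset, address, known_returns[address]))
    return hits


# Hypothetical table: word address 0x207 is where PHY_DataInd() returns to
# inside PHY_TaskHandler().
known = {0x207: "call to PHY_DataInd() from PHY_TaskHandler()"}
dump = bytes([0x11, 0x02, 0x07, 0x00, 0x02])
for offset, address, description in scan_stack(dump, known):
    print(f"offset {offset}: 0x{address:06x} {description}")
```

Because every window is tried, a local variable that happens to hold a value like 0x0207 is reported as well. These are the false positives mentioned above.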
To identify all possible function calls stored on the stack, we need to find all call instructions in the disassembly listing. The listing can be obtained from the ELF file using the avr-objdump utility, and its output has the format shown below.
Here, the first line shows the address and the name of the function. The following lines show detailed information about the instructions that make up the function. The first column contains the address of the instruction (in bytes), the second column contains the opcode bytes of the instruction, and the last column shows the instruction mnemonic with an optional comment. In this output we look for every variant of the call instruction. For each call instruction, we note the instruction address and the size of the opcode. Alternatively, we can note the address of the next instruction.
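Scanning the listing for call instructions is easy to automate. The following Python sketch is an illustration under stated assumptions, not the code of the article's tools. It expects the usual GNU objdump layout (byte address and colon, opcode bytes, mnemonic, optional comment, separated by tabs). It recognizes the AVR call variants CALL (4 bytes) and RCALL, ICALL and EICALL (2 bytes each), and it records the return address of each call in words. The function `find_calls()` and the sample listing fragment are hypothetical, although the opcode bytes are real AVR encodings.

```python
import re
from typing import Dict

# Function header line, e.g. "00000400 <PHY_TaskHandler>:".
FUNC_RE = re.compile(r"^([0-9a-f]+) <([^>]+)>:\s*$")
# Instruction line: byte address, opcode bytes, mnemonic, rest of the line.
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2}\s)+)\s*(\w+)(.*)$")
# Symbol name inside the "<...>" comment that objdump appends to calls.
TARGET_RE = re.compile(r"<([^>+]+)")
CALL_MNEMONICS = {"call", "rcall", "icall", "eicall"}


def find_calls(listing: str) -> Dict[int, str]:
    """Map the return address (in words) of every call to a description."""
    calls: Dict[int, str] = {}
    function = "?"
    for line in listing.splitlines():
        header = FUNC_RE.match(line)
        if header:
            function = header.group(2)
            continue
        insn = INSN_RE.match(line)
        if not insn or insn.group(3) not in CALL_MNEMONICS:
            continue
        address = int(insn.group(1), 16)      # byte address of the call
        size = len(insn.group(2).split())     # opcode size in bytes
        return_word = (address + size) // 2   # next instruction, in words
        target = TARGET_RE.search(insn.group(4))
        if target:
            calls[return_word] = f"call to {target.group(1)}() from {function}()"
        else:  # icall/eicall: the target is in a register, unknown here
            calls[return_word] = f"indirect call from {function}()"
    return calls


# Illustrative fragment in GNU objdump layout (tab-separated columns).
listing = (
    "00000400 <PHY_TaskHandler>:\n"
    "     400:\t0f 93       \tpush\tr16\n"
    "     40a:\t0e 94 2a 05 \tcall\t0xa54\t; 0xa54 <PHY_DataInd>\n"
    "     40e:\t09 95       \ticall\n"
)
for word, description in sorted(find_calls(listing).items()):
    print(f"0x{word:06x}: {description}")
```

Indirect calls cannot be resolved to a callee from the listing alone. The sketch still records their return addresses, so they can be matched on the stack.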
A full set of call instructions for the AVR core is listed in the table below.
Each call instruction pushes onto the stack the program counter value that should be restored on return from the subroutine. The standard AVR core has a 2-byte program counter and can address 64K words (128K bytes) of program space. For bigger devices with more than 128K bytes of Flash, such as the 256K-byte ATmega256RFR2, the program counter was extended to 3 bytes. This means that the two cores store a different number of bytes on the stack for each call, which has to be taken into account when examining the stack contents.
In this example, the return address of the call instruction at address 0x40a is 0x40e. Keep in mind that AVR stores return addresses in words, since all instructions are word-aligned. The least significant byte is pushed to the stack first and the stack grows toward lower addresses, so in a dump read in ascending address order the most significant byte comes first. In this example the word address of the next instruction is 0x207 (0x40e / 2), and this call will appear on the stack as the byte sequence [0x02, 0x07] on the ATmega128RFA1 or [0x00, 0x02, 0x07] on the ATmega256RFR2. If this sequence is present in the stack dump, we can assume that the function PHY_DataInd() was called from the function PHY_TaskHandler().
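The arithmetic from this example can be written as a small Python helper. The function names are hypothetical. The sketch assumes the dump is read in ascending address order, and it takes `opcode_size` from the disassembly (4 bytes for CALL, 2 bytes for RCALL, ICALL and EICALL).

```python
from typing import List


def pc_bytes_for_flash(flash_bytes: int) -> int:
    """Cores with more than 128 KB of Flash push a 3-byte program counter."""
    return 3 if flash_bytes > 128 * 1024 else 2


def return_sequence(call_address: int, opcode_size: int, pc_bytes: int) -> List[int]:
    """Bytes of a call's return address as they appear in an ascending dump."""
    return_word = (call_address + opcode_size) // 2  # AVR stores word addresses
    # Pushed LSB first onto a downward-growing stack, so the MSB ends up lowest.
    return list(return_word.to_bytes(pc_bytes, "big"))


print([hex(b) for b in return_sequence(0x40A, 4, 2)])  # ['0x2', '0x7']
print([hex(b) for b in return_sequence(0x40A, 4, 3)])  # ['0x0', '0x2', '0x7']
```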
The process described above is laborious and not practical for everyday use. Fortunately, most of the steps can be automated and such automation tools are presented here.
For the purpose of this demonstration we will use a network of wireless devices running Atmel’s Lightweight Mesh stack. The code of the stack was intentionally modified to fail randomly on a memory allocation when a new frame is received. The failure is representative of an issue that can be found in a real application. For example, memory allocation may fail under heavy load when no memory is available. Conditions like this can only be reproduced on a large network, so it would be impossible to debug this issue using conventional methods.
Before debugging can begin, we need to instrument the application code. There are several ways to obtain a stack dump from the MCU. In this case we will try two of them: saving the stack to the EEPROM and sending it over the UART. The instrumentation code for these two methods is located in the dump_eeprom.h and dump_uart.h files, respectively.
Saving to the EEPROM has the advantage of storing stack traces persistently for later examination, but the EEPROM has limited capacity, and the dump may interfere with the application's normal EEPROM use. The UART stack dump has to be collected while the device is running, but it uses only one serial interface, which is less intrusive for the application.
Both header files define an initialization function, wdt_init(), which must be called from the application initialization code. The application must also call wdt_reset() periodically to ensure that the watchdog timer is not triggered accidentally. Only one of the two header files should be included, and only once in the application. The application must not use the watchdog timer for its own purposes for the duration of the debugging procedure.
After the watchdog event is triggered, normal application execution stops. This serves as an indication that the device is ready for collection of the stack dump. The watchdog event handler can also be extended to provide a more visible indication in a hardware-dependent way.
The stack trace can be read from the EEPROM using the Device Programming dialog in Atmel Studio. The stack reversal script presented here can read the Intel HEX file directly, so no additional conversion is required.
If the dump_uart.h file is used, the stack dump is sent over the UART in a loop. Each new dump starts with a predefined sequence of four bytes: [0x4e, 0xda, 0xf5, 0x25]. The stack trace can then be read manually using any terminal program capable of receiving binary data, or using the provided Python script extract.py. This script automatically looks for the synchronization sequence and writes properly formatted stack contents to the standard output, which can be redirected to a file. For example, the invocation
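For readers who want to process EEPROM images in their own tooling, a minimal Intel HEX decoder is sketched below. It is not the stack reversal script's code, and it handles only record types 00 (data), 01 (end of file), 02 (extended segment address) and 04 (extended linear address). The function name `read_intel_hex()` is hypothetical. For full coverage of the format, the third-party `intelhex` package is an alternative.

```python
from typing import Dict


def read_intel_hex(text: str) -> bytes:
    """Decode Intel HEX text into a flat image starting at address 0.

    Addresses not covered by any data record are filled with 0xFF,
    the erased state of EEPROM.
    """
    memory: Dict[int, int] = {}
    base = 0
    for line in text.split():
        if not line.startswith(":"):
            continue
        record = bytes.fromhex(line[1:])
        if len(record) < 5 or len(record) != record[0] + 5:
            raise ValueError(f"malformed record: {line}")
        if sum(record) & 0xFF:  # all bytes, checksum included, sum to 0 mod 256
            raise ValueError(f"bad checksum: {line}")
        address = int.from_bytes(record[1:3], "big")
        rtype = record[3]
        data = record[4:-1]
        if rtype == 0x00:    # data record
            for i, value in enumerate(data):
                memory[base + address + i] = value
        elif rtype == 0x01:  # end of file
            break
        elif rtype == 0x02:  # extended segment address
            base = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:  # extended linear address
            base = int.from_bytes(data, "big") << 16
    if not memory:
        return b""
    return bytes(memory.get(i, 0xFF) for i in range(max(memory) + 1))


image = read_intel_hex(":0400000000020711E2\n:00000001FF\n")
print(image.hex())  # 00020711
```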
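Because the dump is repeated in a loop, the bytes between two consecutive synchronization sequences form one complete dump. The sketch below shows that idea. It is an assumption about how such an extractor could work, not the contents of extract.py. The input bytes could come from a serial port reader such as pyserial or from a captured file, and `split_dumps()` is a hypothetical name.

```python
from typing import List

SYNC = bytes([0x4E, 0xDA, 0xF5, 0x25])


def split_dumps(stream: bytes) -> List[bytes]:
    """Return every complete dump, i.e. the bytes between two sync markers."""
    dumps = []
    start = stream.find(SYNC)
    while start != -1:
        end = stream.find(SYNC, start + len(SYNC))
        if end == -1:
            break  # the last dump may still be arriving, so drop it
        dumps.append(stream[start + len(SYNC):end])
        start = end
    return dumps


stream = b"\x01\x02" + SYNC + b"\x00\x02\x07" + SYNC + b"\x00\x02\x07" + SYNC + b"\x00"
print(split_dumps(stream))
```

Data on the stack could, in principle, contain the synchronization sequence itself. Comparing two consecutive dumps is a cheap way to notice such a split or a transmission error.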
python extract.py -p COM12 > WSNDemo.dump
will read the stack dump from the COM port COM12 and save it into a file WSNDemo.dump, which can be read directly by the stack reversal script.
An application disassembly listing is also required to recover the symbolic names and corresponding addresses of the functions. A disassembly listing can be obtained using the avr-objdump utility:
avr-objdump -d WSNDemo.elf > WSNDemo.txt
Now run the stack reversal script, providing the disassembly listing and the stack dump in either raw or Intel HEX format:
python avrstackrev.py -b -l WSNDemo.txt -s WSNDemo.dump
The output of this command should look like this:
0x0000cf: call to main() from .do_clear_bss_start()
0x000f8b: call to SYS_TaskHandler() from main()
0x0009c7: call to PHY_TaskHandler() from SYS_TaskHandler()
0x000207: call to PHY_DataInd() from PHY_TaskHandler()
0x0005b2: call to nwkFrameAlloc() from PHY_DataInd()
Each line of the output contains a return address (a word address, as stored on the stack) and the decoded function call. The first line corresponds to the first called function. Normally this is a call to main() from the low-level initialization code. Output that shows otherwise may indicate a corrupt or incomplete stack dump.
From this output we can see that execution stopped in the function nwkFrameAlloc(), which gives enough information to examine this function closely by hand.
Problems other than an implicit infinite loop can also be debugged effectively using the technique presented here. For example, consider an event-driven system in which execution of the next step depends on the confirmation returned from the previous step. If a confirmation is lost for some reason, the application itself will appear to be running normally and will reset the watchdog timer periodically, so the problem will not be detected by the method as presented so far.
An application timer may be used to diagnose problems like this. The timer must be started when a request is issued and stopped in the confirmation handler. If the timer is not stopped in time, the timer event handler should trigger the watchdog intentionally. The place where the watchdog was triggered will then indicate the problem.
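The ordering of the output follows from the stack layout. The stack grows toward lower addresses, so in a dump read in ascending address order, the earliest call sits at the highest offset. The sketch below orders matches that way and applies the main() sanity check. The tuple layout, the function name and the offsets are hypothetical. The addresses and descriptions are taken from the output shown above.

```python
from typing import List, Tuple

Hit = Tuple[int, int, str]  # (offset in dump, return word address, description)


def order_call_stack(hits: List[Hit]) -> Tuple[List[str], bool]:
    """Order hits outermost call first and check that it is the call to main()."""
    # Highest offset first: that frame was pushed earliest.
    ordered = sorted(hits, key=lambda hit: hit[0], reverse=True)
    lines = [f"0x{addr:06x}: {desc}" for _, addr, desc in ordered]
    looks_complete = bool(ordered) and ordered[0][2].startswith("call to main()")
    return lines, looks_complete


# Offsets are invented; addresses and names follow the output above.
hits = [
    (10, 0x5B2, "call to nwkFrameAlloc() from PHY_DataInd()"),
    (40, 0x0CF, "call to main() from .do_clear_bss_start()"),
    (25, 0x207, "call to PHY_DataInd() from PHY_TaskHandler()"),
]
lines, ok = order_call_stack(hits)
print("\n".join(lines))
print("complete" if ok else "dump may be corrupt or incomplete")
```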
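The request/confirmation timer can be modeled on the host to check the logic before porting it to firmware. This is a Python model with hypothetical names, not AVR code. On the target, `poll()` corresponds to the timer event handler, and the `on_timeout` callback is the point where the application would deliberately stop calling wdt_reset(), so that the watchdog fires and the stack dump records where the timeout was detected.

```python
from typing import Callable, List, Optional


class ConfirmationTimer:
    """Host-side model of a request/confirmation timeout."""

    def __init__(self, timeout_ms: int, on_timeout: Callable[[str], None]) -> None:
        self.timeout_ms = timeout_ms
        self.on_timeout = on_timeout
        self.deadline: Optional[int] = None
        self.request = ""

    def start(self, now_ms: int, request: str) -> None:
        """Arm the timer when a request is issued."""
        self.deadline = now_ms + self.timeout_ms
        self.request = request

    def stop(self) -> None:
        """Disarm the timer from the confirmation handler."""
        self.deadline = None

    def poll(self, now_ms: int) -> None:
        """Check the deadline; on the target this is the timer event handler."""
        if self.deadline is not None and now_ms >= self.deadline:
            self.deadline = None
            self.on_timeout(self.request)  # firmware: stop resetting the watchdog


fired: List[str] = []
timer = ConfirmationTimer(100, fired.append)
timer.start(0, "data request")
timer.stop()                      # confirmation arrived in time
timer.poll(200)                   # nothing happens
timer.start(300, "data request")
timer.poll(350)                   # still waiting for the confirmation
timer.poll(400)                   # confirmation lost: the timeout fires
print(fired)                      # ['data request']
```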
All tools used in this article along with complete source code are available from the Atmel Spaces portal.
Alex Taradov is an applications engineer at Atmel Corporation (www.atmel.com) working on low-power and low-data-rate wireless network software. Thanks to his father, he started learning about electronics at the age of six, eventually transforming his passion for electronics into programming. Alex is proficient in a variety of programming languages, including Basic, Pascal, C, and Python. He graduated with an MSc from Bauman Moscow State Technical University in 2007 and has since written software in a variety of languages for a variety of systems, ranging from the tiniest microcontrollers to desktop systems.