Monitor-Based Debugging
A ROM monitor is an inexpensive, but powerful, debugging aid. Follow these steps to make a basic monitor even more powerful.
What do you think of when you think about debugging an embedded system? I suppose this depends on your background and the budget allocated for tools. You're likely to consider a JTAG/BDM-based debug port, an emulator or logic analyzer, printf(), or one of the many sophisticated source-level debuggers available today. Each solution has its own set of pros and cons. Some are very powerful, but come at a hefty price. Some are tied to a particular compiler tool set, while others are only useful on certain CPU families. The JTAG/BDM-based debug port is probably the most common now because it strikes a good balance between cost and capability. These devices can still cost a few thousand bucks and are useful only if connected to the system, usually requiring some bulky pod to hang fairly close to the target.
The topic of this article is somewhat of a dying art: monitor-based debugging. What is monitor-based debugging? First, it assumes that your application runs on a system that boots with a monitor of some kind. The boot monitor is the base platform on which the application resides, and if the application crashes, the monitor takes over.
When in ROM
Monitor-based debugging relies on capabilities built into the processor, and in most cases (not always), this requires the ability to write into your instruction space. Typically, breakpoints are set in the monitor's command line interface (CLI), then control is turned over to the application. If the instruction at which the breakpoint is set executes, a branch is taken out of the application and control is returned to the monitor's CLI. Now the monitor has some ability to display memory, maybe single step, and maybe even return to the application at the location where the breakpoint was hit.
Sounds pretty good right? Well, it can be, but a lot of complications can come up, such as:
- How do you display memory? Typically, a monitor can display raw memory as a block of 1-, 2-, or 4-byte units, but you have to specify the address in hex. Since you are running at the CLI of the target, it's likely that all you can do is refer to the output map file generated by the linker to determine where the symbol is in data/bss address space. You have to correlate the symbol to some hex address. If the build changes, so does the memory map, so the next time you want to look at the same piece of data, you have to once again make the address-to-symbol correlation. In addition, the monitor does not know how to display in the format that you want, like shorts, longs, and char strings. And forget about structure display.
- How do you set the breakpoint? Similar to the previous problem, you first look up the address of the function, then you issue some command like “b 0x123456” where 0x123456 is the address of the function at which you want to set the breakpoint.
- How does the monitor talk to the serial port once the breakpoint occurs? The monitor takes over as a result of a breakpoint; but the application now owns the serial port. If the monitor reinitializes the serial port for its use, the interface between the application and this port is likely to be messed up. This means it will be very difficult to return control to the application.
- How does the monitor temporarily shut down the application? This gets tricky when the application is running on an RTOS, with interrupts enabled and a variety of different peripherals configured.
- Single-stepping is now at the assembler level, not the C-language level. This isn't very useful to the programmer.
Maybe this explains why monitor-based debugging isn't that popular anymore. Before we give up on it though, I'd like to re-investigate the topic and see if there might be some breath left in the old beast.
Debug philosophy
Let's start by setting some boundaries. We have to accept the fact that a monitor-based debugging environment has limitations. After we establish some guidelines and re-think the way some of this stuff is done, I think you'll agree that there is quite a bit of capability still left. So let's establish a debugging model for the boot monitor. What do we get, and what do we sacrifice?
Gimmes
For memory display, we will have the ability to display “C-like” data: character strings, chars, shorts, ints, longs, and even data structures. The data structures can be displayed individually, in tables, or as a linked list. We will have the ability to reference data symbolically. This means that a global variable called “SysTick” can be referenced as “SysTick” without any need to know where it is in memory. Instead of thinking within the confines of a typical breakpoint, we will have run-time analysis that includes the breakpoint as one of its features. There are standard breakpoints that terminate execution of the application and “auto-return” breakpoints used for runtime analysis. After a breakpoint or exception is trapped by the monitor, a symbolic stack trace can be done.
Gotchas
We will not have any access to source code line number information; however, if the compiler can dump mixed source and assembly code, that gets us around the problem. We will not consider single-stepping because it is assembly level; however, if we implement everything else, assembly-level single-stepping is a handy freebie. Since we only have access to global variables, the stack trace will not display parameters passed, nor will we be able to retrieve local data from a stack. The final, and most significant, limitation placed on this debugger is that control can't be returned to the running application after a hard breakpoint.
If these considerations are acceptable, the end result is a debugging environment that lives with the application. It can be shipped with the product and used in the field, and is somewhat independent of the compiler toolset, RTOS used, and to a degree, the CPU and hardware.
Monitor assumptions
For the remaining discussion, we will make some assumptions about the underlying monitor facility:
- The monitor includes a flash file system.
- The monitor's CLI can process symbols based on a file that is in the file system.
- The monitor can execute files as a list of commands (a script).
For example, if we have a file called symtbl in flash and it has the following lines in it:
main   0x123456
func1  0x123600
func2  0x123808
varA   0x128000
varB   0x128004
varC   0x12800c
and we execute a script with the following two lines in it:
echo The address of main() is %main
echo “varA” is located at %varA
The output will be:
The address of main() is 0x123456
“varA” is located at 0x128000
This demonstrates the monitor's ability to interact with its own file system, process symbols based on a special file called symtbl, and execute a script that is simply a series of monitor commands using the monitor's symbol lookup capability. The only thing needed is the ability to generate the symtbl file, and the specifics of creating it depend on the toolset (compiler/linker) used.
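The lookup behind the percent-sign substitution is just string matching against that two-column file. The following sketch shows one way the CLI might do it; it uses standard C file I/O, so fopen() stands in for whatever API the monitor's flash file system actually provides, and symLookup() is a hypothetical name.

#include <stdio.h>
#include <string.h>

/* Look up 'name' in the symtbl file and copy its hex address string
 * into 'addr'.  Returns 0 on success, -1 if the symbol isn't found.
 * A real monitor would read through its flash file system rather
 * than stdio; fopen() stands in for that here.
 */
int symLookup(const char *name, char *addr, int addrsize)
{
    FILE *fp;
    char symname[64], symaddr[32];

    if ((fp = fopen("symtbl", "r")) == NULL)
        return -1;

    while (fscanf(fp, "%63s %31s", symname, symaddr) == 2) {
        if (strcmp(symname, name) == 0) {
            strncpy(addr, symaddr, addrsize - 1);
            addr[addrsize - 1] = '\0';
            fclose(fp);
            return 0;
        }
    }
    fclose(fp);
    return -1;
}

The CLI would call this whenever it sees a token with a leading percent sign, substituting the returned hex string into the command line before parsing continues.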
Breakpoints
A breakpoint forces the application (at a particular address or event) to turn control over to the monitor/debugger. When the application relinquishes control, all context is made available to the debugger so that it can display variables, dump the stack, and so on. For our discussion there are two distinct types of breakpoints: hard and soft.
A hard breakpoint allows the monitor's CLI to take control. There is no way to resume or continue the application code once this breakpoint occurs, unless the application is restarted.
Soft breakpoints (also called tracepoints) are used for run-time analysis (an “auto-return” breakpoint). The breakpoint occurs and the monitor code (through the exception handler) is executed, but the monitor returns control to the application in real time. As will be seen in the text below, this is the harder of the two to implement.
There are several different types of soft breakpoints. Each type alters some state maintained by the monitor so that statistics can be gathered and used by the developer or by the monitor itself to possibly change the action taken by the soft breakpoint handler.
Both types of breakpoint are usually inserted into the application by replacing the instruction at the specified address with some other instruction that causes an exception to occur (from this point on we will refer to this instruction as a trap).1 The exception handler used must be configured to enter the monitor after storing the entire context (or register set) of the CPU. Some debugger-specific code is executed and, depending on the type of breakpoint, the monitor's CLI comes up (hard breakpoint) or control is returned to the application (soft breakpoint).
When control is automatically returned to the application, the monitor must restore all the registers that were active at the time of the breakpoint and return to the point where the breakpoint occurred. To do this, the instruction must be reinserted at its original location. That single instruction must be executed, then the trap must be re-inserted into the memory so that if that instruction is ever executed again, another breakpoint will occur. Also, we need to be aware that we are using data accesses to modify instruction space; hence, we may have to deal with a cache coherency issue. This is pretty complicated. Let's list the steps:
1. If the application controls all exception handlers, we must re-configure the exception handler corresponding to the trap so that it points to code owned by the monitor. To support soft breakpoints, we must also configure the processor's trace (or single step) exception handler to be owned by the monitor.
2. Insert trap(s) into the instruction address space (be aware of cache).
3. Turn control over to the application that is to be debugged.
4. At the time of the exception, copy all registers to a local area accessible by the monitor.
5. Determine the type of exception and take the appropriate action. If it's a hard breakpoint, branch to the monitor's CLI; otherwise, take the appropriate action based on the type of soft breakpoint and continue with Step 6.
6. Install the original instruction back into the address space (be aware of cache).
7. Restore the register context that was stored away earlier.
8. Put the processor into “trace” mode and return from the exception to the address that now contains the original instruction.
9. Immediately after this instruction is executed, the trace exception will occur and the monitor code must now reinstall the trap instruction and once again return control to the application. This re-installation is necessary so that the breakpoint will be active for the next time the CPU executes that instruction.
The code behind all this is certainly not trivial to implement, and because of the processor specifics, it is beyond the scope of this article. Varying degrees of complexity can be avoided, depending on what functionality is needed. The soft breakpoint could be limited so that it causes the breakpoint to occur only on the first pass through the code. After the first occurrence, the original instruction is put back into memory and full-speed execution is resumed. This eliminates the complexity of Steps 8 and 9, but also eliminates the ability to take that breakpoint again in real time. To simplify things further, the whole soft breakpoint mechanism can be omitted, and then the exception handler doesn't even have to worry about anything after storage of the register context in Step 4. In many systems, this is a reasonable limitation and eliminates a great deal of the complexity of the exception handler.
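To make Steps 2 and 6 concrete, here is a minimal sketch of installing and removing a trap. The trap opcode shown, the 32-bit instruction size, and the flushDcacheInvalidateIcache() routine are assumptions; each would come from your CPU and BSP.

#include <stdint.h>

#define TRAP_OPCODE 0x7fe00008   /* assumed: PowerPC unconditional trap (tw 31,0,0) */

/* Provided by the BSP: push dirty data-cache lines to memory and
 * invalidate the instruction cache over the given range, so the
 * CPU fetches the modified instruction.
 */
extern void flushDcacheInvalidateIcache(void *addr, int size);

struct brkpt {
    uint32_t *addr;      /* instruction address of the breakpoint */
    uint32_t  original;  /* opcode that the trap replaced */
};

/* Step 2: replace the instruction at bp->addr with the trap. */
void breakpointInstall(struct brkpt *bp)
{
    bp->original = *bp->addr;
    *bp->addr = TRAP_OPCODE;
    flushDcacheInvalidateIcache(bp->addr, sizeof(uint32_t));
}

/* Step 6: put the original instruction back so it can be executed. */
void breakpointRemove(struct brkpt *bp)
{
    *bp->addr = bp->original;
    flushDcacheInvalidateIcache(bp->addr, sizeof(uint32_t));
}

For a soft breakpoint, the trace exception handler in Step 9 would simply call breakpointInstall() again before returning to the application.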
Code analysis
This section assumes that you are going to bite the bullet and implement the whole nine yards discussed in the previous section. The idea behind code analysis at this level is to provide the developer with some convenient mechanisms through which information can be gathered while the application is running at (almost) full speed. It's the use of the “soft” breakpoint mechanism described above.
Instead of establishing some set of soft breakpoints, let's configure the breakpoints so they can pass through some user-configurable logic (part of the monitor code) to determine what to do as a result of a breakpoint. Change the model just a bit: instead of just setting a breakpoint, we will set up some logical set of steps to occur as a result of some event. The event is the processor taking some kind of “breakpoint” exception and the logic is a piece of breakpoint-specific code that can be configured to perform an action or perform an action based on some condition. The monitor command syntax looks like this:
at {breakpoint tag} [if condition] {action}
There are three pieces to this command syntax after the command itself:
- The breakpoint tag correctly implies that the “at” command must be coordinated with some other processor-specific mechanism that sets breakpoints (using some of the methods just discussed). This tag is processor specific because the breakpoint mechanism is processor specific; however, this “at” mechanism is processor independent.
- The if condition is an optional test that can become part of the logic to decide whether or not to perform the action. The conditions can be the non-zero return of a specified function call, an “at” variable reaching some count, or the “at” flag containing some predetermined bit setting.
- The action is the operation performed when the breakpoint fires (and the optional condition, if present, is satisfied). It either adjusts state maintained by the monitor or hands control over to the monitor's CLI.
A few examples illustrate this idea:
Example 1: The following sequence of “at” commands will establish a counting breakpoint based on the exception that occurs as a result of a DATA_1RD breakpoint. The breakpoint mechanism is CPU dependent, so for the sake of this discussion, it might be an exception that occurs using the first data breakpoint provided by the CPU. When the exception occurs, the “at handler logic” is part of the exception handler, and for each “at” statement established by the user, there is one pass through the logic. The first pass increments the ATV1 variable (within the context of the “at” command) and the second pass checks whether ATV1 is 5 and, if it is, halts the application and turns full control over to the monitor's CLI.
# Increment internal variable
at DATA_1RD ATV1++
# If ATV1 equals 5, then break
at DATA_1RD if ATV1==5 BREAK
Example 2 : This next set of “at” commands will break when breakpoint ADDR_1 is executed after breakpoint ADDR_2 has executed:
# Set bit 1 of the internal flags
at ADDR_1 FSET01
# If both bits 1 and 2 are set, break
at ADDR_1 FALL03 BREAK
# Clear bit 1 of the internal flag
at ADDR_2 FCLR01
# Set bit 2 of the internal flags
at ADDR_2 FSET02
Example 3 : This example demonstrates the idea of using functionality within the application to aid in the code analysis. The break will occur if the function at address 0x1234 returns 1.
at ADDR_1 0x1234()==1 BREAK
Example 4: As a final example, let's use the “at” command to help detect a memory leak. Assume ADDR_1 is malloc, ADDR_2 is free, and at the time ADDR_3 is hit, we expect no allocated memory to be available. We can verify the differential between malloc/free calls by observing the content of the ATV1 variable after the breakpoint:
# Increment ATV1 at ADDR_1 (malloc)
at ADDR_1 ATV1++
# Decrement ATV1 at ADDR_2 (free)
at ADDR_2 ATV1--
# Break at ADDR_3
at ADDR_3 BREAK
Get the idea? The point is that there is a lot more capability behind a simple breakpoint that can be taken advantage of. The back-end, CPU-specific code is the same code that would be used to implement basic breakpoints, but adding the “[if-condition] {action}” extension puts a whole new spin on monitor-based breakpoints. Note that there is a bit of a real-time hit here because you are inserting code into the runtime stream of the application. This must be considered, but the minor hit is usually acceptable.
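As a rough illustration of that back end, the sketch below shows one way the “at” table and handler logic could be organized. The structure layout, the condition and action codes, and the monitorCli() hook are all assumptions made for the sketch, not the actual MicroMonitor implementation.

#include <string.h>

#define MAX_AT_ENTRIES 16

/* Condition types */
#define COND_NONE   0
#define COND_VAREQ  1   /* ATVn == value            */
#define COND_FLAGS  2   /* all bits in a mask set   */
#define COND_FUNC   3   /* function returns a value */

/* Actions */
#define ACT_NONE    0
#define ACT_VARINC  1   /* ATVn++          */
#define ACT_VARDEC  2   /* ATVn--          */
#define ACT_FSET    3   /* set flag bits   */
#define ACT_FCLR    4   /* clear flag bits */
#define ACT_BREAK   5   /* hard break: enter the CLI */

struct atEntry {
    char tag[16];       /* breakpoint tag, e.g. "DATA_1RD" */
    int  condType;
    long condArg;       /* variable index, mask, or function address */
    long condVal;       /* value the condition compares against */
    int  action;
    long actArg;        /* variable index or flag mask */
};

static struct atEntry atTbl[MAX_AT_ENTRIES];
static int  atTot;          /* number of entries installed by "at" */
static long atVar[4];       /* ATV1..ATV4 */
static long atFlags;

extern void monitorCli(void);   /* hand control to the monitor's CLI */

/* Called from the breakpoint exception handler with the tag of the
 * breakpoint that fired.  Returns nonzero if a hard break was
 * requested (the application is not resumed).
 */
int atHandler(const char *tag)
{
    int i, brk = 0;

    for (i = 0; i < atTot; i++) {
        struct atEntry *ap = &atTbl[i];
        long (*func)(void);

        if (strcmp(ap->tag, tag) != 0)
            continue;

        /* Evaluate the optional condition... */
        switch (ap->condType) {
        case COND_VAREQ:
            if (atVar[ap->condArg] != ap->condVal)
                continue;
            break;
        case COND_FLAGS:
            if ((atFlags & ap->condArg) != ap->condArg)
                continue;
            break;
        case COND_FUNC:
            func = (long (*)(void))ap->condArg;
            if (func() != ap->condVal)
                continue;
            break;
        }

        /* ...then perform the action. */
        switch (ap->action) {
        case ACT_VARINC: atVar[ap->actArg]++;    break;
        case ACT_VARDEC: atVar[ap->actArg]--;    break;
        case ACT_FSET:   atFlags |= ap->actArg;  break;
        case ACT_FCLR:   atFlags &= ~ap->actArg; break;
        case ACT_BREAK:  brk = 1;                break;
        }
    }
    if (brk)
        monitorCli();
    return brk;
}

The “at” command itself would simply fill in atTbl[] entries from the CLI; the memory-leak example above, for instance, becomes three entries whose actions are ACT_VARINC, ACT_VARDEC, and ACT_BREAK.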
Debug hooks
One of the limiting factors of monitor-based debugging is that it usually requires that the instruction space be modifiable. For the vast majority of embedded designs, this is not the case because the instruction space is in EPROM or flash,2 and the CPU executes the code directly from that space.
Don't despair! Many of today's processors are equipped with special debug capabilities that overcome this limitation. Debug registers add to the versatility of the monitor-based debugger because the CPU can be configured to take a breakpoint based on one instruction address (or a range) without the requirement that the instruction space be written. Additionally, some support data breakpoints. This means that a breakpoint can be established based on a piece (or range) of data being accessed. The breakpoint can usually be established based on a read and/or write of the data space.
This added debug capability (different capabilities from different CPUs) means that the monitor must be able to deal with it. Instead of the generic mechanism of inserting some trap into the instruction space, you now have to be able to configure some set of registers to do something special. Data breakpoints add even more complexity to the monitor code (but they're worth it), because for a data-access breakpoint the address at which the exception occurred does not identify which breakpoint was hit. The breakpoint handler must first examine the CPU state to see whether the exception was the result of a data-access breakpoint and, if so, retrieve other CPU state to determine the source.
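That extra decision might look something like the sketch below. cpuDebugStatus(), the DBG_* bits, and ibreakTag() are hypothetical names standing in for whatever debug-status registers and breakpoint-table code your CPU and monitor actually provide.

#include <stdint.h>

/* Hypothetical hardware-abstraction hook; every CPU with on-chip
 * debug support reports this information differently.
 */
extern uint32_t cpuDebugStatus(void);   /* debug/exception status bits */

#define DBG_DBREAK1 0x01                /* data breakpoint unit 1 hit */
#define DBG_DBREAK2 0x02                /* data breakpoint unit 2 hit */

/* Maps an instruction address back to the tag of the breakpoint set
 * there (part of the generic breakpoint-table code).
 */
extern const char *ibreakTag(uint32_t addr);

/* Decide which breakpoint caused the exception.  For data breakpoints
 * the exception address is useless, so the debug status bits are
 * consulted first.
 */
const char *breakpointSource(uint32_t exceptionAddr)
{
    uint32_t status = cpuDebugStatus();

    if (status & DBG_DBREAK1)
        return "DATA_1RD";
    if (status & DBG_DBREAK2)
        return "DATA_2RD";
    return ibreakTag(exceptionAddr);
}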
Memory display
Almost any monitor will provide some type of memory display command. Similar to the above breakpoint facility, if it is implemented correctly, it can be a useful tool, even for the high-level software developer. If the display supports the hex and decimal display of address space along with support for 1-, 2-, and 4-byte data units, then that plus the CLI's fundamental ability to deal with symbols allows the monitor to display variables in their appropriate form. For example, let's assume we have a short variable called varB and we want to display it in decimal. This might be done with:
dm -2d %varB 1
where “dm” is the display memory command name, “-2d” is an option string indicating that the data is to be displayed in 2-byte decimal units, “%varB” is the name of the variable to be displayed, and “1” indicates that only one unit is to be displayed. The result is that you have the ability to display variables just as you would with a high-level debugger, and all you need to do is make sure that your on-board symtbl file is in sync with the application being debugged. You could go one step further and build a few simple scripts that save on the typing. For example, we can build the following scripts for displaying various different integer formats:
file int2:   dm -2d $ARG1 1
file uint4:  dm -4 $ARG1 1
Now instead of typing “dm -2d %varB 1” the int2 script could be used:
int2 %varB
A simple “-s” option could also be incorporated so that memory could be displayed as character strings instead of raw hex. The code for the dm command is shown in Listing 1.
Listing 1
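For a sense of what such a command involves, the fragment below is a minimal, hypothetical sketch of the 2-byte decimal case used in the example; it is not the actual listing. Option parsing for the other unit sizes and the -s string mode would follow the same pattern.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Display 'count' 2-byte units starting at 'addr' in decimal,
 * eight units per line.
 */
static void dm2d(uint16_t *addr, int count)
{
    int i;

    for (i = 0; i < count; i++) {
        if ((i % 8) == 0)
            printf("\n%p: ", (void *)&addr[i]);
        printf("%6u ", (unsigned)addr[i]);
    }
    printf("\n");
}

/* Minimal command wrapper: dm -2d <hexaddr> <count>.
 * In a real monitor, argv[] comes from the CLI after symbol
 * substitution, so %varB has already become a hex address.
 */
int dmCmd(int argc, char *argv[])
{
    if (argc != 4)
        return -1;
    dm2d((uint16_t *)strtoul(argv[2], NULL, 16), atoi(argv[3]));
    return 0;
}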
Data structures
Now let's take monitor-based memory display to a whole new level. Wouldn't it be nice to be able to display memory as structures and linked lists?
The problem with doing this at the monitor level is that the monitor doesn't usually have access to the information that the compiler/linker provides regarding the format of a structure. Since we are now assuming we have a file system, one might think that we could put the toolset-generated data in a file and allow the monitor to parse through it. We could, but parsing this data (the symbol table generated by the compiler) can be complicated, especially when you consider the fact that the format of this file could be very different from one compiler to the next. Even if we limit ourselves to a particular object file format (say ELF), the symbol table format may not be the same from one compiler to the next.
A simpler approach is to create a command in the monitor that can look to a structure-definition file in the file system to determine how to overlay a structure on top of a block of memory on the target. This eliminates all dependency on some external file format; hence, it works regardless of CPU type or toolset chosen.
The structure definition file is an ASCII file that contains some structure definitions almost as they would be seen within a C header file. The command in the monitor can then use this file as a reference when asked to display a particular block of memory. This, combined with the symbol-intelligent CLI, allows us to do something like this:
cast abc %InBuf
This command would look for the file structfile in the flash file system and, if found, it would overlay the structure defined as abc on top of the address that would be extracted from the symbol table for the symbol %InBuf. Add a few options to this and you can specify to the cast command what member of the structure is the “next” pointer. This turns it into a linked list display tool with almost no additional coding effort.
The structure definition used by this command is assumed to be in the structfile . In general, the format of this file is similar to that of standard C structure definitions, but with some limitations. The types “char,” “short,” and “long” are supported and will be displayed as a 1-, 2-, or 4-byte decimal integer respectively. To support the ability to display in hex, the types “char.x,” “short.x,” and “long.x” are supported, and if a character is to be displayed as a character (hex 0x31 printed as “1”), the “char.c” type can be applied. For example, in the following structure definition:
struct abc {
    long i;
    long.x j;
    char.c c;
    char.x d;
    char e;
}
The member “i” would be displayed in decimal format; “j” would be displayed in hex. The member “c” would be displayed as a character, “d” would be displayed in hex, and “e” would be displayed as a 1-byte decimal integer. If a structure has an array in it, then the user must define that as an array of one of the fundamental types I described with the appropriate size. This “cast” command does not display arrays within structures simply because of the complexity of the output generated. So it is treated like padding with only the name and array size displayed.
Here is an example of a structure definition file that demonstrates all of the functionality of the “cast” command. Note that the “#” sign signifies a comment.
struct abc {
    long i;
    char.x c1;
    pad[3];            # Not displayed
    struct def d;
}

struct def {
    short s1;
    long ltbl[32];     # Not displayed
    short s2;
}
Notice the embedded structures, use of the “.x” suffix, and the pad[] format. This implementation is totally unaware of compiler-specific padding and CPU-specific alignment requirements. If the structure definition puts a long on an odd boundary and the CPU does not support that, then cast is unaware of this limitation and is likely to cause an exception itself. The user must add the appropriate padding to deal with this. As a result, the “pad[]” descriptor is used for CPU/compiler-specific padding. If the member is of type “char.c *” or “char.c [],” cast will display the ASCII string (if you don't want it to be dereferenced, use “char.x *”).
The code for the cast command is shown in Listing 2.
Listing 2
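To give a flavor of what the cast command's parser has to do, here is a sketch of turning one member line from the structure-definition file into a size and display format. The typeTbl[] contents mirror the types described above; the structMember layout and the parseMember() name are assumptions, and pad[], arrays, and nested structs would be handled separately.

#include <stdio.h>
#include <string.h>

struct typeInfo {
    const char *name;   /* as written in the structure-definition file */
    int size;           /* bytes occupied in target memory */
    char fmt;           /* 'd' = decimal, 'x' = hex, 'c' = character */
};

static const struct typeInfo typeTbl[] = {
    { "char",    1, 'd' }, { "char.x",  1, 'x' }, { "char.c", 1, 'c' },
    { "short",   2, 'd' }, { "short.x", 2, 'x' },
    { "long",    4, 'd' }, { "long.x",  4, 'x' },
    { 0, 0, 0 }
};

struct structMember {
    char name[32];
    int  size;
    char fmt;
};

/* Parse a line like "  long.x j;" into 'm'.  Returns 0 on success,
 * -1 if the type isn't one of the fundamental types above.
 */
int parseMember(const char *line, struct structMember *m)
{
    char type[32], name[32];
    const struct typeInfo *tp;

    if (sscanf(line, "%31s %31[^;]", type, name) != 2)
        return -1;

    for (tp = typeTbl; tp->name != 0; tp++) {
        if (strcmp(tp->name, type) == 0) {
            strncpy(m->name, name, sizeof(m->name) - 1);
            m->name[sizeof(m->name) - 1] = '\0';
            m->size = tp->size;
            m->fmt = tp->fmt;
            return 0;
        }
    }
    return -1;
}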
Stack trace
Aside from variable display, a stack trace is probably the most useful item in the firmware developer's bag of tricks. It turns an exception that looks something like this:
EXCEPTION:
addr align err at 0xf001400a
into something that looks like this:
EXCEPTION:
addr align err at 0xf001400a
0xf001400a: error_func()
0xf0008040: funcXX()
0xf0011288: funcYY()
0xf0011148: task_ABCD()
A stack trace allows the developer to determine what function nesting got the code to that point. This can save lots of debug/analysis time.
Stack trace capability is usually considered something that is only offered by a full-blown debug environment. This doesn't have to be the case. Having implemented this function for a few different CPUs, I can tell you that this is a pain in the butt to get working. But once it works, you just can't live without it.
In our monitor-based stack trace we will use the contents of the symbol table file we've been using all along, but now we are going to make the assumption that all the symbols are listed in the file in ascending order. We will also limit our monitor-based stack trace to provide function nesting only.
The ability to look at variables on the stack is a bit more complicated. Think about it for a minute. At any point in execution time, the CPU must be able to retrace its steps because whenever any function returns, the function that called it continues. All of this return information is in the stack frame somewhere; we just need to find it. On the other hand, the ability to display variables within a particular function's stack scope is not a natural thing for the CPU. Hence, this does require more information from the compiler regarding how and where the variables are stored in the frame.
The majority of the code for a stack trace implementation is compiler and CPU specific. Some of the parsing of the symbol file is generic and can be reused on multiple implementations. As I mentioned, I found this to be a challenge to get working, but it has been well worth it. Listing 3 contains the code for a PowerPC stack trace and the address-to-symbol translator.
Listing 3
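The generic piece, mapping a return address to the nearest preceding symbol in symtbl (which, as noted above, is assumed to be sorted in ascending order), might look like the sketch below. It is not the actual listing, and stdio again stands in for the flash file system.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Translate 'addr' into a function name by finding the last symbol in
 * the (ascending) symtbl file whose address is <= addr.  Returns 0 on
 * success, -1 if no symbol qualifies.
 */
int addrToSym(unsigned long addr, char *name, int namesize)
{
    FILE *fp;
    char symname[64], symaddr[32];
    int found = 0;

    if ((fp = fopen("symtbl", "r")) == NULL)
        return -1;

    while (fscanf(fp, "%63s %31s", symname, symaddr) == 2) {
        if (strtoul(symaddr, NULL, 16) > addr)
            break;              /* sorted file: we've gone past addr */
        strncpy(name, symname, namesize - 1);
        name[namesize - 1] = '\0';
        found = 1;
    }
    fclose(fp);
    return found ? 0 : -1;
}

The CPU-specific piece is walking the stack frames themselves (on PowerPC, following the back-chain word at the bottom of each frame to pick up the saved return addresses) and feeding each return address through addrToSym().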
As one final note here, consider the case where your product is in the field and is occasionally “just resetting” (as described by the customer). This is usually some bad code causing an exception that simply resets the target. These kinds of problems can be very hard to reproduce because they only seem to happen on the customer site every third blue moon.
So how do you “catch” the bug in the act? You probably can't leave an emulator at a customer site, and it is very unlikely that the customer is going to allow you to debug on their site. With this stack trace capability and some of the other capabilities in the monitor, an environment can be set up so that any exception automatically causes the monitor to dump the output of a stack trace to a file in the file system and then restart the application. You can then occasionally query the system to see if this file is present and, if it is, transfer it to your system for analysis.
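A hedged sketch of that hook is shown below; stackTrace(), fileWrite(), and monRestart() are placeholder names for the monitor's own trace, flash-file-system, and restart calls.

#define TRACEBUF_SIZE 1024

extern int  stackTrace(char *buf, int size);   /* fill buf with the trace text */
extern int  fileWrite(const char *name, const char *buf, int size);
extern void monRestart(void);                  /* restart the application */

/* Called by the monitor's exception handler when no hard breakpoint
 * is pending: log the trace to a file, then restart the application
 * so the product keeps running in the field.
 */
void exceptionLogAndRestart(void)
{
    static char buf[TRACEBUF_SIZE];
    int len;

    len = stackTrace(buf, sizeof(buf));
    if (len > 0)
        fileWrite("lastcrash", buf, len);
    monRestart();
}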
Conclusion
Monitor-based debugging doesn't provide all of the capability that comes with some of the debugging tools available on the market today. It is, however, a tool that can reside in ROM alongside the application. This means that it is still in the system when deployed in the field. There is no need for a special connection or even an additional serial port. This convenience can prove itself invaluable under a variety of circumstances.
Most of what I've described in this article is available in a boot monitor package called “MicroMonitor.” The entire monitor package can be downloaded from Lucent's Research Software Distribution web site at www.bell-labs.com/topic/swdist/. A lot more information on this topic as well as the entire topic of booting an embedded system is covered in my book Embedded Systems Firmware Demystified (CMP Books).
Ed Sutter graduated from the engineering program at DeVry Technical Institute, and received a bachelor's degree from New Jersey Institute of Technology. He is currently a distinguished member of the technical staff at Lucent. He has been writing code for embedded systems at Lucent/AT&T Bell Labs for about 15 years, for architectures ranging from the 8051 to MIPS R4000. His area of expertise includes embedded system bootup, device drivers, and RTOS BSP development, as well as Unix/Win32-based embedded system development/support tools. He can be reached at .
Endnotes
1. On the 68000 this would be a trap instruction. For the x86, use INT3. Each CPU has some instruction to implement the same functionality.
2. Yes, flash can be written, but not in a way that would allow our implementation of a breakpoint to be practical.
3. The leading percent sign tells the CLI that the string is a symbol and should be replaced with the hex address that is in the symtbl file. If no such file is present, then no replacement is made.