Back to the basics: Using new trace techniques to debug advanced 32 bit MCUs


Imagine a scenario in which you have recently taken over an existing project, and when you run the program, you find that it is missing critical deadlines. You realize that real-time behavior is lost, and you try to identify the cause of the problem by using a debugger to step through the source code while the program is running on the target. Unfortunately, the program starts behaving differently. Not only are you not able to locate the delay, but other functionality has changed because the system is now running at a slower speed.

What’s worse is that these deadlines are missed intermittently and the program’s behavior is difficult to reproduce. The system is no longer deterministic. To complicate the situation, the engineer who originally developed the code is on vacation. Luckily, modern trace techniques, in the form of on-chip logic on advanced CPUs, hardware trace port analyzer (TPA) buffering, trace windowing, and software trace analysis tools, allow you to reproduce the system functionality, reconstruct timing constraints, and more easily pinpoint the causes of these all-too-familiar problems.

Trace allows developers to collect information about the state of the CPU over a period of time. The series of states is combined to provide a history of the CPU’s execution. This history is then used to analyze the behavior of the system and to identify bugs, inefficiencies, and unused code.

In the past, hardware logic analyzers collected CPU state information for analysis. They became limited because they relied on monitoring external address and data buses. As processors became more complicated, more functionality was performed internally, and signals and state information previously visible to logic analyzers became hidden.

Fortunately, modern trace hardware implemented in many of today’s advanced microprocessors has made it possible to gain access to these internal signals without disturbing the system or surrendering correctness. New trace collection and analysis techniques use this trace data to reconstruct the state of the processor and provide in-depth visualization of system behavior.

First, do no harm?
One of the keys to reliably preserving system state is collecting this information with minimal intrusion. Program flow and system events, such as interrupts and context switches, often rely on precise timing. Hardware trace is completely non-intrusive in that it does not require any code modification and it does not affect execution timing.

For instance, with ARM’s Embedded Trace Macrocell (ETM) technology, a hardware trace module runs alongside the CPU, collecting instructions and data. This allows you to run your program at full speed and only once to capture system execution data, which can be used to reproduce and analyze the behavior as it ran in real time. In contrast, tools that require halting or slowing the system to gather trace data alter system behavior, compromising timing and data accuracy. Hardware trace allows you to view an accurate depiction of the entire system, to confirm functionality and create a more thoroughly tested and reliable end product.

The logistics of trace and debug
The trace process can be broken down into three major steps, which are shown in Figure 1, below. First, the raw information is collected from a trace-enabled microprocessor or from an alternative source of trace data. Some examples of trace-enabled microprocessors include ARM with ETM, PowerPC variants, and selected MIPS-based processors.

Figure 1

Other methods exist for collecting program execution information from microprocessors without hardware trace using instrumentation. In addition, instruction set simulators provide a way to collect this data when hardware is unavailable. An external device commonly known as a trace-port analyzer, or TPA, connects to the microprocessor and collects the trace data in a circular buffer. This buffer is continuously updated, and new data replaces the old when the buffer is full. A host computer with analysis software then connects to the TPA to upload and analyze the trace data.
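The circular buffer behavior described above can be pictured with a few lines of C. This is only a sketch of the idea, not the firmware of any real TPA: the trace_buffer type, the function names, and the tiny eight-entry capacity are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 8u   /* tiny on purpose; a real TPA holds far more */

/* Hypothetical model of a circular trace buffer. */
typedef struct {
    uint32_t entries[TRACE_CAPACITY];
    size_t   next;    /* slot that the next record will be written to */
    size_t   count;   /* number of valid records, at most TRACE_CAPACITY */
} trace_buffer;

void trace_init(trace_buffer *tb) {
    assert(tb != NULL);
    tb->next = 0;
    tb->count = 0;
}

/* Store one record. Once the buffer is full, the oldest record is replaced. */
void trace_push(trace_buffer *tb, uint32_t record) {
    assert(tb != NULL);
    tb->entries[tb->next] = record;
    tb->next = (tb->next + 1u) % TRACE_CAPACITY;
    if (tb->count < TRACE_CAPACITY) {
        tb->count++;
    }
}

/* Read the i-th record in age order, where 0 is the oldest one still held. */
uint32_t trace_get(const trace_buffer *tb, size_t i) {
    assert(tb != NULL && i < tb->count);
    size_t oldest = (tb->next + TRACE_CAPACITY - tb->count) % TRACE_CAPACITY;
    return tb->entries[(oldest + i) % TRACE_CAPACITY];
}
```

After more than TRACE_CAPACITY pushes, trace_get(tb, 0) returns the oldest record that survived, which is why a larger buffer gives a longer window into the past.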

Collection and TPAs
In recent years, trace port analyzer (TPA) buffers, or trace windows, have become larger to accommodate increasing amounts of trace data for analysis. In some cases, these buffers now store up to 1 gigabyte of trace data, significantly more than previous-generation devices that stored only a few megabytes. Extended trace windows give a more complete view of system execution over a longer period of time, enabling users to capture more bugs and resolve them more easily.

To further maximize the amount of data collected in the fixed size buffer, some TPAs employ compression schemes or support optional configurable hardware triggers that can start and stop trace collection in the TPA to focus on certain periods of time, memory locations, or source code execution. TPAs must be able to keep up with increasing trace interface speeds, attributed to increasing processor speeds, to consistently capture trace data.
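As a rough illustration of how a start/stop trigger narrows what lands in the buffer, consider the sketch below. It assumes the simplest possible trigger, a pair of program addresses; the trace_trigger type and trigger_accepts function are hypothetical and do not correspond to any vendor's configuration interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trigger: capture from start_addr through stop_addr. */
typedef struct {
    uint32_t start_addr;  /* capture begins when this address executes */
    uint32_t stop_addr;   /* capture ends after this address executes */
    bool     active;      /* true while records are being kept */
} trace_trigger;

/* Returns true if the record for this program counter should be stored. */
bool trigger_accepts(trace_trigger *t, uint32_t pc) {
    assert(t != NULL);
    if (!t->active && pc == t->start_addr) {
        t->active = true;           /* start condition seen */
    }
    bool keep = t->active;          /* the stop instruction itself is kept */
    if (t->active && pc == t->stop_addr) {
        t->active = false;          /* stop condition seen */
    }
    return keep;
}
```

Only the records for which trigger_accepts returns true would be written to the buffer, so the same fixed-size buffer covers many more passes through the region of interest.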

Software Trace Analysis Tools
Traditional software analysis tools translated trace data from the TPA into assembly instructions viewable on the host. Many users understandably balked at the task of sifting through millions of assembly instructions to identify system behavior. Some less fortunate users became lone “trace experts” who were called upon to use their unique knowledge of these clumsy trace tools to uncover elusive bugs when all other debugging techniques had failed.

Modern trace analysis tools make it much easier to apply trace information to your application. They take the translation of trace data from the target to the host several steps further by correlating the trace data with source code and providing visual representations of system behavior. Even an inexperienced user can now integrate trace into everyday development to find bugs faster, optimize application performance, and verify test coverage.

Software trace tools provide the ability to step through the source code that actually executed on the target using a reconstructed system, revealing which branches were taken and in what sequence.

For higher-level visibility, these tools create faithful reconstructions of functions, call stacks, and blocks. For lower-level visibility, they offer the ability to monitor registers, memory, and variable values over time. This additional insight helps to identify common sources of program errors, such as incorrect variable values or incorrect paths that lead to problems observable later in the program. Finding where bugs occur and where they originate this quickly makes it possible to create a more reliable system in less time.

Tight integration with advanced software analysis and profiling tools provides additional system visibility and performance measurements. From the trace data, these tools introduce program behavior statistics such as cycle counts, memory loads and stores, function calls, and branches taken.

These tools are particularly useful when developing mixed ARM and Thumb code, because they report 32-bit and 16-bit instruction counts and display the interlaced instructions. Information about where time is spent helps to improve system performance by isolating areas of delay.

With this detailed information, you can find program “hot spots” in a non-intrusive way and selectively optimize those areas. These tools use advanced automation capabilities to create reports on system metrics for further analysis and performance optimization. Optimized system performance achieved through trace capabilities can help to lower the cost of the processor required for the same functionality, minimize power consumption, or allow for additional functionality without adding to the hardware configuration or changing the system design.
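To make the idea of a hot spot concrete, the sketch below totals the cycles attributed to each function in a decoded trace and reports the most expensive one. It is a simplification under stated assumptions: the exec_span record, the fixed NUM_FUNCS table, and the hottest_function name are invented for this example, and real tools derive such spans from the instruction trace and symbol information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_FUNCS 3u   /* hypothetical number of functions being profiled */

/* One stretch of execution inside a single function. */
typedef struct {
    unsigned func_id;  /* index of the function that was executing */
    uint32_t cycles;   /* cycles spent in this stretch */
} exec_span;

/* Sum cycles per function and return the index of the hottest function. */
unsigned hottest_function(const exec_span *spans, size_t n,
                          uint64_t totals[NUM_FUNCS]) {
    assert(spans != NULL && totals != NULL);
    for (size_t f = 0; f < NUM_FUNCS; f++) {
        totals[f] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (spans[i].func_id < NUM_FUNCS) {   /* ignore unknown functions */
            totals[spans[i].func_id] += spans[i].cycles;
        }
    }
    unsigned hot = 0;
    for (unsigned f = 1; f < NUM_FUNCS; f++) {
        if (totals[f] > totals[hot]) {
            hot = f;
        }
    }
    return hot;
}
```

Because the spans come from recorded trace rather than from instrumentation, gathering them does not change the timing being measured.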

Using Advanced Trace Tools
Advanced trace tools offer complex execution and data breakpoints like a regular source-level debugger while maintaining accurate timing through the reconstructed system. With this new technology, you can even run backwards and forwards, step through source code, hit breakpoints, and view variable values and memory to quickly find the source of unexpected or incorrect behavior.

Inspection is extended to real-time operating systems to help users to understand complex interactions of events and resources. For example, software trace tools provide precise timing measurements such as context switch time and interrupt latency. They further enable users to examine system-wide events and task interactions by offering simultaneous debugging of multiple tasks running on the reconstructed system. Some software trace tools are even virtual memory-aware for use with operating systems that provide virtual address space protection.
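A measurement such as interrupt latency falls out of the trace once events carry timestamps. The sketch below assumes a simplified event stream with a cycle count on each record; the event_kind values, the trace_event type, and the max_interrupt_latency function are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { EV_IRQ_RAISED, EV_ISR_ENTRY, EV_ISR_EXIT } event_kind;

/* Hypothetical timestamped trace event. */
typedef struct {
    event_kind kind;
    uint64_t   cycle;   /* cycle count at which the event was recorded */
} trace_event;

/* Worst-case cycles between an interrupt being raised and its handler
   starting. Events are assumed to be in chronological order. */
uint64_t max_interrupt_latency(const trace_event *ev, size_t n) {
    assert(ev != NULL);
    uint64_t worst = 0;
    uint64_t raised_at = 0;
    bool pending = false;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].kind == EV_IRQ_RAISED && !pending) {
            raised_at = ev[i].cycle;
            pending = true;
        } else if (ev[i].kind == EV_ISR_ENTRY && pending) {
            uint64_t latency = ev[i].cycle - raised_at;
            if (latency > worst) {
                worst = latency;
            }
            pending = false;
        }
    }
    return worst;
}
```

The same pass could just as easily measure the time between EV_ISR_ENTRY and EV_ISR_EXIT to find a handler that runs longer than it should.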

Figure 2

Complete knowledge of system behavior is more attainable through the use of graphical displays. These include timelines of functions running over time, and tree and graph views of function calls. An exciting advantage of these capabilities is that you can more easily identify cases in which a single instance of a function executes for longer than usual, as shown in Figure 2 above.

At the operating system level, events such as task context switches, exceptions, interrupts, kernel calls, user events, and task status changes can be monitored graphically over time, as shown in Figure 3, below. Combining trace data with source information, these tools can highlight which lines of source code were executed to pinpoint areas for performance improvement and examine test coverage.
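Test coverage from trace amounts to marking every source line that the recorded execution touched. The sketch below assumes the trace has already been mapped to zero-based source line indices; NUM_LINES and mark_executed are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_LINES 8u   /* hypothetical number of source lines in the file */

/* Mark each executed line and return how many distinct lines were covered. */
size_t mark_executed(const unsigned *lines, size_t n, bool covered[NUM_LINES]) {
    assert(lines != NULL && covered != NULL);
    size_t distinct = 0;
    for (size_t i = 0; i < NUM_LINES; i++) {
        covered[i] = false;
    }
    for (size_t i = 0; i < n; i++) {
        unsigned line = lines[i];
        if (line < NUM_LINES && !covered[line]) {
            covered[line] = true;   /* first time this line was executed */
            distinct++;
        }
    }
    return distinct;
}
```

Lines left unmarked after a full test run are the ones the tests never exercised.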

Figure 3

To help illustrate these points, let us revisit the scenario of the project with the problematic timing behavior. It is almost impossible to identify the source of the problem by using traditional source-level debugging techniques. Instead, you need to accurately portray the interactions within the system at very small intervals of time, which can be done using trace. You find that someone has written an interrupt handler that takes longer than it should. With trace, both the exact problem and the condition causing it, in this case the interrupt handling, become visible.

Putting advanced trace to work
To further illustrate the useful ways these hardware and software trace tools can be put to work, let’s look at the example of an interrupt handler that reads in a stream of integers and stores them in a circular buffer that contains 1024 entries. While admittedly a simple example, it clearly shows some of the ways advanced trace methodologies can simplify the system debug process.

In this example, multiple integers can be read during a given interrupt, and the handler records the grouping of integers by storing the number of integers in a sequence in the unused top bits of the first entry of the sequence (we will refer to this as the “header” for the purpose of this example) in the circular buffer.

An oversight in a conditional statement eventually will lead to a write beyond the circular buffer, corrupting the memory used to store the pointer to the header entry. When the header entry is updated after all the integers of the sequence are read, the corrupt pointer is dereferenced and written to, causing further memory corruption, as shown in the code snippet below:

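int count = 0;
int pkt_buffer[1024];
int *header;

int get_sample(void);  /* assumed to be provided by the input driver */

void store_sample(int sample) {
    pkt_buffer[count] = sample;
    /* The oversight: this test lets count reach 1024, so the next
       store writes pkt_buffer[1024], one entry past the end. */
    if (count < 1024) {
        count++;
    } else {
        count = 0;
    }
}

void interrupt_handler() {
    int sample;
    int total_pkts = 0;
    header = &pkt_buffer[count];   /* first entry of this sequence */
    sample = get_sample();
    while (sample != 0) {
        store_sample(sample);
        total_pkts++;
        sample = get_sample();
    }
    *header |= total_pkts << 20;   /* sequence length in the top bits */
}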
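To see why the stray write lands on header, it can help to model the situation on a host machine. The sketch below is an illustration under a loud assumption: that the linker places the storage for header directly after pkt_buffer, which is modeled here as one extra word at the end of a single array so that the experiment itself stays within bounds. The names memory, store_sample_buggy, and stores_until_header_clobbered are made up for this example.

```c
#include <assert.h>

#define BUF_ENTRIES 1024

/* Words 0..1023 stand in for pkt_buffer. Word 1024 stands in for the
   memory that holds header (assumed to sit right after the buffer). */
int memory[BUF_ENTRIES + 1];
int count = 0;

/* Same index logic as the handler's store routine, bug included. */
void store_sample_buggy(int sample) {
    assert(count >= 0 && count <= BUF_ENTRIES);
    memory[count] = sample;
    if (count < BUF_ENTRIES) {
        count++;
    } else {
        count = 0;
    }
}

/* Count how many stores it takes before the header slot is overwritten. */
int stores_until_header_clobbered(void) {
    const int sentinel = -1;          /* stands in for a valid pointer value */
    int stores = 0;
    count = 0;
    memory[BUF_ENTRIES] = sentinel;
    while (memory[BUF_ENTRIES] == sentinel) {
        store_sample_buggy(7);        /* any nonzero sample will do */
        stores++;
    }
    return stores;
}
```

In this model the header slot survives the first 1024 stores and is overwritten by the 1025th, when count has been allowed to reach 1024.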

Iteratively stepping through debug with trace
Without trace, debugging this problem would likely take several iterative steps, and that is the best case, in which the invalid pointer happens to contain an inaccessible address and accessing it causes a memory access violation.

Such an invalid pointer could point to an accessible address, continuing the memory corruption by silently overwriting other variables. The latter is even more difficult to diagnose without trace, but for this example, we’ll assume the simplest case.

In this case, the steps to uncover the problem would generally include: connect to the target, load the program on the target, run the program, and receive a memory access violation error at the point where header is dereferenced and written.

You determine there is a problem with header, and proceed to reload the program, set a breakpoint at the point where the header is modified in the source code (header = &pkt_buffer[count];) and check that the value is set correctly.

You find that it is set correctly and continue from that point to record the value of header to find out when it takes on an illegal value. This would require stepping through at least 1025 iterations of the inner loop.

You could instead set a hardware data access breakpoint on “header,” but even this more complex debugging strategy would be slow, as the breakpoint would be triggered twice per interrupt, and hundreds of interrupts could come in before the 1025th integer is queued. Not exactly the best use of your time or a fast way to determine the cause of the illegal access.

Instead, using trace with software analysis, you can turn on trace recording and run the program once up to the point where the violation occurred. If you use a debugger with the ability to move, time-machine-like, through the code backward and forward, you would notice that the starting point is the last instruction where the access violation occurred.

Opening a watch window in the debug software to view changes in header, you would run the code backwards until you stop at the last access to header before the violation.

When the debugger stops at that line in the source code, you find that the bad value was written to the header variable not during a regular assignment, but in one of the sample stores.
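The backward search that the debugger performs here can be pictured as a scan of the recorded memory writes from newest to oldest. The sketch below is only an illustration, not how any particular tool is implemented; the write_record layout and the last_write_at_or_before function are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one memory write recovered from the trace. */
typedef struct {
    uint32_t pc;      /* address of the instruction that did the write */
    uint32_t addr;    /* memory address that was written */
    uint32_t value;   /* value that was stored */
} write_record;

/* Scan backwards from index 'from' for the most recent write to 'addr'.
   Returns the index of that record, or -1 if there is none. */
long last_write_at_or_before(const write_record *log, long from,
                             uint32_t addr) {
    assert(log != NULL);
    for (long i = from; i >= 0; i--) {
        if (log[i].addr == addr) {
            return i;
        }
    }
    return -1;
}
```

The pc field of the record that is found identifies the instruction responsible, which is how a write to header can be traced to a sample store rather than to the assignment in the handler.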

Even easier, you can set a watchpoint on the header variable and run backwards. The debugger will stop once the value of header changes. This is essentially like running backwards in time with a hardware data breakpoint. You click on the iteration variable to discover that header was overwritten at iteration 1024, by a store to index 1024 of the allocated buffer array. It is now simple to determine that this is one index past the end of the array, whose valid indices run from 0 to 1023, and you change your code to write no further than index 1023.
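One possible way to make that change is to wrap the index as soon as it reaches the buffer size, so that no store ever uses index 1024. This is a sketch of one fix among several, not necessarily the one the original developer would choose; the PKT_BUFFER_ENTRIES constant is introduced here for illustration.

```c
#include <assert.h>

#define PKT_BUFFER_ENTRIES 1024

int pkt_buffer[PKT_BUFFER_ENTRIES];
int count = 0;

/* Store one sample and advance the index, wrapping after index 1023. */
void store_sample(int sample) {
    assert(count >= 0 && count < PKT_BUFFER_ENTRIES);
    pkt_buffer[count] = sample;
    count++;
    if (count >= PKT_BUFFER_ENTRIES) {
        count = 0;    /* wrap before the index can leave the array */
    }
}
```

With this version the 1025th store goes to index 0 instead of one entry past the end of the array.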

Michele Mixter is product manager at Green Hills Software, Inc.
