CMP EMBEDDED.COM


Debugging: Making the move from parallel to high speed serial trace
Andre Yew describes the history of trace debug and the evolution of High-Speed Serial Trace (HSST), and discusses how it replaces conventional parallel trace, especially as CPU speeds and System-on-Chip integration complexity increase.



Embedded.com
Have you ever had a bug that disappeared when you tried to debug it? Or how about an application that has to run at full speed, and can't be stopped or slowed down to take a look at strange behavior?

Problems like these can only be debugged non-intrusively, that is, with debugging that has no side effects on the system. Trace was invented to solve these kinds of problems. Before we go on to talk about trace, let's look at these problems in more detail, including a real-world debugging situation that trace helped solve.

Bugs that disappear when you run the program under a debugger, or even when you add printf() statements in the most innocuous places, are usually caused by memory corruption or race conditions that depend on a very particular sequence and timing of events.

Adding a printf() statement alters the memory footprint of the program and slows it down. Running a program under a debugger can also slow it down, depending on how the debugger interacts with the target being debugged.

Applications that can't be stopped or slowed down are at the heart of many embedded products. For example, a cellphone can't be halted in the middle of a call because doing so will drop the call.

We were once reminded that we had left an inkjet printer halted in our lab by the smoke that started coming out of the printer as its print heads began burning the paper. Hard drive firmware code has large comment blocks that remind would-be human debuggers not to step through certain parts of the code, or else risk crashing the drive head into the platter.

Going beyond bug-finding, code optimization is often possible only when guided by non-intrusive measurement. The traditional way of profiling code is to co-opt a timer on the target and periodically poll the program counter to get a statistical view of slow spots in the code.

However, since this is statistical, it can only get an approximate view of performance: some events may not be sampled often enough or even not at all. Increasing the sampling rate will only slow the target down, thereby decreasing the accuracy of the measurement.

Statistical profiling also has to store its data somewhere and usually has to output its profiling data once its target buffers have filled up. This uses memory on the target and intrudes on the target's run-time, which can have unexpected, serious effects. Clearly, traditional methods of collecting profiling data are seriously limited.
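To make the sampling limitation concrete, here is a minimal sketch in Python that compares what a complete trace would report with what a timer-driven sampler reports. Everything in it is an assumption made for illustration: the function names `isr` and `main_loop`, the cycle counts, and the deliberately unlucky case where the sampling timer has the same period as the short routine.

```python
from collections import Counter

def build_timeline(total_cycles, isr_period, isr_length):
    """Return a list naming the function executing on each cycle.

    A hypothetical short routine 'isr' runs for isr_length cycles every
    isr_period cycles; 'main_loop' runs the rest of the time.
    """
    return ["isr" if (cycle % isr_period) < isr_length else "main_loop"
            for cycle in range(total_cycles)]

def exact_profile(timeline):
    """What a complete, non-intrusive trace would report: every cycle counted."""
    counts = Counter(timeline)
    return {name: counts[name] / len(timeline) for name in counts}

def sampled_profile(timeline, sample_period, phase=0):
    """What a timer-driven profiler sees: one program-counter sample per tick."""
    samples = timeline[phase::sample_period]
    counts = Counter(samples)
    return {name: counts[name] / len(samples) for name in counts}

# Assumed workload: 'isr' takes 10 cycles out of every 1000.
timeline = build_timeline(total_cycles=100_000, isr_period=1000, isr_length=10)

exact = exact_profile(timeline)
# Assumed worst case: the sampling timer ticks at the same period as 'isr',
# but half a period out of phase, so it never lands inside the routine.
sampled = sampled_profile(timeline, sample_period=1000, phase=500)

print("exact:  ", exact)
print("sampled:", sampled)
```

In this contrived case the exact profile attributes 1% of the cycles to `isr`, while the sampled profile never sees it at all. Real samplers are rarely this unlucky, but the same effect makes short or periodic events under-represented.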

During the development of the Green Hills Probe V2 (GHP2), an Ethernet-connected JTAG probe, we got a very real reminder of this kind of problem. We were chasing a performance problem that seemed to appear and disappear when code that had nothing to do with the problem area was changed. We were seeing download speeds vary from 490 kilobytes per second (kB/sec) to 850 kB/sec.

After some thought, we decided that it was probably a cache problem, but how could we prove that? Traditionally, this would have involved a bit of guesswork and experiments that could only indirectly hint at the problem.

Fortunately, the CPU used in GHP2 has trace, which can non-intrusively provide enough information for us to see what was happening to the cache. After collecting trace data, we quickly wrote a small Python script that fed the collected trace through a simulation of the CPU's cache system, so that we could characterize cache usage with the fastest and slowest firmware.

Just as we had suspected, the slow firmware had far more cache misses than the fast firmware. The code in the critical loop was being bumped out of cache by code that had nothing to do with the loop, other than having the misfortune of being mapped to the same cache lines.
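The script itself is not reproduced in this article, but the idea can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the original tool: it models a direct-mapped cache with made-up geometry (1 KB, 32-byte lines), and it uses a synthetic address trace from the hypothetical helper `make_trace` in place of real trace data.

```python
def count_misses(addresses, cache_size=1024, line_size=32):
    """Count misses for a direct-mapped cache fed a list of fetch addresses.

    cache_size and line_size are assumed values, not those of any real CPU.
    """
    num_lines = cache_size // line_size
    tags = [None] * num_lines          # tag currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # identifies which block occupies the line
        if tags[index] != tag:
            misses += 1
            tags[index] = tag          # evict whatever was there before
    return misses

def make_trace(loop_base, other_base, iterations=100,
               loop_bytes=64, other_bytes=64):
    """Synthetic stand-in for trace data: a loop body followed by
    unrelated code, repeated, with 4-byte instructions."""
    trace = []
    for _ in range(iterations):
        trace.extend(range(loop_base, loop_base + loop_bytes, 4))
        trace.extend(range(other_base, other_base + other_bytes, 4))
    return trace

# The unrelated code sits exactly one cache size away: same cache lines.
slow = count_misses(make_trace(loop_base=0x0000, other_base=0x0400))
# The unrelated code is shifted by 64 bytes: different cache lines.
fast = count_misses(make_trace(loop_base=0x0000, other_base=0x0440))

print("misses with conflicting layout:", slow)
print("misses with shifted layout:    ", fast)
```

With these assumed numbers, moving the unrelated code by only 64 bytes changes the miss count from four per iteration to four in total, which is the kind of layout sensitivity described above.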

Using the same approach, we could also optimize our firmware by configuring the linker directives file so that the critical loop is never evicted from its cache lines. By doing this, we significantly exceeded the download speeds of even the fastest firmware: the download speeds now consistently hover around 1000 kB/sec, which is more than double the slowest speed.
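As a rough sketch of the check behind such a layout decision, the Python below reports which code sections share cache lines with a critical loop. It assumes a direct-mapped cache with invented geometry, and the section names, addresses, and sizes are hypothetical; a real check would take them from the linker map file.

```python
def lines_used(start, size, cache_size=1024, line_size=32):
    """Set of direct-mapped cache line indexes touched by [start, start + size)."""
    num_lines = cache_size // line_size
    first_block = start // line_size
    last_block = (start + size - 1) // line_size
    return {block % num_lines for block in range(first_block, last_block + 1)}

def conflicting_sections(critical, sections, **cache):
    """Names of sections sharing at least one cache line with the critical range."""
    critical_lines = lines_used(*critical, **cache)
    return [name for name, (start, size) in sections.items()
            if lines_used(start, size, **cache) & critical_lines]

# Hypothetical layout: (start address, size in bytes) for each section.
critical_loop = (0x0000, 64)           # occupies cache lines 0 and 1
layout = {
    "logging":   (0x0400, 128),        # wraps around onto lines 0..3
    "led_blink": (0x0100, 64),         # occupies lines 8 and 9
}

conflicts = conflicting_sections(critical_loop, layout)
print("sections that can evict the critical loop:", conflicts)
```

Any section the check reports is a candidate for relocation in the linker directives file, after which the trace-driven cache simulation can be rerun to confirm the effect.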

Just as significantly, this was all accomplished in one afternoon's work. Without trace, we don't know how long it would have taken us to find the problem, much less the optimal layout for no cache misses. Trace not only helped us identify the problem, but also helped us find a solution that would not have been possible without it.

Hopefully, we've shown you how trace is useful in typical embedded debugging situations. We'll quickly review trace as it exists today, and then look at high-speed serial trace, which is the next major evolution of this important debugging technology.
