eXtreme instrumenting - Embedded.com


The county has been repaving my street for the last few weeks. Workers brought in a big, noisy machine that ate away two inches of the old surface in a single pass, feeding a few cubic meters of ground-up rubble to a succession of dump trucks each second. Then an even larger contraption pulled hot and oily asphalt from other trucks and put a pool-table-flat road down in a single pass. A small army of acolytes milled around the machine as it crept slowly along. A couple fed raw material into it, another tickled mysterious levers to control the beast, and some workers directed the procession of traffic. I walked along with them for quite a while, as it's always interesting to see how people do their job.

I watch embedded systems developers do their thing, too. Joe Coder is hunched in front of his monitor, furiously single-stepping, examining watchpoints, and using a debugger to ensure his code executes correctly. When the thing finally works he breathes a huge sigh of relief and moves on to the next project. That approach makes me shudder.

Joe builds real-time code. He's doing the code part of that just fine since traditional debugging gives us great insight into the program's procedural domain. That's the if-then, do-while bit that represents the visible components of all software.

But he's totally ignoring the real-time part of his job. How long does an interrupt handler take? A microsecond . . . or a week? Is the unit idle 90% of the time . . . or 1%?

A management maxim states, “if you can't measure it, you can't manage it,” which certainly holds true for developing embedded systems. We can and do measure everything about the procedural nature of the code; similarly we can and must measure and manage the nature of our systems in the time domain.

That was easy in the olden days. A dozen or more in-circuit emulator vendors offered tools that handled both code's time and procedural nature. Sophisticated trace caught every aspect of system operation at full execution speed. Time stamps logged what happened when. Performance analyzers isolated bottlenecks.

Then processors got deep pipelines, making it difficult to know what the core chip was doing. Caches followed, changing difficult to impossible. Now the processor is buried inside an FPGA or ASIC. The Age of Emulators faded as surely as the Pleistocene, replaced now by the JTAG Epoch. Ironically, as the size and scope of embedded applications exploded, our tools' capabilities imploded. Most debuggers today offer little help with managing the time domain. But we must measure time to understand what the systems are doing, to find those complex real-time bugs, and to understand where the microseconds go.

Output bits
Since the tools aren't up to snuff, stride into the EEs' offices and demand, at least on the prototypes, one or more parallel outputs dedicated to debugging. It's astonishing how much insight one can glean from a system by simply instrumenting the code to drive these bits.

Want to know the execution time of any routine? Drive one of the bits high when the function starts and low as it exits. Monitor the bit with an oscilloscope–which every lab has–and you can measure time to any precision you'd like. The cost: pretty much zero. Insight: a lot.
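
A minimal sketch of that instrumentation, assuming a byte-wide debug port written through an `outportb`-style call like the one used in the listings later in this article. The port address `TEST_PORT`, the mask `TEST_BIT`, and `routine_under_test` are hypothetical names, and `outportb` is stubbed here to record writes so the fragment compiles on a desktop machine; on the target you'd drop the stub and use your compiler's real port-write routine.

```c
#include <stdint.h>

#define TEST_PORT 0x0300u   /* hypothetical address of the debug port */
#define TEST_BIT  0x01u     /* hypothetical bit wired to the scope probe */

/* Desktop stand-in for the target's port-write routine: it only
   records what was written so the sequence can be inspected. */
static uint8_t  port_log[16];
static unsigned port_writes = 0;

static void outportb(unsigned port, uint8_t value)
{
    (void)port;
    if (port_writes < sizeof port_log)
        port_log[port_writes++] = value;
}

/* Placeholder for whatever function you want to time. */
static unsigned routine_under_test(unsigned n)
{
    unsigned sum = 0;
    while (n--)
        sum += n;
    return sum;
}

/* Raise the bit on entry and drop it on exit; the pulse width
   seen on the scope is the routine's execution time. */
static unsigned timed_call(unsigned n)
{
    unsigned result;

    outportb(TEST_PORT, TEST_BIT);   /* bit high: routine starts */
    result = routine_under_test(n);
    outportb(TEST_PORT, 0);          /* bit low: routine done */
    return result;
}
```

Writing the whole byte clobbers the port's other bits. If several debug bits share one port, keeping a shadow copy of the port value and setting or clearing single bits in it is one way around that.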

The top trace in Figure 1 monitors an interrupt request line. On the bottom we see the output bit, driven high by the code when the corresponding interrupt service routine starts, and low immediately before exit. The time from the leading edge of the request to the assertion of the bit is the interrupt latency, a critical parameter that lets us know that the system will respond to interrupts in a timely manner. Then, duration of that bit's assertion tells us the ISR's execution time.

Note that the same ISR has been invoked twice, with different responses each time. No doubt there's a decision in the code that changes the execution path. Or a pipeline might be filling, or any of a dozen other factors are influencing the routine's performance. Trigger a digital scope on the bottom trace's rising edge, as shown in Figure 2. The time from the bit being asserted to the beginning of the hash is the fastest the thing ever runs; to the end of the hash is the slowest.

Wow–two lines of code and one output bit gives a ton of quantitative information!

Wise managers demand parametric data about firmware performance at the end of a project. How much free flash space do we have? How about RAM? Without that data it's impossible to know if it's possible to enhance features in the future. The same goes for performance numbers. If the system is 99.9% loaded, adding even the simplest new function will have you emulating Sisyphus for a very long time.

Instrument the idle loop or create a low-priority task that just toggles the output bit, as shown in Figure 3. Where there's hash, it's idle. Where there's not, the system is busy.

This is easy to do since many operating systems have a built-in hook that's called whenever the system has nothing to do. Micrium's μC/OS-II, for instance, invokes the routine shown in Listing 1, to which I added instructions to toggle the output bit. The cost: just 480ns on a 33MHz 186.

Listing 1: Task that computes idle time

/**************************************************
                  IDLE TASK HOOK
**************************************************/
void  OSTaskIdleHook (void)
{
   outportb(test_port, 1);  // Assert instrumentation pin
   outportb(test_port, 0);  //  but just for a moment
}

eXtreme instrumentation
What we really want is a performance analyzer, an instrument that's always connected to the system to constantly monitor idle time. The tool immediately alerts you if new or changed code suddenly sucks cycles like a Hummer goes through gas. But plan on spending a few tens of thousands for the tool.

Or not. Remove all load from your system so the idle hook runs nearly all of the time. Use the scope to figure the duty cycle of our trusty output bit. On my 33MHz 186 system running μC/OS-II the duty cycle is 8.6%.

Now get Radio Shack's 22-218A voltmeter (about $15) and remove the back cover. Find the 29.2K resistor and change it to one whose value is:

Where DutyCycle is in percent and MaxVolts is the system's power supply voltage. Monitor the output bit with the meter as shown in Figure 4. Congratulations! Your cheap VOM is now a $10k performance analyzer. It'll show the percentage of time the system is idle. Leave it hooked up all the time to see the effect of new code and different stimuli to the system. The Radio Shack salesman tried to sell me an extended service plan, but I'm pretty sure this mod will void the warranty.
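
The arithmetic behind the meter trick is short. An averaging meter reads the supply voltage scaled by the duty cycle, and the idle percentage is the duty cycle seen now relative to the fully idle one. A rough sketch, in which the function names are my own and the 5 V supply in the note below is an assumption; the 8.6% figure is the fully idle duty cycle measured on the 186 system:

```c
/* Average voltage an analog meter shows for a pulse train that is
   high for duty_pct percent of the time. */
static double average_volts(double duty_pct, double max_volts)
{
    return max_volts * duty_pct / 100.0;
}

/* Percent idle, given the duty cycle seen now and the duty cycle
   seen with all load removed (8.6% on the author's system). */
static double percent_idle(double duty_pct, double idle_duty_pct)
{
    return 100.0 * duty_pct / idle_duty_pct;
}
```

With an assumed 5 V supply, a fully idle system averages 5 × 0.086 = 0.43 V, which is why the meter has to be rescaled so that this small voltage reads full scale. A measured duty cycle of 4.3% would then mean the system is idle half the time.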
Figure 4: A poor person's performance analyzer

It's fun to watch a colleague's jaw drop when you explain what you're doing.

The needle's response won't keep up with millisecond-level changes in system loading. Not a problem; modify the idle hook to increment a variable called Idle_Counts and invoke the task shown in Listing 2 every second.

Listing 2: Task used for modifying analyzer

static void Compute_Percent_Idle_Time (void *p_arg)
{
   float Num_Idles_Sec = 178571.0;  // In fully idle system we
                                    //    get this many
                                    //    counts/sec
   float Idle;                      // Percent idle time

   while (1)
   {
      Idle = 100.0 * (((float) Idle_Counts) / Num_Idles_Sec);
      printf("\nIdle time in percent= %f", Idle);
      Idle_Counts = 0;
      OSTimeDly(OS_TICKS_PER_SEC);  // Go to sleep
   }
}

Obviously the Num_Idles_Sec parameter is system-dependent. Run the code in a totally idle system and look at Idle_Counts to see how many counts you get in your configuration.

Modify the routine to suit your requirements. Add a line or two to compute min and max limits. I naughtily used printf to display the results, which on my 186 test system burns 40ms. This is a Heisenberg effect: making the measurement changes the system's behavior. Better, log the results to a variable or send them to whatever output device you have. Consider using longs instead of floats. You get the idea.
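
One way to follow those suggestions is sketched below: integer math in place of floats, running minimum and maximum, and no printf. The structure and function names (`idle_stats_t`, `idle_stats_init`, `idle_stats_update`) are made up for illustration; `NUM_IDLES_SEC` is the fully idle count from Listing 2 and must be measured on your own system.

```c
#include <limits.h>

#define NUM_IDLES_SEC 178571UL  /* counts/sec in a fully idle system
                                   (value from Listing 2) */

typedef struct {
    unsigned long last;  /* most recent reading, tenths of a percent */
    unsigned long min;   /* lowest reading seen so far */
    unsigned long max;   /* highest reading seen so far */
} idle_stats_t;

static void idle_stats_init(idle_stats_t *s)
{
    s->last = 0;
    s->min  = ULONG_MAX;
    s->max  = 0;
}

/* Convert one second's worth of idle counts to tenths of a percent
   and fold it into the running limits. Call once per second, then
   zero the counter. */
static void idle_stats_update(idle_stats_t *s, unsigned long idle_counts)
{
    /* The product fits in 32 bits as long as idle_counts stays
       below about 4.29 million. */
    unsigned long tenths = (idle_counts * 1000UL) / NUM_IDLES_SEC;

    s->last = tenths;
    if (tenths < s->min) s->min = tenths;
    if (tenths > s->max) s->max = tenths;
}
```

Reporting in tenths of a percent keeps some resolution without floating point; the once-a-second task would call `idle_stats_update` with Idle_Counts and leave the results in memory for a debugger or a slow output channel to pick up.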

R-2R
Some lucky readers work with a hardware designer who has a fond spot for the firmware crowd. Buy him a beer. Wash her car. Then ask for more than a single output bit–maybe even three.

Construct the circuit shown in Figure 5 and connect the three points with PIO designations to the output bits. Often called an R-2R ladder network, this is an inexpensive and not terribly accurate digital-to-analog converter.

Now we can look at the real-time behavior of tasks in real time. Want to know which task executes when? Again using μC/OS-II, change the Micrium-supplied hook that executes whenever a task is invoked as follows:

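/****************************
      TASK SWITCH HOOK
****************************/
void  OSTaskSwHook (void)
{
   // OSTCBHighRdy is the task being switched in;
   // OSTCBCur is still the task being switched out
   outportb(test_port, OSTCBHighRdy->OSTCBId);
}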

This sends the next task's ID (a number from 0 to whatever) to those output bits. Probe the R-2R ladder with the scope and you'll see something that looks like Figure 6.

Task 0 is running when the scope reads zero volts. Task 1 runs when the voltage is 1/8th of the power supply, and so forth. By adding less than a microsecond of overhead we can watch how our system runs dynamically. Sometimes it's useful to trigger the scope on some event–say, a button press or the start of incoming data. The scope will show how the system schedules tasks to deal with that event.
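
The mapping from task ID to scope voltage can be written down directly. The sketch below assumes an ideal, unloaded ladder; the function names and the millivolt units are illustrative choices, not part of μC/OS-II.

```c
/* Ideal output of an n-bit R-2R ladder, in millivolts, for a given
   code. Each step is supply / 2^bits. */
static unsigned long ladder_mv(unsigned code, unsigned bits,
                               unsigned long supply_mv)
{
    unsigned long steps = 1UL << bits;      /* 8 steps for 3 bits */

    code &= (unsigned)(steps - 1UL);        /* keep only the wired bits */
    return (supply_mv * code) / steps;
}

/* Inverse: nearest code for a voltage read off the scope. */
static unsigned ladder_code(unsigned long mv, unsigned bits,
                            unsigned long supply_mv)
{
    unsigned long steps = 1UL << bits;
    unsigned long code = (mv * steps + supply_mv / 2UL) / supply_mv;

    if (code > steps - 1UL)
        code = steps - 1UL;
    return (unsigned)code;
}
```

With three bits and an assumed 5 V supply, task 1 sits at 625 mV and task 7 at 4375 mV. Note the masking: a task ID of 9 would show up on the scope as task 1, so IDs need to fit in the number of bits you have wired.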

Three bits lets us monitor eight tasks. More bits, more tasks. But the resistor network isn't very accurate, so after about four bits use a real D/A converter instead.

The same technique can monitor the system's mode, if you maintain mode or status info in a few bits. Or the size of a stack or the depth of a queue. Compute the data structure's size, correct for wrap on circular queues, and output the three or so MSBs. In real time, with little Heisenberging, you'll see system behavior–dramatically. It's simple, quick, inexpensive, and way cool.
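
A sketch of that calculation for a circular queue follows. The head/tail index convention, the function names, and the assumption that the queue size is a power of two no smaller than 8 are mine, not taken from any particular RTOS.

```c
/* Number of entries in a circular queue of `size` slots, where
   `head` is the next slot to write and `tail` the next to read. */
static unsigned queue_depth(unsigned head, unsigned tail, unsigned size)
{
    return (head >= tail) ? (head - tail)
                          : (size - tail + head);   /* wrapped */
}

/* Scale a depth in 0..size-1 down to its three most significant
   bits (0..7), ready to write to the three ladder outputs.
   Assumes size is a power of two and at least 8. */
static unsigned depth_to_3bits(unsigned depth, unsigned size)
{
    unsigned shift = 0;

    while ((size >> shift) > 8u)   /* find log2(size) - 3 */
        shift++;
    return (depth >> shift) & 0x07u;
}
```

For a 64-entry queue each scope step then represents eight entries, so a half-full queue reads as level 4 of 8.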

Last month I neglected to mention another very valuable resource for anyone building any sort of approximation. Jack Crenshaw's Math Toolkit for Real-Time Programming covers many other sorts of approximations, and addresses other sorts of issues–like working with integers. Highly recommended.

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at .

Reader Response


Excellent article, Jack. These are the kinds of tips that show why we need more mentoring in the FW/SW engineering fields.

– Andy Kunz
Sr. Firmware Engineer
Transistor Devices
Hackettstown, NJ


About the R/2R ladder: precision is much improved when using two 10K resistors instead of one 20K. So you only need one type of resistor and there is generally very little dispersion between components coming from one fabrication batch.

– Jean-Christophe MATHAE
CIRTEM
LABEGE, France


Great article. I especially like the performance analyser! I've been using the bit toggle trick for years now, even on the smaller micros, and it works a treat. It really puts into perspective the dynamic response of the application. Many designs have a LED or two, and they work great as debug ports during development, not just for optical output (quite useful) but also for the more transient stuff (very useful). No extra pins required.

– Jonathan Sables
Design Eng
SKEG Product Development
Cape Town, South Africa


Fantastic article! Makes “low-level-debugging” accessible to a lot of us. Thanks for the tips.

– Mahe RSHI
Santa Clara, CA


Great article… thanks! It'd be neat to expand on your idea to use your $20 performance analyzer to monitor stack usage, com port usage, etc. Flip a couple of bits to monitor min, max or average use over some time period or when some feature or set of features is active, etc. Neat idea!

– Dave Sudolcan
Software Architect
Lancer
San Antonio, TX


Jack,

As usual an excellent article. A couple of other thoughts:

In one series of projects I worked on, the same processor was used on a lot of radically different projects. But the same programming pins were required to program the processor (from the HC08 family). I insisted the hardware designer maintain the same pinouts and pin spacing for the programming connectors on all the boards. I also refused to allow any other use of these pins on the board. Then I made up a little board I could plug in to the programming connector. After the first project, I had instant access to processor reset and 4 LEDs. When I wanted to test fast things, I hooked up the LED outputs to an oscilloscope. One added benefit: all the debug software was portable across all the boards too.

Another idea I have used is the converse of the 3 digital I/Os to an A/D. We had a board with some analog outputs available, but no digital outputs. No problem, each digital bit can go into the A/D and you can use either a scope or a strip chart recorder to determine what is going on. This does take some work to learn to read though.

Thanks,

– Bob Bailey
Arlington Heights, IL


Nice trick with the resistor ladder / task levels, I think I'll use that one on my next design.

Do you have any advice for fending off circling EEs who keep stealing my scope probes? (why would YOU need them etc…)

– Lawrence Collier
Senior Embedded Engineer
Fernau Avionics
United Kingdom


I'm a strong advocate for putting at least a couple of “embedded” LEDs on any of my designs, especially a dual red/green one. Whenever the RTOS is idle, assert the green; anything else, assert red. No extra equipment is required to determine the processor load at a glance by the hue of the LED.

I miss the days of computers with a panel full of blinkin' lights.

– Neil Fortney
Design Engineer
ProSoft Technology
Madison, WI


For small microcontroller projects I always reserve a single output pin for an LED output. This pin also doubles as a task status output to an oscilloscope. The task scheduler will pulse this line the number of times corresponding to the Task ID, then set the line HIGH. When the task terminates, the line is set LOW. An oscilloscope gives one a bird's-eye view of which task is running, and for how long. The LED brightness variation is often indicative of the “health” of things. I will do a similar thing with the ISRs on another pin, if possible. Some might be shocked to know that I've built numerous microcontroller systems with NO other hardware debug capability except the output pin and the 'scope. If you can't afford a hardware emulator, these techniques will tell you what's going on inside. Essentially in real-time.

– Douglas Schmidt
Sr. Development Engineer
Thermo Electron Inc.
Minneapolis, MN

