
Back to the Basics – Practical Embedded Coding Tips: Part 4

My dad was a mechanical engineer who spent his career designing spacecraft. I remember, back even in the early days of the space program, how he and his colleagues analyzed seemingly every aspect of their creations' behavior. Center of gravity calculations ensured that the vehicles were always balanced. Thermal studies guaranteed nothing got too hot or too cold.

Detailed structural mode analysis even identified how the system would vibrate, to avoid destructive resonances induced by the brutal launch phase. Though they were creating products that worked in a harsh and often unknown environment, their detailed computations profiled how the systems would behave.

Think about civil engineers. Today no one builds a bridge without "doing the math." That delicate web of cables supporting a thin dancing roadway is simply going to work. Period. The calculations proved it long before contractors started pouring concrete.

Airplane designers also use quantitative methods to predict performance. When was the last time you heard of a new plane design that wouldn't fly? Yet wing shapes are complex and notoriously resistant to analytical methods. In the absence of adequate theory, the engineers rely on extensive tables acquired over decades of wind tunnel experiments. The engineers can still understand how their product will work, in general, before bending metal.

Compare this to our field. Despite decades of research, formal methods to prove software correctness are still impractical for real systems. We embedded engineers build, then test, with no real proof that our products will work. When we pick a CPU, clock speed, and memory size, we're betting that our off-the-cuff guesses will be adequate when, a year later, we're starting to test 100,000+ lines of code.

Experience plays an important role in getting the resource requirements right. All too often luck is even more critical. However, hope is our chief tool, along with the knowledge that generally, with enough heroics, we can overcome most challenges.

In my position as embedded gadfly, looking into thousands of projects, I figure some 10-15% are total failures due simply to the use of inadequate resources. The 8051 just can't handle that fire hose of data. The PowerPC part was a good choice, but the program grew to twice the size of available Flash, and with the new cost model the product is not viable.

Recently I've been seeing quite a bit written about ways to make our embedded systems more predictable, to ensure they react fast enough to external stimuli, to guarantee processes complete on time. To my knowledge there is no realistically useful way to calculate predictability. In most cases we build the system and start changing stuff if it runs too slowly.

Compared to aerospace and civil engineers we're working in the dark. It's especially hard to predict behavior when asynchronous activities alter program flow. Multitasking and interrupts both lead to impossible-to-analyze problems.

Recent threads on USENET, as well as some discussions at the Embedded Systems Conference, suggest banning interrupts altogether! I guess this does lead to a system that's easier to analyze, but the solution strikes me as far too radical. I've built polled systems. Yech.

Worse are applications that must deal with several different things, more or less concurrently, without using multitasking. The software in both situations is invariably a convoluted mess.

About 20 years ago I naively built a steel thickness gauge without an RTOS, only to later have to shoehorn one in. There were too many async things going on, and the in-line code grew to outlandish complexity. I'm still trying to figure out how to explain that particular sin to St. Peter.

A particularly vexing problem is to ensure the system will respond to external inputs in a timely manner. How can we guarantee that an interrupt will be recognized and processed fast enough to keep the system reliable?

Let's look in some detail at the first of these requirements: that an interrupt be recognized in time. Simple enough, it seems. Page through the processor's data book and you'll find a specification called "latency," a number always listed at submicrosecond levels (Figure 9.3 below). No doubt a footnote defines latency as the longest time between when the interrupt occurs and when the CPU suspends the current processing context. That would seem to be the interrupt response time. But it ain't.

Figure 9.3: The Latency Is the Time from When the Interrupt Signal Appears Until the ISR Starts

Latency as defined by CPU vendors varies from zero (the processor is ready to handle an interrupt RIGHT NOW) to the maximum time specified. It's a product of what sort of instruction is going on. Obviously it's a bad idea to change contexts in the middle of executing an instruction, so the processor generally waits until the current instruction is complete before sampling the interrupt input.

Now, if it's doing a simple register-to-register move, that may be only a single clock cycle, a mere 50 nsec on a zero-wait-state 20-MHz processor. Not much of a delay at all.

Other instructions are much slower. Multiplies can take dozens of clocks. Read-modify-write instructions (like "increment memory") are also inherently pokey. Maximum latency numbers come from these slowest of instructions.

Many CPUs include looping constructs that can take hundreds, even thousands, of microseconds. A block memory-to-memory transfer, for instance, initiated by a single instruction, might run for an awfully long time, driving latency figures out of sight.

All processors I'm aware of will accept an interrupt in the middle of these long loops to keep interrupt response reasonable. The block move will be suspended, but enough context is saved to allow the transfer to resume when the ISR (Interrupt Service Routine) completes.

Therefore, the latency figure in the datasheet tells us the longest time the processor can't service interrupts. The number is totally useless to firmware engineers. OK, if you're building an extreme cycle-countin', nanosecond-poor, gray-hair-inducing system, then perhaps that 300 nsec latency figure is indeed a critical part of your system's performance.

For the rest of us, real latency, the 99% component of interrupt response, comes not from what the CPU is doing, but from our own software design. And that, my friend, is hard to predict at design time. Without formal methods we need empirical ways to manage latency.

If latency is the time between getting an interrupt and entering the ISR, then surely most of it occurs because we've disabled interrupts! It's because of the way we wrote the darn code. Turn interrupts off for even a few C statements and latency might run to hundreds of microseconds, far more than those handful of nanoseconds quoted by CPU vendors.

No matter how carefully you build the application, you'll be turning interrupts off frequently. Even code that never issues a "disable interrupt" instruction does, indeed, disable them often. Every time a hardware event issues an interrupt request, the processor itself does an automatic disable, one that stays in effect until you explicitly re-enable interrupts inside the ISR. Count on skyrocketing latency as a result.

Of course, on many processors we don't so much turn interrupts off as change priority levels. A 68K receiving an interrupt on level 5 will prohibit all interrupts at that and lower levels until our code explicitly re-enables them in the ISR. Higher-priority devices will still function, but latency for all level 1 to 5 devices is infinite until the code does its thing.

Therefore, in an ISR, re-enable interrupts as soon as possible. When reading code, one of my "rules of thumb" is that code that does the enable just before the return is probably flawed.

Most of us were taught to defer the interrupt enable until the end of the ISR. But that prolongs latency unacceptably. Every other interrupt (at least at or below that priority level) will be shut down until the ISR completes. Better: enter the routine, do all of the nonreentrant things (like handling hardware), and then enable interrupts. Run the rest of the ISR, which manages reentrant variables and the like, with interrupts on. You'll reduce latency and increase system performance.
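
Here's a minimal sketch of that structure in C. The register name, acknowledge routine, and queue helper are hypothetical stand-ins for your hardware and buffering code, and enable_interrupts() stands for whatever intrinsic or inline assembly your compiler provides:

#include <stdint.h>

extern volatile uint8_t UART_DATA;    /* hypothetical receive register  */
extern void uart_ack_irq(void);       /* hypothetical: drop the request */
extern void enable_interrupts(void);  /* compiler/CPU specific          */
extern void queue_put(uint8_t c);     /* reentrant buffer insert        */

void uart_isr(void)
{
    /* Nonreentrant work first, while interrupts are still off:
       grab the data and silence the hardware request. */
    uint8_t c = UART_DATA;
    uart_ack_irq();

    /* Now let other (and further) interrupts through. */
    enable_interrupts();

    /* The rest of the handler uses only reentrant code. */
    queue_put(c);
}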

The downside might be a need for more stack space if that same interrupt can re-invoke itself. There's nothing wrong with this in a properly designed and reentrant ISR, but the stack will grow until all pending interrupts get serviced.

The second biggest cause of latency is excessive use of the disable interrupts instruction. Shared resources (global variables, hardware, and the like) will cause erratic crashes when two asynchronous activities try to access them simultaneously.

It's up to us to keep the code reentrant, either by keeping all such accesses atomic or by limiting access to a single task at a time. The classic approach is to disable interrupts around such accesses. Though a simple solution, it comes at the cost of increased latency.
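
A minimal critical-section sketch, with the disable/enable calls as placeholders for your compiler's intrinsics (on many ARM toolchains, for instance, __disable_irq() and __enable_irq()):

#include <stdint.h>

extern void disable_interrupts(void);  /* placeholder intrinsics */
extern void enable_interrupts(void);

volatile uint32_t shared_count;        /* touched by ISR and main loop */

void bump_shared_count(void)
{
    disable_interrupts();  /* latency for every interrupt starts here */
    shared_count++;        /* read-modify-write is not atomic         */
    enable_interrupts();   /* ...and ends here, so keep this short    */
}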

Collecting the data you will need
So what is the latency of your system? Do you know? Why not? It's appalling that so many of us build systems with an "if the stupid thing works at all, ship it" philosophy. It seems to me there are certain critical parameters we must understand in order to properly develop and maintain a product. Like: is there any free ROM space? Is the system 20% loaded... or 99%? How bad is the maximum latency?

Latency is pretty easy to measure, and sometimes those measurements will yield surprising and scary results. Perhaps the easiest way to get a feel for interrupt response is to instrument each ISR with an instruction that toggles a parallel output bit high when the routine starts. Drive it low just as it exits. Connect this bit to one input of an oscilloscope, tying the other input to the interrupt signal itself.
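
The instrumentation amounts to two lines per ISR. In this sketch, PORT_OUT and the bit mask are hypothetical; any free output bit will do:

#include <stdint.h>

extern volatile uint8_t PORT_OUT;    /* hypothetical parallel output port */
#define SCOPE_BIT 0x01               /* the spare bit we toggle           */

void some_isr(void)
{
    PORT_OUT |= SCOPE_BIT;           /* first instruction: bit goes high */

    /* ... the ISR's real work ... */

    PORT_OUT &= (uint8_t)~SCOPE_BIT; /* last instruction: bit goes low   */
}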

The amount of information this simple setup gives is breathtaking. Measure the time from the assertion of the interrupt until the parallel bit goes high. That's latency, minus a bit for the overhead of managing the instrumentation bit. Twiddle the scope's time base to measure this to any level of precision required.

The time the bit stays high is the ISR's total execution time. Tired of guessing how fast your code runs? This is quantitative, cheap, and accurate. In a real system, interrupts come often. Latency varies depending on what other things are going on.

Use a digital scope in storage mode. After the assertion of the interrupt input you'll see a clear space; that's the minimum system latency to this input. Then there will be hash, a blur as the instrumentation bit goes high at different times relative to the interrupt input. These represent variations in latency. When the blur resolves itself into a solid high, that's the maximum latency.

All this, for the mere cost of one unused parallel bit.

If you've got a spare timer channel, there's another approach that requires neither extra bits nor a scope. Build an ISR just for measurement purposes that services interrupts from the timer.

On initialization, start the timer counting up, programmed to interrupt when the count overflows. Have it count as fast as possible. Keep the ISR dead simple, with minimal overhead. This is a good thing to write in assembly language to minimize unneeded code; too many C compilers push everything inside interrupt handlers.
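
Setup might look something like this sketch, where the control register and its bits are placeholders for your particular timer peripheral:

#include <stdint.h>

#define TIMER_RUN             0x01     /* assumed control bits */
#define TIMER_IRQ_ON_OVERFLOW 0x02

extern volatile uint16_t TIMER_CTRL;   /* hypothetical registers */
extern volatile uint16_t TIMER_COUNT;

void latency_probe_init(void)
{
    TIMER_COUNT = 0;                                 /* count up from zero          */
    TIMER_CTRL  = TIMER_RUN | TIMER_IRQ_ON_OVERFLOW; /* fastest clock, IRQ on wrap  */
}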

The ISR itself reads the timer's count register and sums the number into a long variable, perhaps called total_time. Also increment a counter (iterations). Clean up and return.
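
In C (though, as noted, assembly keeps the overhead down), the ISR body is just a few lines; TIMER_COUNT is the free-running count register from the setup sketch above:

volatile uint32_t total_time;      /* sum of counts seen at ISR entry */
volatile uint32_t iterations;      /* number of times the ISR has run */

void timer_isr(void)
{
    uint16_t count = TIMER_COUNT;  /* ticks elapsed since the overflow */

    total_time += count;           /* accumulate for the average       */
    iterations++;

    /* acknowledge the timer interrupt and return */
}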

The trick here is that, although the timer reads zero when it tosses out the overflow interrupt, the timer register continues counting even as the CPU is busy getting ready to invoke the ISR. If the system is busy processing another interrupt, or perhaps stuck in an interrupt-disabled state, the counter continues to increment.

An infinitely fast CPU with no latency would start the instrumentation ISR with the counter register equal to zero. Real processors with more usual latency issues will find the counter at some positive nonzero value that indicates how long the system was off doing other things.

Therefore, average latency is just the time accumulated into total_time (normalized to microseconds) divided by the number of times the ISR ran (iterations). It's easy to extend the idea to give even more information. Possibly the most important thing we can know about our interrupts is the longest latency. Add a few lines of code to compare for and log the maximum time.
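
Two extra lines in the ISR capture the worst case, and the reduction to microseconds needs only the tick period; TICK_US below is an assumed constant derived from whatever rate you programmed:

volatile uint16_t max_count;     /* worst latency seen, in ticks */

/* inside timer_isr(), after reading count:
       if (count > max_count)
           max_count = count;
*/

#define TICK_US 0.05             /* assumed: 50 nsec per tick */

double average_latency_us(void)
{
    return ((double)total_time / (double)iterations) * TICK_US;
}

double maximum_latency_us(void)
{
    return (double)max_count * TICK_US;
}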

Is the method perfect? Of course not. The data is somewhat statistical, so it can miss single-point outlying events. Very speedy processors may run so much faster than the timer tick rate that they always log latencies of zero, although this may indicate that for all practical purposes latencies are short enough not to be significant.

The point is that knowledge is power; once we understand the magnitude of latency, the reasons for missed interrupts become glaringly apparent.

Try running these experiments on purchased software components. One embedded DOS, running on a 100-MHz 486, yielded latencies in the tens of milliseconds!

Next in Part 5: Using your C compiler to minimize code size.
To read Part 1 in this series, go to Reentrancy, atomic variables and recursion.
To read Part 2 in this series, go to Asynchronous Hardware/Firmware.
To read Part 3, go to Metastable States.

Jakob Engblom (jakob@virtutech.com) is technical marketing manager at Virtutech. He has an MSc in computer science and a PhD in computer systems from Uppsala University, and has worked with programming tools and simulation tools for embedded and real-time systems since 1997.


He contributed material to "The Firmware Handbook," edited by Jack Ganssle, upon which this series of articles is based. Printed with permission from Newnes, a division of Elsevier. Copyright 2008. For other publications by Jakob Engblom, see www.engbloms.se/jakob.html.
