Failure is an Option - Embedded.com

“I know it's male, Dad,” my daughter exclaimed, pointing at the viciously snapping blue crab we had just landed in a bucket. I wanted to teach her how to tell the sex of this so-delicious product of the Chesapeake Bay by examining the shape of the creature's shell. “How do you know that, Kristy?” I asked, my scientific description momentarily preempted. The crab reared back, snapping, angry, and aggressive. “By his attitude,” she replied.

Later, pondering this conversation and her surprisingly deep insight into the male psyche, I thought about a chunk of code I'd recently read. (I know, I know, the peril of being a dweeb is that we see everything in terms of our work.) It was a pretty typical bit of firmware, peppered with both the usual flashes of brilliance and egregious flaws.

This firmware used quite a bit of dynamic memory management. Malloc()s and free()s abounded. Some developers consider the use of anything other than statics a sin, but it is possible to write really great embedded code with either sort of memory management. But this particular code, like every bit of similar firmware exploiting dynamic memory allocation, was clearly written by a guy. No — there was no clue to the person's sex in the comments, the author's first name nothing but an utterly unrevealing initial. His testosterone came across in the same way exhibited by the crab. Whenever memory was needed, he issued a malloc, and by God, that was a demand from this programmer to get some memory! Gimme some memory, dammit, NOW!

A gentler and more robust approach might be to recognize that malloc has a return value, and that the call might fail. When it does, malloc returns a null pointer. Check for that and take action if the system is low on resources. Demand memory? No, request it, with a sort of virtual “please”, and recognize that it's not unlikely the system will reply, “well, sorry, there's none available now”.
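Here is a minimal sketch of what such a polite request might look like in C. The names buf_status and buf_request are hypothetical, invented for this illustration; the only library behavior relied on is that malloc returns NULL when it cannot satisfy a request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes; the names are illustrative only. */
typedef enum { BUF_OK, BUF_BAD_SIZE, BUF_NO_MEMORY } buf_status;

/*
 * Ask for a buffer of `len` bytes.
 * On success, *out points to the new buffer and BUF_OK is returned.
 * On any failure, *out is set to NULL and a status code says why,
 * so the caller can decide how to degrade instead of crashing.
 */
buf_status buf_request(size_t len, unsigned char **out)
{
    assert(out != NULL);   /* a NULL `out` is a programming error */

    *out = NULL;

    if (len == 0) {
        /* malloc(0) is implementation-defined, so refuse it outright. */
        return BUF_BAD_SIZE;
    }

    unsigned char *p = malloc(len);
    if (p == NULL) {
        /* The system said "sorry, none available": report it upward. */
        return BUF_NO_MEMORY;
    }

    *out = p;
    return BUF_OK;
}
```

A caller would test the returned status before touching the buffer and, on BUF_NO_MEMORY, do whatever makes sense for the product: shed a low-priority task, retry later, or enter a safe state. What "take action" means is a system-level decision this sketch deliberately leaves open.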

I read a lot of code. A lot. And I've learned that we developers are an optimistic and an aggressive lot, blithely expecting that the computer will always respond in the way we hope. The program will always run just as we decided it would. Nothing will fail (other than hardware, of course). Code, being deterministic, will always work properly once we pronounce it “done.”

But it just ain't so! I collect embedded disaster stories, and the most common theme that runs through them all is poor or no exception handling. Everything runs fine until something rather simple goes wrong and the exception handler gets called. If it exists at all, it's poorly thought out, buggy, or totally inadequate, so the entire system crashes. Exception handlers are our last-ditch chance to save the system. Treated as afterthoughts they're sure to exacerbate the problems.

We've learned that datacomm is unreliable, so we use TCP/IP, which is robust. It'll transfer correct data despite missing and corrupt packets. Yet a friend's autopilot passes data between boxes using a proprietary comm protocol that tolerates no errors. Press a nearby radio's transmit button and the data link gets scrambled, crashing the code. We know that it's a disaster to pass a routine data that's out of range, yet rarely do I see debugging snippets (like assert macros) embedded to catch these common problems; when such constructs do appear they're usually slammed in as acts of panicked desperation. On processors with divide-by-zero traps I nearly never see a handler for this condition, despite the curse such divisions have had on the computer industry for 50 years.
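As a rough illustration of the last two points, the sketch below pairs an assert-based range check with a division routine that refuses to divide by zero. Everything named here (THROTTLE_MIN, THROTTLE_MAX, throttle_to_duty, checked_div) is hypothetical and made up for the example; it is one possible shape for such checks, not a prescription.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits for a throttle command, in percent. */
#define THROTTLE_MIN 0
#define THROTTLE_MAX 100

/*
 * Convert a throttle percentage to an 8-bit PWM duty value.
 * The assert catches out-of-range arguments during development.
 * Note that assert compiles away when NDEBUG is defined, so it is a
 * debugging aid and not a substitute for run-time error handling.
 */
uint8_t throttle_to_duty(int32_t percent)
{
    assert(percent >= THROTTLE_MIN && percent <= THROTTLE_MAX);
    return (uint8_t)((percent * 255) / THROTTLE_MAX);
}

/*
 * Divide num by den, storing the quotient in *quot.
 * Returns false, leaving *quot untouched, when the division is not
 * safe: a zero divisor, or INT32_MIN / -1, which overflows.
 */
bool checked_div(int32_t num, int32_t den, int32_t *quot)
{
    assert(quot != NULL);

    if (den == 0) {
        return false;
    }
    if (num == INT32_MIN && den == -1) {
        return false;
    }

    *quot = num / den;
    return true;
}
```

Checking the divisor in software is no replacement for a real trap handler on processors that provide one, but it does mean the common cases never reach the trap in the first place.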

Unexpected stuff happens. Users do astonishingly foolish things. Complex interactions are the norm, not the exception; none of us are smart enough to predict all possible paths.

Why do we persist in writing such fragile systems? When do we accept the fact that failure is normal, weird things happen regularly, and it's our responsibility to catch, process, and safely dispose of such conditions?

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. He founded two companies specializing in embedded systems.

Reader Feedback

Jack,

Once again, you have hit the nail on the head: adding exception handlers adds robustness. There are hidden benefits to all parts of the organization, too, including:

Engineering: Exception handlers aid in debugging, HW/SW integration and design review.

Marketing: Exception handlers help to ensure MTBF rates and product release dates are met.

Documentation: Technical writers can cut and paste your code to document libraries and routines.

Technical Support: Robust code combined with good documentation reduces customer calls.

Sales: Robust operation and good documentation give your products a competitive advantage.

Beyond the organization, customers win as well. They will reward the timely, robust products with their dollars, referrals and awards. They will punish the failure-prone products with their ire.

Alex Kine
Sales Engineer
General Software
