Engineering is all about building predictably reliable systems. But most firmware engineers ignore the role of determinism in real-time systems. Few can answer questions like “how can you guarantee that the system won’t fail when stressed?”
Today’s hardware is often cursed with all sorts of nifty speed-enhancers like cache, pipelines, and speculative execution. All of these contribute to execution time uncertainty. The system’s performance can vary wildly depending on a lot of hard-to-predict events.
An interrupt may occur at any time, and servicing it will typically flush the pipeline and evict at least part of the cache. Resuming the interrupted code then means rereading instructions from L2 or main memory, which can take a surprisingly long time. A system that is running fine but close to the edge may suddenly start missing its hard real-time deadlines.
Can you really guarantee the highest priority task will complete on time? What if there’s a perfect storm of interrupts? Or of bus activity (DMA or having to yield the bus to another master)? In big systems, a task may depend in very complex ways on externalities (other computers, systems, I/O) that aren’t ready in time.
Preemptive multitasking is itself inherently non-deterministic, though techniques like rate-monotonic analysis (RMA) can mitigate the problem. But RMA requires more analysis than most developers will ever do.
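To give a feel for what that analysis involves, here is a minimal sketch in C of response-time analysis, one of the schedulability tests commonly used with rate-monotonic scheduling. It rests on assumptions that rarely hold exactly in a real product: tasks are independent and periodic, each deadline equals its period, the array is sorted shortest period first (highest priority first), and context-switch overhead and blocking are zero. The `rm_task` type and function names are hypothetical, invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task descriptor. Both fields use the same time unit. */
typedef struct {
    uint32_t wcet;    /* worst-case execution time (assumed known) */
    uint32_t period;  /* period, also treated as the deadline */
} rm_task;

/*
 * Worst-case response time of tasks[i], where tasks[0..i-1] all have
 * higher priority. Iterates R = C_i + sum(ceil(R / T_j) * C_j) until
 * it stops changing. Returns 0 if the deadline would be missed.
 */
uint32_t rm_response_time(const rm_task *tasks, size_t i)
{
    assert(tasks != NULL && tasks[i].wcet > 0 && tasks[i].period > 0);

    uint64_t r = tasks[i].wcet;
    for (;;) {
        uint64_t next = tasks[i].wcet;
        for (size_t j = 0; j < i; j++) {
            assert(tasks[j].period > 0);
            /* Number of times task j can preempt within a window of r. */
            uint64_t releases = (r + tasks[j].period - 1) / tasks[j].period;
            next += releases * tasks[j].wcet;
        }
        if (next > tasks[i].period) {
            return 0;            /* deadline missed */
        }
        if (next == r) {
            return (uint32_t)r;  /* converged */
        }
        r = next;
    }
}

/* Returns 1 if every task meets its deadline under the assumptions above. */
int rm_all_schedulable(const rm_task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (rm_response_time(tasks, i) == 0) {
            return 0;
        }
    }
    return 1;
}
```

For a task set with (execution time, period) pairs of (1, 4), (2, 6) and (3, 12), the lowest-priority task converges to a worst-case response time of 10, inside its deadline of 12. Stretch that last task to 6 units of execution and the test fails. The catch is the input: the result is only as good as the worst-case execution times fed into it, and those are exactly the numbers most teams never measure.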
Even extremely simple systems that have none of these speed-enhancing features can suffer from serious timing problems. A little bit of C code that looks quite deterministic probably makes calls to the black hole that is the runtime library, which is generally uncharacterized (in the time domain) by the vendor. Does that call take a microsecond or a week? No one knows.
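One way to start shrinking that black hole is to time the calls yourself. The sketch below assumes a POSIX host with `clock_gettime()`; on a bare-metal target you would substitute a hardware timer or cycle counter. The names `timing_stats`, `time_call` and `format_float` are hypothetical, and `snprintf()` merely stands in for whichever library routine worries you.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Shortest and longest observed execution times over a number of runs. */
typedef struct {
    uint64_t min_ns;
    uint64_t max_ns;
    uint32_t runs;
} timing_stats;

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Calls fn(arg) repeatedly and records the fastest and slowest run. */
timing_stats time_call(void (*fn)(void *), void *arg, uint32_t runs)
{
    assert(fn != NULL && runs > 0);

    timing_stats s = { UINT64_MAX, 0, runs };
    for (uint32_t i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        fn(arg);
        uint64_t dt = now_ns() - t0;
        if (dt < s.min_ns) s.min_ns = dt;
        if (dt > s.max_ns) s.max_ns = dt;
    }
    return s;
}

/* Example wrapper: the library call under suspicion. */
void format_float(void *arg)
{
    char buf[32];
    double v = *(const double *)arg;
    snprintf(buf, sizeof buf, "%f", v);
}
```

Calling `time_call(format_float, &value, 10000)` yields the spread between the best and worst runs. Be careful what you conclude from it: this is the worst case you happened to observe, not the worst case that exists. A different argument, a cold cache or an ill-timed interrupt can all push the real number higher, so treat the measurement as a floor for your worst-case estimate rather than a guarantee.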
It’s my belief that too many systems “work” due only to divine intervention. Developers chase down the usual procedural bugs and then breathe a sigh of relief that, once again, a miracle has occurred.
But all too often that gift from heaven is merely a reprieve, an indulgence, with damnation still possible or even likely when the system experiences unexpected stresses. Or when luck runs out and interrupts bunch up.
Unlike most other engineered systems, our real-time devices don’t have fuses that blow when something goes wrong. Instead of shutting down in a controlled way or falling back to a less-capable mode, the firmware collapses completely and unpredictably.
What do you do to convince yourself (at least) that the system will be reliable in the time domain?
Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges.