Building reliable and secure embedded systems
In this era of 140 characters or less, it has been well and concisely stated that, "reliability concerns accidental errors causing failures, whereas security concerns intentional errors causing failures." In this column, I expand on this statement, especially as regards the design of embedded systems and their place in our network-connected and safety-conscious modern world.
Given time, tightening cycles of debug and test can get us past the bugs and through to a shippable product. But is a debugged system good enough? Neither reliability nor security can be tested into a product. Each must be designed in from the start. So let's take a closer look at these two important design aspects for modern embedded systems and then I'll bring them back together at the end.
Reliable embedded systems
A product can be stable yet lack reliability. Consider, for example, an anti-lock braking computer installed in a car. The software in the anti-lock brakes may be bug-free, but how does it function if a critical input sensor fails?
Reliable systems are robust in the face of adverse run-time environments. They work around errors as those errors occur in the field, so that the number and impact of failures are minimized. One key strategy for building reliable systems is to eliminate single points of failure. For example, redundancy could be added around that critical input sensor, perhaps by adding a second sensor in parallel with the first.
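As a rough illustration of sensor redundancy, here is a minimal sketch in C of how software might combine two wheel-speed sensors wired in parallel. Everything named here (sensor_reading_t, read_wheel_speed(), the 50 RPM agreement tolerance) is hypothetical and chosen only for the example; a real design would derive the tolerance and the validity checks from the sensor's specifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reading from one wheel-speed sensor. */
typedef struct {
    int  speed_rpm;  /* measured wheel speed */
    bool valid;      /* false if the sensor failed its own range or self-test check */
} sensor_reading_t;

typedef enum {
    SPEED_OK,        /* both sensors healthy and in agreement */
    SPEED_DEGRADED,  /* only one sensor healthy; redundancy has been lost */
    SPEED_FAULT      /* no trustworthy value is available */
} speed_status_t;

/* Assumed tolerance: how far two healthy sensors may disagree. */
#define MAX_DISAGREEMENT_RPM 50

/*
 * Combine two redundant readings into one speed value.
 * On SPEED_FAULT, *speed_rpm is left unchanged.
 */
speed_status_t read_wheel_speed(sensor_reading_t a, sensor_reading_t b,
                                int *speed_rpm)
{
    assert(speed_rpm != NULL);

    if (a.valid && b.valid) {
        if (abs(a.speed_rpm - b.speed_rpm) <= MAX_DISAGREEMENT_RPM) {
            *speed_rpm = (a.speed_rpm + b.speed_rpm) / 2;
            return SPEED_OK;
        }
        /* With only two sensors we cannot tell which one is wrong. */
        return SPEED_FAULT;
    }
    if (a.valid) {
        *speed_rpm = a.speed_rpm;
        return SPEED_DEGRADED;
    }
    if (b.valid) {
        *speed_rpm = b.speed_rpm;
        return SPEED_DEGRADED;
    }
    return SPEED_FAULT;
}
```

Note the limit of two-way redundancy in this sketch: the system survives the outright loss of either sensor, but when two apparently healthy sensors disagree it can only report a fault. Outvoting a bad sensor would take a third one.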
Another aspect of reliability that is under the complete control of designers (at least when they consider it from the start) is the set of "fail-safe" mechanisms. Perhaps a suitable but lower-cost alternative to a redundant sensor is detection of the failed sensor with a fallback to mechanical braking.
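To make the fail-safe idea concrete, the following C sketch detects an implausible sensor and latches a fallback to mechanical braking. It is one possible approach under stated assumptions, not a production design: the names (brake_monitor_t, brake_monitor_update()), the plausible speed range, and the threshold of three consecutive bad samples are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed plausibility limits and debounce threshold (illustrative values). */
#define SPEED_MIN_RPM    0
#define SPEED_MAX_RPM    3000
#define FAULT_THRESHOLD  3   /* consecutive bad samples before tripping */

typedef enum {
    BRAKE_MODE_ABS,         /* normal operation: anti-lock control active */
    BRAKE_MODE_MECHANICAL   /* fail-safe: anti-lock disabled, plain braking */
} brake_mode_t;

typedef struct {
    brake_mode_t mode;
    unsigned     bad_samples;  /* current run of implausible readings */
} brake_monitor_t;

void brake_monitor_init(brake_monitor_t *m)
{
    assert(m != NULL);
    m->mode = BRAKE_MODE_ABS;
    m->bad_samples = 0;
}

/* Feed one sensor sample to the monitor and return the resulting mode. */
brake_mode_t brake_monitor_update(brake_monitor_t *m, int speed_rpm)
{
    assert(m != NULL);

    /* The fail-safe is latched: once tripped, it stays tripped. */
    if (m->mode == BRAKE_MODE_MECHANICAL) {
        return m->mode;
    }

    if (speed_rpm < SPEED_MIN_RPM || speed_rpm > SPEED_MAX_RPM) {
        m->bad_samples++;
        if (m->bad_samples >= FAULT_THRESHOLD) {
            m->mode = BRAKE_MODE_MECHANICAL;
        }
    } else {
        /* A plausible sample ends the run of bad ones. */
        m->bad_samples = 0;
    }
    return m->mode;
}
```

Requiring several consecutive bad samples keeps a single noisy reading from disabling the anti-lock function, while latching the fallback avoids flapping between modes if the sensor fails intermittently. Both choices are assumptions of this sketch; the right trade-off depends on the system.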
Failure Mode and Effect Analysis (FMEA) is one of the most effective and important design processes used by engineers serious about designing reliability into their systems. Following this process, each possible failure point is traced from the root failure outward to its effects. In an FMEA, numerical weights can be applied to the likelihood of each failure as well as to the seriousness of its consequences. An FMEA can thus help guide you to a cost-effective but higher-reliability design by highlighting the most valuable places to insert the redundancy, fail-safes, or other elements that reinforce the system's overall reliability.
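The weighting step can be sketched in a few lines of C. This example assumes a simple 1-to-10 scale for both likelihood and severity and scores each failure mode as their product, then sorts the list so the highest-risk items come first. The type and function names (failure_mode_t, risk_score(), fmea_rank()) and the scales are assumptions made for illustration; formal FMEA procedures often add further factors, such as how easily a failure can be detected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One row of a hypothetical FMEA worksheet. */
typedef struct {
    const char *description;
    int likelihood;  /* assumed scale: 1 (rare) to 10 (frequent) */
    int severity;    /* assumed scale: 1 (negligible) to 10 (catastrophic) */
} failure_mode_t;

/* Simplified risk score: likelihood weighted by severity. */
int risk_score(const failure_mode_t *f)
{
    assert(f != NULL);
    return f->likelihood * f->severity;
}

/* qsort comparator: higher risk scores sort first. */
static int by_risk_descending(const void *lhs, const void *rhs)
{
    int a = risk_score((const failure_mode_t *)lhs);
    int b = risk_score((const failure_mode_t *)rhs);
    return (b > a) - (b < a);
}

/* Reorder the worksheet in place so the riskiest failure modes lead. */
void fmea_rank(failure_mode_t *modes, size_t count)
{
    assert(modes != NULL);
    if (count > 1) {
        qsort(modes, count, sizeof modes[0], by_risk_descending);
    }
}
```

Given a worksheet with entries such as a sensor open circuit (likelihood 3, severity 9) and connector corrosion (likelihood 2, severity 4), the sensor failure scores 27 against 8 and rises to the top of the list, which is where redundancy or a fail-safe is likely to pay off most.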
In certain industries, reliability is a key driver of product safety. That is why you see FMEA and other design-for-reliability processes being applied by the designers of safety-critical automotive, medical, avionics, nuclear, and industrial systems. The same techniques can, of course, be used to make any type of embedded system more reliable.
Regardless of your industry, it is typically difficult or impossible to make your product as reliable via patches as it would have been had reliability been designed in. A patch cannot add hardware like that redundant sensor, so your options may reduce to a fail-safe that is helpful but less reliable overall. Reliability cannot be patched or tested or debugged into your system. Rather, reliability must be designed in from the start.