The lawyers are coming!
Total recall
Lest you think that the evidence from the presentation is an exception to the norm, found only because I and other engineers were on the prowl for bad code, consider just a few examples stemming from the more obvious embedded software failures.
First, recall the Patriot Missile failure in Dhahran, Saudi Arabia, during the first Gulf War. Twenty-eight U.S. soldiers were killed when a Scud missile was not shot down because the Patriot Missile battery protecting a military base failed to track it properly. A report from the U.S. Government Accountability Office examined the events leading to the failure and concluded the problem was partly in the requirements: the government didn't tell the designer the system would need to "operate continuously for long periods of time." Huh!? "At the time of the incident, the battery had been operating continuously for over 100 hours."5, 6
Now consider a more recent example. GPS-maker Garmin announced a "free, mandatory GPS software update to correct a software issue that has been discovered to cause select GPS devices to repeatedly attempt to update GPS firmware and then either shut down or no longer acquire GPS satellite signals." This sounds to me like a bug in their bootstrap loader (a.k.a., bootloader). Many Garmin GPS units are named as affected, including members of the popular nüvi product family.7
Or consider what a consumer had to say about his Celestron SkyScout Personal Planetarium recently in a forum at Amazon.com: "I'm downloading the second firmware update release since I've had my SkyScout . . . about 3 weeks. Each release is making the device more stable."
Finally, consider these quotes from the recent recall of a device regulated by the U.S. Food and Drug Administration, an AED (automated external defibrillator):
• "Units serviced in 2007 and upgraded with software version 02.06.00 have a remote possibility of shut down during use in cold environmental conditions. There are no known injuries or deaths associated with this issue. The units will be updated with the current version of software."
• "All of the recalled units will be upgraded with software that corrects [another] unexpected shutdown problem. In the meantime . . . it is vital to follow the step 1-2-3 operating procedure which directs attachment of the pads after the device has been turned on. This procedure is described on the back of your device and also in the Quick Reference material inside the AED 10 case. Some pages in the user's manual may erroneously describe or show illustrations of [a different] operating procedure . . . Please disregard these erroneous instructions."
At least one death was reported at a time when the second type of unexpected software shutdown occurred. Are bugs in the embedded software to blame for that too? If not, how did the User's Manual come to be out of sync with the firmware in a process-driven FDA-regulated environment?
Given the above, is it not appropriate to wonder if the unexplained loss of Air France 447 over the Atlantic Ocean earlier this year was firmware-related? An abrupt 650-ft. dive experienced by an Airbus A330 flight in October 2008 may offer clues to the loss of Air France 447. Authorities have blamed a pair of simultaneous computer failures for that event in the fly-by-wire A330. First, one of three redundant air data inertial reference units began giving bad data. Then, a voting algorithm intended to handle precisely such a failure in one unit, by relying only on the other two, failed to work as designed; the flight computer instead made decisions only on the basis of the one failed unit! "More than 100 of the 300 people on board were hurt, with broken bones, neck and spinal injuries, and severe lacerations splattering blood throughout the cabin."8 A lawsuit is pending.
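To make the idea of a voting algorithm concrete, here is a minimal sketch in C of one common way to vote among three redundant sensor readings. It is not the A330's actual algorithm, which is not described here; the names vote_three and vote_result_t and the tolerance parameter are hypothetical and chosen only for illustration. The sketch assumes that two readings "agree" when they differ by no more than a caller-supplied tolerance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_UNITS 3

/* Hypothetical result type: the voted value plus which unit, if any, was outvoted. */
typedef struct {
    bool   valid;     /* false when no two units agree */
    double value;     /* voted value; meaningful only when valid is true */
    int    excluded;  /* index of the outvoted unit, or -1 if none */
} vote_result_t;

/* Two readings agree if they differ by at most the tolerance.
 * A NaN reading never agrees with anything, so it is always outvoted. */
static bool agree(double a, double b, double tolerance)
{
    double diff = a - b;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff <= tolerance;
}

/* Middle value of three readings. */
static double median3(double a, double b, double c)
{
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}

/* Vote among three redundant readings (illustrative sketch only). */
vote_result_t vote_three(const double readings[NUM_UNITS], double tolerance)
{
    assert(readings != NULL);

    vote_result_t result = { false, 0.0, -1 };
    const double a = readings[0], b = readings[1], c = readings[2];

    const bool ab = agree(a, b, tolerance);
    const bool ac = agree(a, c, tolerance);
    const bool bc = agree(b, c, tolerance);
    const int agreeing_pairs = (int)ab + (int)ac + (int)bc;

    if (agreeing_pairs >= 2) {
        /* Every unit agrees with at least one other: trust the median. */
        result.valid = true;
        result.value = median3(a, b, c);
    } else if (agreeing_pairs == 1) {
        /* Exactly one pair agrees: average it and exclude the odd unit out. */
        result.valid = true;
        if (ab)      { result.value = (a + b) / 2.0; result.excluded = 2; }
        else if (ac) { result.value = (a + c) / 2.0; result.excluded = 1; }
        else         { result.value = (b + c) / 2.0; result.excluded = 0; }
    }
    /* agreeing_pairs == 0: no majority, so result.valid stays false and
     * the caller must fall back to a safe mode rather than pick a unit. */
    return result;
}
```

The design point is the last branch: when no two units agree, the voter reports failure instead of silently choosing one unit. Acting on a single unit's data is exactly the outcome such a voter exists to prevent.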

