"Toyota's Killer Firmware" and the "Single Bit Flip That Killed"? Not!
As some readers may remember, in 2013 an Oklahoma jury found that Toyota's embedded software was to blame for unintended acceleration that resulted in a fatal accident. The embedded software expert who convinced the jury that software was to blame made his trial testimony and slides available to the public and invited us to "judge for ourselves." A flurry of articles in technical publications followed, most of which (to the best of my recollection) described the expert's testimony in favorable terms, accepting the conclusion that software caused the accident. Many of these articles had attention-grabbing titles such as Toyota's killer firmware and Toyota Case: Single Bit Flip That Killed.
Having developed embedded software for many years, and having written an opinion piece in the Los Angeles Times years before the trial about the possibility that Toyota's software was to blame for reported incidents of unintended acceleration, I read the expert's testimony with great interest and excitement. I was expecting to find a convincing argument based on the evidence that the software was indeed to blame for the accident in question.
As I delved deeper and deeper into the testimony, however, my excitement turned to disappointment. It became clear that there was no credible theory based on the evidence. I felt it was important to set the record straight, and so I submitted an article with my technical analysis to the IEEE Technology and Society Magazine. That article was peer reviewed and then accepted for publication in their most recent issue (December, 2016). (You can also find a pre-publication version of the article on my company's website, as well as two video interviews here and here providing additional context.)
In this column, I summarize some of my findings (please refer to the IEEE article for a more complete discussion that includes all the technical details).
As discussed in the IEEE article, the plaintiffs convinced the jury that Toyota's embedded software was responsible for the accident by employing the following approach:
- First, they bombarded the non-technical jury with criticisms of the quality of Toyota's software from two different software experts. The first expert did not see any of Toyota's source code, but nonetheless his entire testimony, which is also publicly available (see Part 1 and Part 2), was directed toward criticizing the quality of Toyota's software. As anyone with extensive experience developing real-world software knows, software quality assessments can be highly subjective.
- The second of the two experts, who did examine Toyota's source code, also criticized the quality of Toyota's software. Then he told the jury that "to a reasonable degree of engineering certainty, it was more likely than not" that the death of a task running on the engine control processor (referred to at trial as "Task X") was responsible for the accident, despite the fact that the evidence presented at trial did not support that conclusion.
According to the testimony of the second expert, Task X is a periodic task that executes multiple times per second. One of its many responsibilities is to determine the correct throttle angle setting (how far open the throttle should be) based on how hard the driver is pressing on the accelerator pedal (as well as other factors). Therefore, multiple times per second, Task X wakes up and, among other things, determines the current accelerator pedal position and sets a throttle angle variable accordingly. The throttle angle variable is then used by another part of the software to set the throttle to the angle specified in that variable.
The expert identified a specific bit within an operating system data structure on the engine control processor (the main CPU) that determined whether Task X was alive (schedulable) or dead. According to the expert, this data structure (as well as the throttle angle variable) was not protected by software techniques such as mirroring, or by hardware techniques such as error detection and correction. Therefore, if this data structure became corrupted, the corruption would not be detected or corrected. The expert theorized that the bit for Task X in this data structure was erroneously flipped from one to zero due to a software bug or single event upset (SEU), and that this caused the accident, as described below.
Another key element of the expert's accident theory is a fail-safe called the "Brake Echo Check," which is software that runs on a second processor called the monitor CPU. The Brake Echo Check is designed to behave as follows: If Task X died, and if the driver then stepped on the brake or released the brake, then about 200 milliseconds later the Brake Echo Check on the monitor CPU would detect an inconsistency resulting from the death of Task X on the main CPU, and would force the throttle to idle. About three seconds later it would stall the engine. When the throttle is at idle, braking will successfully stop the vehicle.
According to the accident theory presented to the jury by the expert, the following three things had to happen together just prior to the accident:
- The bit corresponding to Task X in the operating system data structure was somehow flipped from one to zero, resulting in the death of Task X.
- At the time of this bit flip, the throttle angle variable maintained by Task X contained a large value corresponding to an open throttle. Because Task X never ran again, the throttle angle variable was stuck at this value and the throttle remained open.
- The Brake Echo Check did not work for some reason. When the driver stepped on the brake, the Brake Echo Check did not correctly detect the inconsistency due to the death of Task X, and therefore it did not force the throttle to idle. Because the throttle remained open, the driver was unable to stop the vehicle by braking.
This theory is not credible as the likely explanation for the accident for at least the following reasons:
- It requires two nearly simultaneous independent failures -- the hypothetical bit flip and the hypothetical failure of the Brake Echo Check -- on two different processors.
- The expert provided no evidence that either failure occurred at the time of the accident or under any circumstances.
- For the hypothetical bit flip, the expert merely speculated that it might possibly occur under some circumstances due to problems he claimed to have identified in the software. No connection was established between any of those claimed problems and the specific bit in question (more details are provided in the IEEE article).
- For the hypothetical failure of the Brake Echo Check, the expert did not even speculate at trial why the Brake Echo Check would fail under any circumstances. Furthermore, all of the testing of the Brake Echo Check that he presented at trial showed it working exactly as designed. He also said that if the driver's foot was already on the brake when the hypothetical bit flip caused Task X to die, then the Brake Echo Check would not act to close the throttle because it only acts if there is a brake transition (brake on or brake off). As I show in my IEEE article, however, this is irrelevant because if the driver's foot was already on the brake when the hypothetical bit flip occurred, then the throttle would already be at idle and normal braking would stop the vehicle. The Brake Echo Check would not be needed.
The expert also presented an alternative theory involving the death of Task X that did not assume that the throttle angle variable contained a large value at the time of the hypothetical bit flip. As I show in the IEEE article, this alternative theory is also not credible as the likely explanation for the accident because it requires at least two hypothetical memory corruptions in two different parts of memory (a corruption of the operating system bit plus a corruption of the throttle angle variable) without any supporting evidence. In fact, in many scenarios, it requires yet a third simultaneous failure -- a failure of the Brake Echo Check, as in the first theory.
Why should all of this be important to the embedded systems community? There are at least two reasons. First, the plaintiffs in this trial appear to have hit on an approach for embedded software trials that can produce a favorable verdict for the plaintiffs even if the evidence and technical analysis do not support such a verdict. Given its success in this trial, it seems likely that plaintiffs in future embedded software trials will employ the same approach. Hopefully, through increased awareness of this issue by the embedded systems community, the verdicts in future embedded software trials will more likely be supported by the evidence than was the case in this trial. Consequently, justice will be better served in future trials than it was in this trial.
The second reason is that, in this era of science deniers, it is more important than ever that we in the engineering and scientific communities be extremely vigilant and scrupulous in all of our publicly-expressed engineering or scientific opinions, lest those opinions become fodder for the deniers in their attempts to discredit science and scientists. By presenting engineering or scientific opinions at trial that are not supported by the evidence or by technical analysis, we run the risk of unwittingly providing ammunition for the science deniers.
Dr. David M. Cummings is the Executive Vice President of the Kelly Technology Group in Santa Barbara, CA. He has over 35 years of experience in the design and implementation of software systems, many of which are embedded systems. Nine of those years were spent at the Jet Propulsion Laboratory, where he designed and implemented flight software for the Mars Pathfinder spacecraft. He holds a bachelor's degree from Harvard University, and a master's degree and a Ph.D. from UCLA.