"Toyota's Killer Firmware" and the "Single Bit Flip That Killed"? Not! - Embedded.com

As some readers may remember, in 2013 an Oklahoma jury found that Toyota's embedded software was to blame for unintended acceleration that resulted in a fatal accident. The embedded software expert who convinced the jury that software was to blame made his trial testimony and slides available to the public and invited us to “judge for ourselves.” A flurry of articles in technical publications followed, most of which (to the best of my recollection) described the expert's testimony in favorable terms, accepting the conclusion that software caused the accident. Many of these articles had attention-grabbing titles such as “Toyota's killer firmware” and “Toyota Case: Single Bit Flip That Killed.”

Having developed embedded software for many years, and having written an opinion piece in the Los Angeles Times years before the trial about the possibility that Toyota's software was to blame for reported incidents of unintended acceleration, I read the expert's testimony with great interest and excitement. I was expecting to find a convincing argument based on the evidence that the software was indeed to blame for the accident in question.

As I delved deeper and deeper into the testimony, however, my excitement turned to disappointment. It became clear that there was no credible theory based on the evidence. I felt it was important to set the record straight, and so I submitted an article with my technical analysis to the IEEE Technology and Society Magazine. That article was peer reviewed and then accepted for publication in their most recent issue (December 2016). (You can also find a pre-publication version of the article on my company's website, as well as two video interviews that provide additional context.)

In this column, I summarize some of my findings (please refer to the IEEE article for a more complete discussion that includes all the technical details).

As discussed in the IEEE article, the plaintiffs convinced the jury that Toyota's embedded software was responsible for the accident by employing the following approach:

  1. First, they bombarded the non-technical jury with criticisms of the quality of Toyota's software from two different software experts. The first expert did not see any of Toyota's source code, but nonetheless his entire testimony, which is also publicly available (see Part 1 and Part 2), was directed toward criticizing the quality of Toyota's software. As anyone with extensive experience developing real-world software knows, software quality assessments can be highly subjective.
  2. The second of the two experts, who did examine Toyota's source code, also criticized the quality of Toyota's software. Then he told the jury that “to a reasonable degree of engineering certainty, it was more likely than not” that the death of a task running on the engine control processor (referred to at trial as “Task X”) was responsible for the accident, despite the fact that the evidence presented at trial did not support that conclusion.

According to the testimony of the second expert, Task X is a periodic task that executes multiple times per second. One of its many responsibilities is to determine the correct throttle angle setting (how far open the throttle should be) based on how hard the driver is pressing on the accelerator pedal (as well as other factors). Therefore, multiple times per second, Task X wakes up and, among other things, determines the current accelerator pedal position and sets a throttle angle variable accordingly. The throttle angle variable is then used by another part of the software to set the throttle to the angle specified in that variable.
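The division of labor described above can be sketched in a few lines. This is only an illustration of the description given at trial, not Toyota's code: the names `SharedState`, `task_x_step`, and `throttle_driver_step`, the constant `MAX_THROTTLE_DEG`, and the linear pedal-to-angle mapping are all assumptions made for the example.

```python
from dataclasses import dataclass

MAX_THROTTLE_DEG = 80.0  # hypothetical full-open angle, for illustration only


@dataclass
class SharedState:
    throttle_angle_deg: float = 0.0  # stands in for the "throttle angle variable"


def task_x_step(state: SharedState, pedal_fraction: float) -> None:
    """One periodic activation: read the pedal, update the shared variable."""
    clamped = min(max(pedal_fraction, 0.0), 1.0)
    # Assumed linear map; the testimony says other factors are also involved.
    state.throttle_angle_deg = clamped * MAX_THROTTLE_DEG


def throttle_driver_step(state: SharedState) -> float:
    """Separate consumer: command the throttle to whatever the variable holds."""
    return state.throttle_angle_deg
```

The point to notice is that the consumer never checks how fresh the variable is. If `task_x_step` stops being called, `throttle_driver_step` keeps returning the last value written.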

The expert identified a specific bit within an operating system data structure on the engine control processor (the main CPU) that determined whether Task X was alive (schedulable) or dead. According to the expert, this data structure (as well as the throttle angle variable) was not protected by software techniques such as mirroring, or by hardware techniques such as error detection and correction. Therefore, if this data structure became corrupted, the corruption would not be detected or corrected. The expert theorized that the bit for Task X in this data structure was erroneously flipped from one to zero due to a software bug or single event upset (SEU), and that this caused the accident, as described below.
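To make the terms concrete, here is a minimal sketch of a task-enable bit and of one common form of mirroring (keeping a bitwise complement next to the value and checking it on every read). The bit position `TASK_X_BIT`, the 8-bit width, and the `MirroredByte` class are hypothetical; according to the testimony, the actual data structure had no such protection.

```python
TASK_X_BIT = 1 << 3   # hypothetical bit position for Task X
MASK = 0xFF           # hypothetical 8-bit task table


def is_schedulable(task_bits: int, task_bit: int) -> bool:
    """A task runs only while its bit is set (one = alive, zero = dead)."""
    return bool(task_bits & task_bit)


class MirroredByte:
    """Stores a value together with its bitwise complement."""

    def __init__(self, value: int) -> None:
        self.write(value)

    def write(self, value: int) -> None:
        self.value = value & MASK
        self.mirror = ~value & MASK

    def read(self) -> int:
        # A consistent pair XORs to all ones; anything else means corruption.
        if (self.value ^ self.mirror) != MASK:
            raise ValueError("corruption detected")
        return self.value
```

With an unprotected table, clearing `TASK_X_BIT` silently makes `is_schedulable` return `False`. With the mirrored copy, the same single-bit change makes `read` raise instead of returning a corrupted value.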

Another key element of the expert's accident theory is a fail-safe called the “Brake Echo Check,” which is software that runs on a second processor called the monitor CPU. The Brake Echo Check is designed to behave as follows: If Task X died, and if the driver then stepped on the brake or released the brake, then about 200 milliseconds later the Brake Echo Check on the monitor CPU would detect an inconsistency resulting from the death of Task X on the main CPU, and would force the throttle to idle. About three seconds later it would stall the engine. When the throttle is at idle, braking will successfully stop the vehicle.
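The designed behavior can be written down as a small timing function. This is a sketch of the description above, not the monitor CPU's actual logic: the function name, the state labels, and the reading that the three seconds are counted from the moment the throttle is forced to idle are assumptions.

```python
IDLE_DELAY_S = 0.2    # "about 200 milliseconds" after the brake transition
STALL_DELAY_S = 3.0   # "about three seconds later" (assumed: after idle is forced)


def brake_echo_action(task_x_alive, seconds_since_brake_transition):
    """Return the monitor's action for a given time since a brake on/off event.

    Pass None for the time if no brake transition has happened since Task X died.
    """
    if task_x_alive or seconds_since_brake_transition is None:
        return "normal"
    if seconds_since_brake_transition < IDLE_DELAY_S:
        return "normal"
    if seconds_since_brake_transition < IDLE_DELAY_S + STALL_DELAY_S:
        return "throttle_idle"
    return "engine_stalled"
```

For example, `brake_echo_action(False, 0.5)` returns `"throttle_idle"`, while `brake_echo_action(False, None)` returns `"normal"` because the check only acts on a brake transition.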

According to the accident theory presented to the jury by the expert, the following three things had to happen together just prior to the accident:

  1. The bit corresponding to Task X in the operating system data structure was somehow flipped from one to zero, resulting in the death of Task X.
  2. At the time of this bit flip, the throttle angle variable maintained by Task X contained a large value corresponding to an open throttle. Because Task X never ran again, the throttle angle variable was stuck at this value and the throttle remained open.
  3. The Brake Echo Check did not work for some reason. When the driver stepped on the brake, the Brake Echo Check did not correctly detect the inconsistency due to the death of Task X, and therefore it did not force the throttle to idle. Because the throttle remained open, the driver was unable to stop the vehicle by braking.
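The three conditions can be combined in a toy discrete-time simulation to show why the theory needs all of them at once. Everything here is hypothetical scaffolding: the 100 ms step, the function `simulate`, its parameters, and the 1:1 pedal-to-throttle mapping are assumptions for illustration and say nothing about how the real software is structured.

```python
STEP_MS = 100          # hypothetical simulation step
ECHO_DELAY_STEPS = 2   # about 200 ms at the step size above


def simulate(pedal, kill_step, brake_step, echo_check_works):
    """Return the commanded throttle (0.0 to 1.0) at each step.

    pedal: pedal position per step; kill_step: step at which Task X dies
    (None = never); brake_step: step of the brake transition (None = never).
    """
    throttle_var = 0.0
    task_x_alive = True
    forced_idle = False
    commanded = []
    for step, pedal_now in enumerate(pedal):
        if step == kill_step:
            task_x_alive = False           # condition 1: the bit flip
        if task_x_alive:
            throttle_var = pedal_now       # simplified 1:1 mapping
        # Otherwise condition 2 applies: the variable keeps its last value.
        brake_after_death = (
            not task_x_alive
            and brake_step is not None
            and brake_step >= kill_step
        )
        if (echo_check_works and brake_after_death
                and step >= brake_step + ECHO_DELAY_STEPS):
            forced_idle = True             # fail-safe works unless condition 3 holds
        commanded.append(0.0 if forced_idle else throttle_var)
    return commanded
```

With `pedal = [0.8, 0.8, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0]`, a death at step 2, and a brake transition at step 4, a working check returns the throttle to 0.0 from step 6 onward. Only with `echo_check_works=False` does the throttle stay at 0.8 for the whole run.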

This theory is not credible as the likely explanation for the accident for at least the following reasons:

  1. It requires two nearly simultaneous independent failures — the hypothetical bit flip and the hypothetical failure of the Brake Echo Check — on two different processors.
  2. The expert provided no evidence that either failure occurred at the time of the accident or under any circumstances.
  3. For the hypothetical bit flip, the expert merely speculated that it might possibly occur under some circumstances due to problems he claimed to have identified in the software. No connection was established between any of those claimed problems and the specific bit in question (more details are provided in the IEEE article).
  4. For the hypothetical failure of the Brake Echo Check, the expert did not even speculate at trial why the Brake Echo Check would fail under any circumstances. Furthermore, all of the testing of the Brake Echo Check that he presented at trial showed it working exactly as designed. He also said that if the driver's foot was already on the brake when the hypothetical bit flip caused Task X to die, then the Brake Echo Check would not act to close the throttle because it only acts if there is a brake transition (brake on or brake off). As I show in my IEEE article, however, this is irrelevant because if the driver's foot was already on the brake when the hypothetical bit flip occurred, then the throttle would already be at idle and normal braking would stop the vehicle. The Brake Echo Check would not be needed.
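The first reason above rests on a standard identity for independent events. Writing $A$ for the hypothetical bit flip and $B$ for the hypothetical failure of the Brake Echo Check (labels introduced here only for illustration), and treating the two as independent:

```latex
P(A \cap B) = P(A)\,P(B) \le \min\bigl(P(A),\, P(B)\bigr)
```

This is a general property of probabilities, not an estimate for this vehicle. No value for either probability was established at trial, which is the point of reasons 2 through 4.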

The expert also presented an alternative theory involving the death of Task X that did not assume that the throttle angle variable contained a large value at the time of the hypothetical bit flip. As I show in the IEEE article, this alternative theory is also not credible as the likely explanation for the accident because it requires at least two hypothetical memory corruptions in two different parts of memory (a corruption of the operating system bit plus a corruption of the throttle angle variable) without any supporting evidence. In fact, in many scenarios, it requires yet a third simultaneous failure — a failure of the Brake Echo Check, as in the first theory.

Why should all of this be important to the embedded systems community? There are at least two reasons. First, the plaintiffs in this trial appear to have hit on an approach for embedded software trials that can produce a favorable verdict for the plaintiffs even if the evidence and technical analysis do not support such a verdict. Given its success in this trial, it seems likely that plaintiffs in future embedded software trials will employ the same approach. Hopefully, through increased awareness of this issue by the embedded systems community, the verdicts in future embedded software trials will more likely be supported by the evidence than was the case in this trial. Consequently, justice will be better served in future trials than it was in this trial.

The second reason is that, in this era of science deniers, it is more important than ever that we in the engineering and scientific communities be extremely vigilant and scrupulous in all of our publicly expressed engineering or scientific opinions, lest those opinions become fodder for the deniers in their attempts to discredit science and scientists. By presenting engineering or scientific opinions at trial that are not supported by the evidence or by technical analysis, we run the risk of unwittingly providing ammunition for the science deniers.

Dr. David M. Cummings is the Executive Vice President of the Kelly Technology Group in Santa Barbara, CA. He has over 35 years of experience in the design and implementation of software systems, many of which are embedded systems. Nine of those years were spent at the Jet Propulsion Laboratory, where he designed and implemented flight software for the Mars Pathfinder spacecraft. He holds a bachelor's degree from Harvard University, and a master's degree and a Ph.D. from UCLA.

16 thoughts on ““Toyota’s Killer Firmware” and the “Single Bit Flip That Killed”? Not!”

  1. “I doubt we'll ever really know what went wrong. Given that cars carry many more people than aircraft, the use of some kind of black box drive recorder is warranted (does that cost more than mechanical linkages, car makers??) And two standard rules of

  2. “I agree that we are not likely to ever know with certainty what went wrong in this particular accident. With that said, a fundamental issue for many automotive product liability trials ends up being driver error versus vehicle malfunction. There are

  3. “Your article was excellent until I got to the last paragraph and this statement. “The second reason is that, in this era of science deniers…”. What is your definition of a science denier? I'm making an assumption here that you are referring to peop

  4. “Almost any technological tragedy today requires a confluence of unlikely events. Given the large number of cars sold, and the large number of hours driven per year, a few dozen accidents where multiple events occur are practically a certainty. Safety

  5. “Regarding “Science Deniers”. Science is meant to be questioned. Science is meant to be scrutinized. Otherwise, your science is no different than religion.”

  6. “When the scientists who do the questioning are scorned, professionally disgraced and have their careers destroyed by the other “scientists” or political elite to further their globalist religion you think that is perfectly fine?”

  7. “The fundamental challenge I see here is that several well-known embedded systems experts based their testimony on direct examination of the code, but you must rely on their trial testimony, which disclosed little about the code, other than titillating fac

  8. “Arguments based on “code quality” can be highly subjective. I have seen buggy code beautifully formatted, and I have seen exceptionally performing code that appear messy in the first glance only to reveal its consistency upon closer look. Now, when I he

  9. “These are important points. Thank you. Due to space limitations, I focused my IEEE article on the plaintiffs’ flawed causation theory. But I have also examined the plaintiffs’ testimony on Toyota’s software quality, and I have serious conce

  10. “You make an interesting point. In response, I’d like to first point out that only one of the two embedded systems experts based his testimony on direct examination of the code. The first expert to testify did not see any of Toyota’s source code

  11. “Thank you for your comment. Although coding standards serve a useful purpose (and I have used them on many projects), I agree that just because code adheres to a coding standard doesn’t mean the code is high quality. As you suggest, one cannot meanin

  12. “My Audi A4 once engaged in uncontrolled acceleration. I have been driving for 4 decades, every type of vehicle. I was not “confused” between the brake and the accelerator. My car accelerated in an uncontrolled fashion all by itself. Because it is contro

  13. “I am not saying that I don’t believe a car computer could malfunction. But the fact that a malfunction could occur is not the point. The point is that the plaintiffs didn’t show that it is more likely than not that a computer malfunction caused

  14. “Sure, everyone's a climate scientist these days, even cowboys. Heck, you can be a failed politician and still become the pope of the Religion of Climate Control.”
