Again last week, we’re awakened to the news of a new software “glitch”, this time a security problem, called “Heartbleed ”, described as the worst security problem ever. Since taking advantage of that hole leaves no trace and there’s no way to know what information was obtained, we may never know its full impact. There’s been lots of discussion of the details of this bug and its possible consequences, so I want to focus on a more basic issue: why are these things still happening?
Just over a month ago, another bug in the same type of security software affected Apple devices. Two years ago, a “glitch” caused a Wall Street trading firm to lose $440M in 30 minutes, likely the most expensive bug ever. Nearly every week we hear about some “glitch”, perhaps shutting down an airline, hotel reservation system, or stock trading for a few hours. And now cars are being recalled due to defective software.
Computers have been around for 70 years. Programming languages like those in use today have been around for 60. In those decades, computers have gotten amazingly smaller and faster. And they require increasingly complex software to function correctly. But are we getting any better at writing such software? Clearly not.
The programming language used in the software responsible for this week’s news is 40 years old. The tools used to write it are much the same as would have been used 40 years ago. The same language and tools were used in the recent Apple bug. Analysis of that bug showed that using any one of multiple practices that should be standard in the industry would have prevented that bug. What about this one?
Put simply, what happened here is that the software, used by almost every web site, can be tricked to send lots of data that happened to be in its memory from previous transactions. That information may be innocuous junk or it may contain usernames and passwords or cryptographic keys: there’s no way to know.
The aviation industry has experience designing reliable software. There’s never been a death resulting from a software failure in an airplane. Why is this? Because there’s a standard that software for a passenger airplane must meet that entails an enormous amount of verification, and testing in particular. Testing such software costs far more than writing it. It involves rigorous processes where one constructs a large set of requirements, and, among other things, verifies there’s a test that the software meets each requirement and that each piece of the software implements some requirement. One might view the costs of performing this procedure on other software as prohibitively expensive, but tell that to Knight Capital, who lost $440M due to a bug.
Although some of the same best practices that would have caught the Apple bug would also have caught this one, testing and following best practices aren’t enough for software that provides security services. That software isn’t merely designed to perform a certain task (such as controlling an airplane engine) where you can test that it performs that task correctly, but rather needs to prevent certain bad things.
The only way to be sure those bad things can’t happen is to prove, mathematically, that they can’t. A very simple example of something you can prove is that the sum of any two odd numbers must be even. The fact that we can prove it means we don’t have to try every two possible odd numbers to be convinced of the truth of that statement. We can do similar things with software we write.
Technology is far from being able to prove complete programs correct, for one thing because we usually can’t state in a mathematical form what “correct” means. But technology does exist now that can be used to prove that programs meet certain properties. A critical one is that the program never reads from a place in memory where it didn’t write.
The program that contained the Heartbleed bug did exactly that and an attempt to prove that it didn’t would have quickly found this bug (as would the use of certain tools that also detect this type of error). But we can do more: programs providing security services should never transmit private keys, usernames, and passwords externally and that’s a property we should also be able to prove, as well as the conditions under which they grant access to secured services.
Proofs are not only valuable in the security arena. Wouldn’t you be happier driving a car where it was proven that the throttle would always be closed when you had your foot on the brake than one where that couldn’t be proven? Over the last few decades, the computer industry has developed tools and best practices that could, if followed, have prevented the most serious of these “glitches”. We should no longer accept these problems as inevitable.
Richard Kenner is a co-founder and Vice President of AdaCore. He was a researcher in the Computer Science Department at New York University from 1975 through 1998. During that time, he worked with the SETL Project, a research project in high-level programming languages, the PUMA Project, which designed and built an emulator for the CDC 6600 in the late 1970's, and for many years with the Ultracomputer Project, which did research on highly-parallel computer systems. At Ultra, he worked on hardware design, VLSI design, and compilers. As part of the latter work, he was the maintainer of the GCC compiler (on which the GNAT technology is based) for the Free Software Foundation for the 2.5.x through 2.8.x releases. He published papers both on compiler design and VLSI design of a switching network for Ultra and was the designer of the interface between the GNAT front end and GCC.