Finding defects in code has been the bane of developers' existence
since the earliest days of computer programming. Maurice Wilkes, the
British computer scientist best known for his work on the
EDSAC, said in 1949:
"As soon as we started
programming, we found to our surprise that it wasn't as easy to get
programs right as we had thought. Debugging
had to be discovered. I can remember the exact instant when I
realized that a large part of my life from then on was going to be
spent in finding mistakes in my own programs."
This keen observation from more than 50 years ago still resonates
with anyone tasked with developing software. But why do we make
mistakes? And what are some of the ways that we can avoid making
mistakes in an attempt to diminish the task of debugging software after
it is written? In this paper, we use our years of experience from
developing and commercializing static source code analysis to help
answer these questions.
During this decade, we have analyzed hundreds of millions of lines
of code, seen programming errors from the very simple to the most
complicated, and heard firsthand accounts of the bugs that killed
development organizations. While it is an impossible task to relate all
of the relevant and interesting anecdotes in this type of discussion,
our aim is to convey the general impression of what mistakes keep
developers and managers awake at night.
This article is excerpted from a paper of
the same name presented at
the Embedded Systems Conference Boston 2006. Used with permission of
the Embedded Systems Conference. For more information, please visit www.embedded.com/esc/boston/
As a means for communicating our experience, we first discuss the
cost of mistakes in software development and hypothesize as to why
developers make mistakes. Then, in an attempt to help developers
identify their most common mistakes as they write their code, we
examine some of the categories of these mistakes, both from a pure
source code perspective and from a higher-level programming
methodology perspective. Finally, we make the case for automatic
technology to help weed out these mistakes earlier in the development
process.
The cost of software defects
It is well known that software defects are a very costly
problem. According to a study commissioned by the National
Institute of Standards and Technology (NIST), software
errors are costing the U.S. economy an estimated $59.5 billion
annually. The study also reports that more than one-third of these
costs could be eliminated by an improved testing infrastructure that
enables earlier and more effective identification and removal of
software defects.
Drilling into the problem further, it has been shown that the cost
of discovering a defect increases drastically the later it is found in
the development lifecycle. A defect found during the coding phase of a
project is very inexpensive to fix. This makes sense intuitively since
the developer responsible for the defect is working on the questionable
code, has all of the context of that code in his head at the time the
defect is discovered, and as such, can make a reasonable fix in a small
amount of time.
When that same defect slips into the QA or system integration phase
of the development lifecycle, it now can become an order of magnitude
more expensive to address. Now the defect must be discovered as the
program is being executed and the person who discovered the defect must
reproduce the defect and communicate the errant behavior with the
development organization.
Then the development organization must determine which part of the
code was likely to cause that particular fault, assign the appropriate
developer or developers to investigate further to determine the root
cause in the faulty code, then finally fix the defect without
introducing other problems into the code.
Another order of magnitude in cost is added if a defect slips past
the QA organization and reaches the field. Not only does an
organization have all of the above issues in removing that defect, the
organization must now deal with the additional cost of reproducing the
issue through their support organization, not to mention the cost of
bad public perception surrounding their "buggy product."
Software defects end up costing organizations millions of dollars
every year. But the problem is not that discovering a
defect in the field is expensive; it is that organizations are
discovering defects in the field at all. The distribution of defects across
the development lifecycle (from coding to testing to release) is what
determines the actual cost of those defects to the organization.
If two organizations each have one thousand defects in their code
and the first finds them all in the coding phase but the second
discovers them all after the product has been released, the first
organization is in much better shape financially. Therefore, we must
focus on discovering more defects earlier in the process.
Why do developers make mistakes?
If it's clear to everyone that software defects are an expensive
problem (and we assume that it is), why do developers make mistakes?
Or rather, why do they make so many mistakes that NIST performs
studies showing that defects cost businesses
nearly sixty billion dollars a year? Based on our experience in developing
software as well as interacting with thousands of software developers
and seeing the types of bugs that come out of the software development
process, we view the following as the top reasons developers make
mistakes.
Ignorance. The reader might
think from this header that we are taking a shot at the educational
system that trains our software developers, but that is not the thrust
of this argument. Developers are ignorant of the systems that they
develop. A single developer can keep thousands, maybe even tens of
thousands of lines of code in his or her head for the purpose of
perfectly understanding how different pieces of the code interact.
However, today's systems are in the hundreds of thousands, if not
millions or tens of millions of lines of code. A single developer
working on that type of system will be calling functions or methods of
which they are quite ignorant. The pieces of the code that he is forced
to interact with may have been written years ago by someone who is no
longer available to explain their intent or nuance. So the developer
does his best, quickly reading through the implementation or the
comments (potentially incorrect!) provided when he needs to interact
with another piece of the system. And this leads to errors.
Stress. We mentioned above
that the developer does his best to "quickly" read through the
implementation of a piece of code that he must interact with. If you
are a developer, you probably didn't think twice about the phrasing of
that sentence (nor did we when writing it) because that is the reality
of any software development process. Managers put pressure on
developers to generate code quickly; deadlines come fast, and this
leads to hasty coding, which leads to mistakes. Often these mistakes
are not necessarily in the most common case of the code (since that is
well tested), but on edge cases. When time is of the essence and
developers are stressed, the parts of the code less traversed suffer.
Yet these defects can be just as costly as mainstream bugs.
Boredom. Not all coding is
rocket science. In fact, a good number of coding projects, once the
design is complete, would be classified by most developers as "boring."
Of course, if a developer is bored, he is much less likely to produce
good code than if he is excited about his work.
Pounding out those last few cases in a switch statement when the
first few took dozens of minutes can be just mind-numbing enough to
switch off the brain and make the simplest of mistakes. Boredom also
leads to shortcuts: if you are bored with any given task, you are
probably looking for ways to eliminate your boredom as quickly as
possible. And unfortunately, a shortcut in coding often translates to a
defect in the code.
Human Frailties. Certainly
the above points play into this last point about the very nature of
human beings. Humans are creative and intelligent and able to solve
difficult problems through reason. However, we are not robots. We are
not so good at repeating the exact same operation thousands of times
without some variance. If you doubt this, pull out a piece of paper and
sign your name ten times.
Signing your name is probably something you've done thousands of
times in your life, yet each time is a little different. This variance
means that even if a developer understood every interface in a system
perfectly, had all the time in the world, and were programming the most
interesting project computer science has ever known, he would still
make a mistake in the translation from the design in his head to the
code that he writes. That is just a fact of life.
Common goofs
When discussing common programming defects, we have (at least) two
choices for categorization. We can either categorize based on root
cause in the code (e.g., null
pointer dereference, failure to unlock after acquiring a lock, buffer
overrun, etc.) or based on a higher level reason for the mistake
(e.g., improper error handling,
typo, copy and paste, etc.).
Having a hybrid of these two categorizations is difficult in this
format, so we choose the latter because we feel it gives a better sense
for why a particular defect is introduced. However, we acknowledge that
this higher level categorization is very subjective. We're not here to
forge new territory in defect classification, but rather want to shed
light on why we believe these defects are made.
The examples below are admittedly toy fragments meant only to
highlight the particular issue in the discussion. Bear in mind that
these problems do manifest themselves over hundreds or thousands of
lines of code within and across functions and methods in real systems.
Ignorance. If you were to
ask most developers, "should you
return a pointer into data on the stack?" they would answer a
resounding no. However, from time to time, we see the following type of
code in programs:
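A minimal sketch of this kind of function follows. The buffer size and the name string are placeholders chosen for illustration; they are assumptions, not taken from any real program.

```c
#include <string.h>

enum { NAME_LEN = 32 };  /* assumed buffer size, for illustration only */

/* BUG: temp_name lives in this function's stack frame, so the pointer
 * returned to the caller dangles as soon as the function returns. */
char *get_name(void)
{
    char temp_name[NAME_LEN];

    strncpy(temp_name, "Example Name", NAME_LEN - 1);
    temp_name[NAME_LEN - 1] = '\0';
    return temp_name;   /* many compilers warn about this line */
}
```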
The function looks simple enough: it is putting a name into a
character array and then returning that array presumably for the caller
to use. However, once the stack is popped upon return from this
function, that pointer is no longer a reliable piece of data. Once
other functions are called, the data containing that name will be
likely overwritten. To make this function work correctly, we should
allocate the memory dynamically so that it persists past the end of the
function:
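One way the corrected function might look, again with an assumed buffer size and placeholder string. The null check on malloc is an addition for safety and may not have been part of the original listing.

```c
#include <stdlib.h>
#include <string.h>

enum { NAME_LEN = 32 };  /* assumed buffer size, for illustration only */

/* Returns a heap-allocated name. The caller now owns the memory and
 * must free() it when done. Returns NULL if allocation fails. */
char *get_name(void)
{
    char *temp_name = malloc(NAME_LEN);

    if (temp_name == NULL)
        return NULL;
    strncpy(temp_name, "Example Name", NAME_LEN - 1);
    temp_name[NAME_LEN - 1] = '\0';
    return temp_name;
}
```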
Now the caller of the function can trust that the pointer points to
valid data for as long as that memory is not freed. Imagine a potential
caller:
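A sketch of such a caller, written as if its author still believed get_name returned a pointer that needs no cleanup. The body of get_name and the null check are assumptions carried over for the sake of a self-contained example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { NAME_LEN = 32 };  /* assumed buffer size, for illustration only */

/* Heap-allocating version of get_name: the caller owns the result. */
char *get_name(void)
{
    char *temp_name = malloc(NAME_LEN);

    if (temp_name == NULL)
        return NULL;
    strncpy(temp_name, "Example Name", NAME_LEN - 1);
    temp_name[NAME_LEN - 1] = '\0';
    return temp_name;
}

/* Written against the old interface: prints the name and returns
 * without ever calling free() on it. */
void calls_get_name(void)
{
    char *name = get_name();

    if (name != NULL)
        printf("%s\n", name);
}
```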
This code will work just fine in printing the name. However, notice
that with the change to the get_name function, we now have introduced a
resource leak in calls_get_name!
If the developer implementing calls_get_name
does not realize that the implementation changed, there is a defect due
to the developer's ignorance of that changed interface.
Copy and paste. Now suppose
our developer is tasked with writing a function similar to get_name, but
one that instead duplicates a name passed in as a parameter. The
developer would likely copy and paste the original code. Copying and
pasting code is a common practice and often stems from developer
boredom (since the task is not seen as interesting) or from time stress
in not having sufficient time to code a function from scratch. So, the
developer starts from a verbatim copy of get_name.
And then he changes the name and adds a parameter:
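A sketch of this intermediate step. The new name dup_name and the parameter name are hypothetical; the body is still the pasted get_name code, with the same assumed buffer size and placeholder string.

```c
#include <stdlib.h>
#include <string.h>

enum { NAME_LEN = 32 };  /* assumed buffer size, for illustration only */

/* New name and new parameter, but the body is still the pasted code. */
char *dup_name(const char *name)
{
    char *temp_name = malloc(NAME_LEN);

    (void)name;   /* not used yet: the body still copies the literal */
    if (temp_name == NULL)
        return NULL;
    strncpy(temp_name, "Example Name", NAME_LEN - 1);
    temp_name[NAME_LEN - 1] = '\0';
    return temp_name;
}
```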
Then he just changes the part that does the strncpy to call strdup
since he knows that's a good way to duplicate a string:
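A sketch of the edited function, using the hypothetical name dup_name. strdup is a POSIX function (standardized in C23), so the feature-test macro below is there only to make its declaration visible on strict compilers.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strdup() on strict compilers */
#include <stdlib.h>
#include <string.h>

enum { NAME_LEN = 32 };  /* assumed buffer size, for illustration only */

/* The strncpy lines have been replaced by a single call to strdup. */
char *dup_name(const char *name)
{
    char *temp_name = malloc(NAME_LEN);

    temp_name = strdup(name);
    return temp_name;
}
```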
And now the function works as desired. However, the astute reader
notices that in the midst of the copying and pasting, the developer has
left the original call to malloc in the code, thus causing a resource
leak on the very next line when he reassigns the temp_name
pointer:
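The leak is easier to see with the two lines annotated. The sketch below uses hypothetical function names and also shows one possible fix, which is simply to drop the leftover allocation.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strdup() on strict compilers */
#include <stdlib.h>
#include <string.h>

enum { NAME_LEN = 32 };  /* assumed buffer size, for illustration only */

/* As left after the edits: the first allocation is never freed. */
char *dup_name_leaky(const char *name)
{
    char *temp_name = malloc(NAME_LEN);  /* leftover from the pasted code */

    temp_name = strdup(name);            /* LEAK: the block above is orphaned */
    return temp_name;
}

/* One possible fix: strdup already allocates, so malloc is not needed. */
char *dup_name(const char *name)
{
    return strdup(name);
}
```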
Error handling. One of the
most common problems we see in code is in the handling of error
conditions. Programmers tend to program for the common case leaving the
outliers, from a path execution standpoint, largely untested. However,
these outliers are exactly the scenario that the end user is likely to
hit as the load becomes high or the application has been running for
days or weeks at a time. Examine the following piece of code, pulled
directly from Linux:
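The sketch below is not the kernel source. It is a simplified user-space model of the pattern, in which spin_lock_irq, spin_unlock_irq and vortex_adb_allocroute are hypothetical stand-ins with made-up signatures, the lock is a plain counter so the effect can be observed, and a goto-based corrected variant is included as one common way to fix it.

```c
#include <stdbool.h>

/* Simplified stand-ins; these are NOT the kernel's definitions. */
struct chip {
    int lock_depth;          /* models the lock: 0 means unlocked */
    bool alloc_should_fail;  /* lets the sketch force the error path */
    int dma;
};

static void spin_lock_irq(struct chip *c)   { c->lock_depth++; }
static void spin_unlock_irq(struct chip *c) { c->lock_depth--; }

static int vortex_adb_allocroute(struct chip *c)
{
    return c->alloc_should_fail ? -1 : 3;
}

/* BUG: the early return on the error path leaves the lock held. */
int hw_params_buggy(struct chip *chip)
{
    spin_lock_irq(chip);
    int dma = vortex_adb_allocroute(chip);
    if (dma < 0)
        return dma;          /* returns without unlocking */
    chip->dma = dma;
    spin_unlock_irq(chip);
    return 0;
}

/* One common fix: route every exit through a single unlock. */
int hw_params_fixed(struct chip *chip)
{
    int err = 0;

    spin_lock_irq(chip);
    int dma = vortex_adb_allocroute(chip);
    if (dma < 0) {
        err = dma;
        goto out;
    }
    chip->dma = dma;
out:
    spin_unlock_irq(chip);
    return err;
}
```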

Here a lock is being acquired near the beginning of the function
with the call to spin_lock_irq.
In the common case, right before the end of the function, the
corresponding unlock function is called. However, notice that there is
an error case in the middle of the function depending on the return
value of vortex_adb_allocroute.
If this function fails, the calling function returns without unlocking
the acquired lock! This can lead to deadlock, causing the kernel to
hang. In this particular case, failing to handle the error case
correctly led to a concurrency problem, but this bad behavior can
also lead to other coding defects like resource leaks.
Off by ones. Similar to the
case of returning pointers from the stack, if you were to ask a
developer "How do you index arrays in
C/C++ code?" most would appropriately respond that arrays are
0-indexed and the maximum value that should be used to index into array
is the size of the array minus one. However, we still see this type of
code more often than we'd like:
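A sketch of the pattern, with an assumed array size and a hypothetical stand-in for something_very_important. The buggy function is shown for reading only: calling it is undefined behavior, and whether ptr is actually clobbered depends on the compiler and stack layout. A corrected loop bound is included for comparison.

```c
#include <stddef.h>

enum { ARRAY_LEN = 10 };  /* assumed array size, for illustration only */

static int important_value = 42;

/* Hypothetical stand-in for a function that never returns NULL. */
static void *something_very_important(void)
{
    return &important_value;
}

/* BUG: "i <= ARRAY_LEN" writes one element past the end of a[]. */
void *off_by_one_buggy(void)
{
    void *ptr = something_very_important();
    int a[ARRAY_LEN];

    for (int i = 0; i <= ARRAY_LEN; i++)
        a[i] = 0;   /* a[ARRAY_LEN] may land on ptr's stack slot */
    return ptr;
}

/* Corrected bound: valid indices run from 0 to ARRAY_LEN - 1. */
void *off_by_one_fixed(void)
{
    void *ptr = something_very_important();
    int a[ARRAY_LEN];

    for (int i = 0; i < ARRAY_LEN; i++)
        a[i] = 0;
    return ptr;
}
```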
In this case, depending on how the stack is arranged, it is likely
that ptr will be overwritten by the buffer overrun caused by the off-by-one
error in indexing the array. What's worse, this pointer is now
null, and as such, the caller of the function may inadvertently
dereference a null pointer. If you were to catch this type of problem in
testing, it may seem very strange that the pointer is null if you know
that the something_very_important function can never return a null
pointer!
Typos. From time to time, a
developer simply omits some punctuation. Unlike in English, where the
reader can likely "figure out what you meant," a computer will blindly
execute code as is, causing the functionality to be incorrect. In this
example below, the developer clearly meant to break if the element
found in the array was greater than 100. But because he forgot the { and }, the
break will occur on the first iteration of the loop:
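A sketch of the typo and its fix. The function names and the choice to return the index of the matching element are assumptions made so the difference in behavior can be observed.

```c
/* BUG: without braces, only "idx = i;" belongs to the if. The break
 * runs unconditionally, so the loop always stops after one iteration. */
int find_large_buggy(const int *a, int n)
{
    int idx = -1;

    for (int i = 0; i < n; i++) {
        if (a[i] > 100)
            idx = i;
            break;
    }
    return idx;
}

/* Intended behavior: break only when an element greater than 100 is found. */
int find_large_fixed(const int *a, int n)
{
    int idx = -1;

    for (int i = 0; i < n; i++) {
        if (a[i] > 100) {
            idx = i;
            break;
        }
    }
    return idx;
}
```

With the array {1, 200, 3}, the buggy version stops at index 0 and reports nothing found, while the fixed version reports index 1.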
And finally, the following typo was discovered in the X.org code that
controls root access in a certain piece of the system:
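The sketch below reconstructs the shape of the check from the description that follows; it is not the actual X.org source. The functions my_getuid and my_geteuid are hypothetical stand-ins for getuid and geteuid, backed by settable variables so that both outcomes can be exercised.

```c
/* Hypothetical stand-ins for getuid()/geteuid(), driven by variables. */
static unsigned int fake_uid;
static unsigned int fake_euid;

static unsigned int my_getuid(void)  { return fake_uid; }
static unsigned int my_geteuid(void) { return fake_euid; }

/* BUG: "my_geteuid" without parentheses is the function's address,
 * which is never null, so the second test is always true. */
int privileged_options_allowed_buggy(void)
{
    return my_getuid() == 0 || my_geteuid != 0;
}

/* Intended check: actually call the function and compare its result. */
int privileged_options_allowed_fixed(void)
{
    return my_getuid() == 0 || my_geteuid() != 0;
}
```

With a real user ID of 1000 and an effective user ID of 0 (a normal user running a setuid-root binary), the fixed check refuses while the buggy one still allows.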
Notice that the second "call" to geteuid does
not have parentheses
following the identifier. As such, it is treated as a function pointer
and its value is compared against 0. Because a function's address is never null, this test always succeeds, allowing
a normal user of the system to have root access when this piece of code
is triggered. Yes, this piece of code is in a real system that tens of
thousands of users are probably still using.
Avoiding the goofs
Unfortunately, we do not have a silver bullet for guaranteeing that
developers will not make some of the common mistakes that lead to very
expensive defects.
There's no way to make code less complex or give developers more time to
develop it. However, there is technology that helps alleviate the
problem of human frailties in the software development process.
Research in static source code analysis has made tremendous strides in
the past decade; gone are the false-positive-ridden days of Lint and other lightweight code
scanning tools.
All of the goofs listed in this paper are easily detected by state-of-the-art
static source code analysis technology. Compared with
testing tools (e.g., Purify), static source code analysis has the
benefit of analyzing all of the paths through a given code base and is
not tied to the particular test suite of the application. Compared with
manual code audits or developer debugging, static source code analysis
technology isn't hindered by the human frailties discussed previously.
There is no ignorance of the numerous interfaces in the code since
it can analyze the whole program, keeping billions of contexts in
memory simultaneously. Also, static source code analysis never suffers
from stress or boredom or typos. Computers are very good at performing
the same operation thousands of times in a row without variance. If you
want to avoid the most common development goofs, augment your
development process to include the latest technology to help find
defects earlier in the lifecycle.
Ben Chelf is CTO and Founder of Coverity.