Scientific debugging: Finding out why your code is buggy – Part 1

In Part 1 of a series excerpted from his book Why Programs Fail, author Andreas Zeller explains how to use the scientific method to debug your software.

In software programming and debugging, once we have reproduced and simplified the problem, we must understand how the failure came to be. We can speed up this process by training our reasoning.

How can we systematically find out why a program fails? And how can we do so without vague concepts of “intuition,” “sharp thinking,” and so on? What we want is a method of finding an explanation for the failure, a method that:

  • Does not require a priori knowledge (that is, we need no experience from earlier errors).
  • Works in a systematic and reproducible fashion such that we can be sure to eventually find the cause and reproduce it at will.

The process of obtaining a theory that explains some aspect of the universe is known as the scientific method. It is also the appropriate process for obtaining problem diagnostics. In this series of articles, we will describe the basic techniques for creating and verifying hypotheses, for making experiments, for conducting the process in a systematic fashion, and for making the debugging process explicit.

If a program fails, this behavior is initially just as surprising and inexplicable as any newly discovered aspect of the universe. Having a program fail also means that our abstraction fails. We can no longer rely on our model of the program, but rather must explore the program independently from the model. In other words, we must approach the failing program as if it were a natural phenomenon.

In the natural sciences, there is an established method for developing or examining a theory that explains (and eventually predicts) such an aspect. It is called the scientific method because it is supposed to summarize the way (natural) scientists work when establishing some theory about the universe. In this very general form, the scientific method proceeds roughly as follows:

  1. Observe (or have someone else observe) some aspect of the universe.
  2. Invent a tentative description, called a hypothesis, that is consistent with the observation.
  3. Use the hypothesis to make predictions.
  4. Test those predictions by experiments or further observations and modify the hypothesis based on your results.
  5. Repeat steps 3 and 4 until there are no discrepancies between hypothesis and experiment and/or observation.

When all discrepancies are gone, the hypothesis becomes a theory. In popular usage, a theory is just a synonym for a vague guess. For an experimental scientist, though, a theory is a conceptual framework that explains earlier observations and predicts future observations, such as the theory of relativity or plate tectonics.

In our context, we do not need the scientific method in its full glory, nor do we want to end up with grand unified theories for everything. We should be perfectly happy if we have a specific instance for finding the causes of program failures. In this debugging context, the scientific method operates as follows:

  1. Observe a failure (i.e., as described in the problem description).
  2. Invent a hypothesis as to the failure cause that is consistent with the observations.
  3. Use the hypothesis to make predictions.
  4. Test the hypothesis by experiments and further observations:
       • If the experiment satisfies the predictions, refine the hypothesis.
       • If the experiment does not satisfy the predictions, create an alternate hypothesis.
  5. Repeat steps 3 and 4 until the hypothesis can no longer be refined.

The entire process is illustrated in Figure 1. Again, what you eventually get is a theory about how the failure came to be:

  • It explains earlier observations (including the failure).
  • It predicts future observations (for instance, that the failure no longer appears after applying a fix).

In our context, such a theory is called a diagnosis.

Figure 1. The scientific method of debugging.
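
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `check`, the function `buggy_average`, and its defect are hypothetical and do not come from the book. Each call records a hypothesis, the prediction derived from it, and the observation made by running an experiment.

```python
from typing import Any, Callable, Dict


def check(hypothesis: str, prediction: Any, experiment: Callable[[], Any]) -> Dict[str, Any]:
    """Run one experiment and compare the observation with the prediction."""
    observation = experiment()
    conclusion = "confirmed" if observation == prediction else "rejected"
    return {
        "hypothesis": hypothesis,
        "prediction": prediction,
        "observation": observation,
        "conclusion": conclusion,
    }


def buggy_average(values):
    # Hypothetical defect for illustration: divides by one element too many.
    return sum(values) / (len(values) + 1)


# Step 1: observe the failure by rejecting the initial hypothesis "it works".
first = check("buggy_average works", 3.0, lambda: buggy_average([2, 4]))

# Steps 2-4: a hypothesis about the cause, a prediction for a new input,
# and an experiment that tests the prediction.
second = check(
    "the divisor is len(values) + 1 instead of len(values)",
    3.0,  # 6 / (1 + 1) if the hypothesis holds; a correct average would be 6.0
    lambda: buggy_average([6]),
)
```

Here `first` ends up rejected and `second` confirmed, which mirrors how the initial hypothesis "the program works" is refuted before a hypothesis about the cause is refined.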

Applying the scientific method
How is the scientific method used in practice? As an example, consider this sample program:

It is supposed to sort its command-line arguments, but some defect causes it to fail under certain circumstances such as:

$ sample 11 14
Output: 0 11
$ _

In Chapter 1 of this book we saw how to find the defect in the sample program, but in a rather ad hoc or unsystematic way. Let’s now retell this debugging story using the concepts of the scientific method.

Debugging sample: Preparation. We start with writing down the problem: what happened in the failing run and how it failed to meet our expectations. This easily fits within the scientific method scheme by setting up an initial hypothesis, “the program works,” which is then rejected. This way, we have observed the failure, which is the first step in the scientific method.

  • Hypothesis: The sample program works.
  • Prediction: The output of sample 11 14 is “11 14.”
  • Experiment: We run sample as previously.
  • Observation: The output of sample 11 14 is “0 11.”
  • Conclusion: The hypothesis is rejected.

Debugging sample: Hypothesis 1. We begin with a little verification step: Is the zero value reported by sample caused by a zero value in the program state? Looking at lines 38–41, it should be obvious that the first value printed (0) should be the value of a[0]. It is unlikely that this output code has a defect. Nonetheless, if it does, we can spend hours and hours on the wrong trail. Therefore, we set up the hypothesis that a[0] is actually zero:

  • Hypothesis: The execution causes a[0] to be zero.
  • Prediction: a[0] = 0 should hold at line 37.
  • Experiment: Using a debugger, observe a[0] at line 37.
  • Observation: a[0] = 0 holds as predicted.
  • Conclusion: The hypothesis is confirmed.

Debugging sample: Hypothesis 2. Now we must determine where the infection in a[0] comes from. We assume that shell_sort() causes the infection:

  • Hypothesis: The infection does not take place until shell_sort().
  • Prediction: The state should be sane at the beginning of shell_sort(); that is, a[] = [11, 14] and size = 2 should hold at line 6.
  • Experiment: Observe a[] and size.
  • Observation: We find that a[] = [11, 14, 0], size = 3 holds.
  • Conclusion: The hypothesis is rejected.

Debugging sample: Hypothesis 3. Assuming we have only one infection site, the infection does not take place within shell_sort(). Instead, shell_sort() gets bad arguments. We assume that these arguments cause the failure:

  • Hypothesis: Invocation of shell_sort() with size = 3 causes the failure.
  • Prediction: If we correct size manually, the run should be successful—the output should be “11 14.”
  • Experiment: Using a debugger, we:
       1. Stop execution at shell_sort() (line 6).
       2. Set size from 3 to 2.
       3. Resume execution.
  • Observation: As predicted.
  • Conclusion: The hypothesis is confirmed.
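
The debugger experiment for Hypothesis 3 can also be imitated in code. The following Python sketch is an illustration only, not the listing from the book: `shell_sort` is a plain Shell sort written for this example, and `run_with_size` is a hypothetical helper. The list `[11, 14, 0]` stands for the state observed at the start of shell_sort(), with the trailing 0 modeling the slot past the two real arguments. The sketch also assumes that sample prints only the first two values, as the output “0 11” suggests.

```python
def shell_sort(a, size):
    """Sort the first `size` elements of `a` in place."""
    h = 1
    while h <= size:          # grow the gap sequence 1, 4, 13, ...
        h = h * 3 + 1
    while h > 1:
        h //= 3
        for i in range(h, size):
            v = a[i]
            j = i
            # Shift larger elements that are h positions apart to the right.
            while j >= h and a[j - h] > v:
                a[j] = a[j - h]
                j -= h
            a[j] = v


def run_with_size(size):
    """Run the sort on the observed state and return what would be printed."""
    a = [11, 14, 0]           # state observed at the start of shell_sort()
    shell_sort(a, size)
    return a[:2]              # assumed: sample prints the two real arguments


failing = run_with_size(3)    # size as actually passed
patched = run_with_size(2)    # size corrected by hand, as in the debugger
```

With size = 3 the stray 0 is sorted to the front, while with size = 2 the two arguments come out in order, which is the prediction of Hypothesis 3.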

Debugging sample: Hypothesis 4. The value of size can only come from the invocation of shell_sort() in line 36, that is, the argc argument. As argc is the size of the array plus 1, we change the invocation.

  • Hypothesis: Invocation of shell_sort() with size = argc (instead of size = argc - 1) causes the failure.
  • Prediction: If we change argc to argc - 1, the run should be successful. That is, the output should be “11 14.”
  • Experiment: In line 36, change argc to argc - 1 and recompile.
  • Observation: As predicted.
  • Conclusion: The hypothesis is confirmed.

After four iterations of the scientific method, we have finally refined our hypothesis to a theory: the diagnosis “Invocation of shell_sort() with argc causes the failure.” We have proven this by showing the two alternatives:

  • With the invocation argc, the failure occurs.
  • With the invocation argc - 1, the failure no longer occurs.

Thus, we have shown that the invocation with argc caused the failure. As a side effect, we have generated a fix: namely, replacing argc with argc - 1 in line 36. Note that we have not yet shown that the change induces correctness; that is, sample may still contain other defects.
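
The two alternatives can be put side by side in a toy model. This Python sketch rests on several assumptions and is not the program from the book: the function `sample` and its `size_expr` parameter are hypothetical, Python's built-in `sorted()` stands in for shell_sort(), and the slot behind the array is assumed to hold 0, as observed in the failing run.

```python
def sample(args, size_expr):
    """Toy model of sample: return the values that would be printed."""
    argc = len(args) + 1                    # argv[0] is the program name
    memory = [int(x) for x in args] + [0]   # assumed: the next slot holds 0
    size = size_expr(argc)
    memory[:size] = sorted(memory[:size])   # stand-in for shell_sort(a, size)
    return memory[:argc - 1]                # only the real arguments are printed


with_argc = sample(["11", "14"], lambda argc: argc)              # original call
with_argc_minus_1 = sample(["11", "14"], lambda argc: argc - 1)  # proposed fix
```

In this model the first call returns [0, 11] and the second returns [11, 14]. The model shows only that the size argument decides whether the stray 0 takes part in the sort. It says nothing about other defects the real program may contain.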

In particular, in programs more complex than sample we would now have to validate that this fix does not introduce new problems. In the case of sample, though, you can do such a validation by referring to a higher authority: as the author, I claim that with the fix applied there is no way sample could ever sort incorrectly. Take my word for it.

Scientific debugging is explicit debugging
Earlier, we saw how to use the scientific method to establish the failure cause. You may have noticed that the process steps were quite explicit: We explicitly stated the hypotheses we were examining, and we explicitly set up experiments that supported or rejected the hypotheses.

Being explicit is an important means toward understanding the problem at hand, starting with the problem statement. Every time you encounter a problem, write it down or tell it to a friend. Just stating the problem in whatever way makes you rethink your assumptions, and often reveals the essential clues to the solution. The following is an amusing implementation, as reported by Kernighan and Pike (1999) [1]:

One university center kept a Teddy bear near the help desk. Students with mysterious bugs were required to explain them to the bear before they could speak to a human counselor.

Unfortunately, most programmers are implicit about the problem statement, and even more so within the debugging process (they keep everything in their minds). But this is a dangerous thing to do. As an analogy, consider a Mastermind game (Figure 2).

Figure 2. A mathematical game

Your opponent has chosen a secret code, and you have a number of guesses. For each guess, your opponent tells you the number of tokens in your guess that had the right color or were in the right position. If you have ever played Mastermind and won, you have probably applied the scientific method.

However, as you may recall from your Mastermind experiences, you must remember all earlier experiments and their outcomes so that you can keep track of all confirmed and rejected hypotheses. In a Mastermind game, this is easy, as the guesses and their outcomes are recorded on the board.

In debugging, though, many programmers do not explicitly keep track of experiments and outcomes, which is equivalent to playing Mastermind in memory. In fact, forcing yourself to remember all experiments and outcomes prevents you from going to sleep until the bug is eventually fixed. When debugging this way, a “mastermind” is not enough; you also need a “master memory.”
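
The role of the board can be made concrete with a short Python sketch. It is an illustration under assumptions: the six-letter color alphabet, the four-position code, the chosen secret, and the function names are all hypothetical. Every guess is an experiment, its feedback is the observation, and the recorded log is what allows hypotheses (candidate codes) to be rejected.

```python
from collections import Counter
from itertools import product

COLORS = "RGBYOP"  # assumed alphabet of six colors


def feedback(secret, guess):
    """Return (right color and position, right color in the wrong position)."""
    exact = sum(s == g for s, g in zip(secret, guess))
    common = sum((Counter(secret) & Counter(guess)).values())
    return exact, common - exact


def consistent(candidates, log):
    """Keep only the codes that would have produced every recorded outcome."""
    return [
        code for code in candidates
        if all(feedback(code, guess) == outcome for guess, outcome in log)
    ]


secret = ("R", "G", "B", "B")                 # hypothetical hidden code
log = []                                      # the board: guesses and outcomes
candidates = list(product(COLORS, repeat=4))  # all hypotheses still possible

for guess in [("R", "R", "G", "G"), ("B", "B", "R", "Y")]:
    log.append((guess, feedback(secret, guess)))
    candidates = consistent(candidates, log)
```

Without `log`, the second filtering step could not take the first outcome into account. That is the situation of a programmer who keeps all experiments in memory only.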

Keeping a logbook
A straightforward way of making debugging explicit and relieving memory stress is to write down all hypotheses and observations; that is, keep a logbook. Such a logbook can be either on paper or in some electronic form.

Keeping a logbook may appear cumbersome at first, but with a well-kept logbook you do not have to keep all experiments and outcomes in memory. You can always quit work and resume the next day. In Zen and the Art of Motorcycle Maintenance, Robert M. Pirsig writes about the virtue of a logbook in cycle maintenance:

Everything gets written down, formally, so that you know at all times where you are, where you’ve been, where you’re going, and where you want to get. In scientific work and electronics technology this is necessary because otherwise the problems get so complex you get lost in them and confused and forget what you know and what you don’t know and have to give up.

And beware: this quote applies to motorcycle maintenance. Real programs are typically much more complex than motorcycles. For a motorcycle maintainer, it would probably appear amazing that people would debug programs without keeping logbooks.

And how should a logbook be kept? Unless you want to share your logbook with someone else, feel free to use any format you like. However, your notes should include the following points, as applied earlier in this article:

  • Statement of the problem (a problem report, or, easier, a report identifier)
  • Hypotheses as to the cause of the problem
  • Predictions of the hypotheses
  • Experiments designed to test the predictions
  • Observed results of the experiments
  • Conclusions from the results of the experiments

An example of such a logbook is shown in Figure 3, recapitulating hypotheses 2 and 3 earlier in this article. Again, quoting Robert Pirsig:

This is similar to the formal arrangement of many college and high-school lab notebooks, but the purpose here is no longer just busywork. The purpose now is precise guidance of thoughts that will fail if they are not accurate.

Figure 3. A debugging logbook (excerpt)
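
An electronic logbook with the six points listed above could look roughly like the following Python sketch. The layout is an assumption and not a reproduction of Figure 3. The class `LogEntry` and the function `append_entry` are hypothetical names, and the two entries restate hypotheses 2 and 3 from this article.

```python
from dataclasses import dataclass, fields


@dataclass
class LogEntry:
    problem: str
    hypothesis: str
    prediction: str
    experiment: str
    observation: str
    conclusion: str

    def render(self) -> str:
        """Format the entry as one labeled line per field."""
        return "\n".join(
            f"{f.name.capitalize()}: {getattr(self, f.name)}" for f in fields(self)
        )


def append_entry(path: str, entry: LogEntry) -> None:
    """Append one entry to a plain-text logbook file."""
    with open(path, "a", encoding="utf-8") as logbook:
        logbook.write(entry.render() + "\n\n")


PROBLEM = "sample 11 14 prints 0 11"

entries = [
    LogEntry(
        PROBLEM,
        "The infection does not take place until shell_sort().",
        "a[] = [11, 14] and size = 2 hold at line 6.",
        "Observe a[] and size.",
        "a[] = [11, 14, 0], size = 3.",
        "Rejected.",
    ),
    LogEntry(
        PROBLEM,
        "Invocation of shell_sort() with size = 3 causes the failure.",
        "Setting size to 2 makes the output 11 14.",
        "In the debugger, stop at line 6, set size to 2, resume.",
        "As predicted.",
        "Confirmed.",
    ),
]
```

Calling `append_entry("logbook.txt", entry)` for each entry (the file name is arbitrary) yields a file that can be reread the next day, which is the point of keeping the logbook outside one's head.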

Scientific versus quick-and-dirty debugging
Not every problem needs the full strength of the scientific method or the formal content of a logbook. Simple problems should be solved in a simple manner, without going through the explicit process. If we find a problem we suppose to be simple, the gambler in us will head for the lighter process. Why bother with formalities? Just think hard and solve the problem.

The problem with such an implicit “quick-and-dirty” process is to know when to use it. It is not always easy to tell in advance whether a problem is simple or not. Therefore, it is useful to set up a time limit. If after 10 minutes of quick-and-dirty debugging you still have not found the defect, go for the scientific method instead and write down the problem statement in the logbook. Then, straighten out your head by making everything formal and exact, and feel free to take a break whenever necessary.

Part 2: Algorithmic debugging and reasoning about programs

1. Kernighan, B.W. and Pike, R. (1999). The Practice of Programming. Addison-Wesley.
2. Nethercote, N. (2004). Dynamic Binary Analysis and Instrumentation. PhD thesis, University of Cambridge, U.K.

Andreas Zeller is chair of software engineering at the University of Saarland, where his research involves programmer productivity, with a particular interest in finding and fixing problems in code and development processes. He is best known for the visual GNU DDD debugger and the delta debugging technique for automatically isolating failure causes in program code.

This article was excerpted from Andreas Zeller’s book Why Programs Fail: A Guide to Systematic Debugging (Second Edition, Copyright 2009), published by Morgan Kaufmann, an imprint of Elsevier Inc.

5 thoughts on “Scientific debugging: Finding out why your code is buggy – Part 1”

  1. I have to admit i started reading this as challenging if something new would come out of it. I immediately realized that I've been applying this “scientific” approach to solve bugs. I'ts natural and if you've done programming for some time you eventually w

  5. The author made an excellent well thought out example. But unless there is some further discussion that had been left out (since this was an excerpt), the predictive aspect of applying the scientific method is not shown. This is truly the hard part of the
