Evaluating Function Tests
How much testing is enough? Horgan and Mathur [Hor96] evaluated the coverage of
two well-known programs, TeX and awk. They used functional tests for
these programs that had been developed over several years of extensive
testing. Upon applying those functional tests to the programs, they
obtained the code coverage statistics shown in Figure 5-33 below.
The columns refer to various types of test coverage: block refers to
basic blocks, decision to conditionals, p-use to a use of a variable in
a predicate (decision), and c-use to a use of a variable in a
nonpredicate computation. These results suggest that functional
testing alone does not fully exercise the code and that techniques that
explicitly generate tests for specific pieces of code are necessary to
obtain adequate levels of code coverage.
Figure 5-33. Code coverage of functional tests for TeX and awk (after
Horgan and Mathur [Hor96])
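To make the coverage categories concrete, the following Python sketch
hand-instruments a small function and measures decision coverage for a
plausible functional test set. The function clamp, the covered set, and
the decision labels d1 and d2 are illustrative assumptions; they are not
taken from Horgan and Mathur's study, and the toy numbers say nothing
about TeX or awk. Returned values are treated here as c-uses.

```python
# Hypothetical example: hand-instrumented decision coverage.
covered = set()  # (decision_id, outcome) pairs observed while testing


def clamp(x, lo, hi):
    """Limit x to the range [lo, hi]."""
    # Decision d1: `x < lo` is a p-use of x and lo (use in a predicate).
    if x < lo:
        covered.add(("d1", True))
        return lo  # c-use of lo (use in a nonpredicate computation)
    covered.add(("d1", False))
    # Decision d2: `x > hi` is a p-use of x and hi.
    if x > hi:
        covered.add(("d2", True))
        return hi  # c-use of hi
    covered.add(("d2", False))
    return x  # c-use of x


ALL_OUTCOMES = {("d1", True), ("d1", False), ("d2", True), ("d2", False)}


def decision_coverage():
    """Fraction of decision outcomes exercised so far."""
    return len(covered & ALL_OUTCOMES) / len(ALL_OUTCOMES)


# Functional tests written from the specification: a typical value and
# a value that is too large. Both pass, yet one outcome is never taken.
assert clamp(5, 0, 10) == 5
assert clamp(15, 0, 10) == 10
print(decision_coverage())  # 0.75: the `x < lo` true branch is untested
```

Both functional tests pass, but the true outcome of the first decision
is never exercised. A test aimed specifically at that branch, such as
clamp(-1, 0, 10), would be needed to reach full decision coverage.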
Methodological techniques are important for understanding the
quality of your tests. For example, if you keep track of the number of
bugs found each day, the data you collect over time should show you
trends in the number of errors per page of code to expect on average,
how many bugs are caught by certain kinds of tests, and so on.
One interesting method for analyzing the coverage of your tests is
error injection. First, take your existing code and add bugs to it,
keeping track of where the bugs were added. Then run your existing
tests on the modified program. By counting the number of added bugs
your tests found, you can get an idea of how effective the tests are in
uncovering the bugs you haven't yet found.
This method assumes that you can deliberately inject bugs that are
similar in kind to those created naturally by programming errors.
If the injected bugs are too easy or too difficult to find, or simply
require different types of tests, then the results of error injection
will not be relevant. Of course, it is essential that you ultimately use
the correct code, not the code with added bugs.
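The error-injection procedure described above can be sketched in a few
lines of Python. This is a minimal illustration under stated
assumptions, not a definitive tool: the function median3, the four
injected-bug variants, the TESTS list, and the helper names are all
hypothetical and chosen only to show the bookkeeping.

```python
def median3(a, b, c):
    """Correct reference code: the median of three numbers."""
    return sorted([a, b, c])[1]


# Injected bugs (hypothetical). The dictionary keys record what was
# changed, which is the "keeping track" step of the method.
def bug_returns_min(a, b, c):
    return sorted([a, b, c])[0]


def bug_returns_max(a, b, c):
    return sorted([a, b, c])[2]


def bug_ignores_c(a, b, c):
    return sorted([a, b, b])[1]


def bug_returns_mean(a, b, c):
    return (a + b + c) / 3


INJECTED = {
    "returns_min": bug_returns_min,
    "returns_max": bug_returns_max,
    "ignores_c": bug_ignores_c,
    "returns_mean": bug_returns_mean,
}

# The existing test set: (arguments, expected result) pairs.
TESTS = [((1, 2, 3), 2), ((3, 1, 2), 2), ((2, 2, 2), 2)]


def passes(impl):
    """True if impl passes every existing test."""
    return all(impl(*args) == expected for args, expected in TESTS)


def injection_score(variants):
    """Return the names of detected bugs and the detection rate."""
    detected = [name for name, impl in variants.items() if not passes(impl)]
    return detected, len(detected) / len(variants)


assert passes(median3)  # the tests must pass on the unmodified code
detected, rate = injection_score(INJECTED)
print(detected)  # ['returns_min', 'returns_max', 'ignores_c']
print(rate)      # 0.75
```

Here the tests catch three of the four injected bugs. The mean-based
variant escapes because every test input happens to have a mean equal to
its median. If the injected bugs resemble natural ones, which is itself
an assumption, a rate of 0.75 suggests that the tests would miss roughly
one in four bugs of this kind. Note that the injected variants live in
separate functions, so the correct code is never overwritten.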