Fortunately, there are many available techniques for software testing that can help us generate a comprehensive set of tests to ensure that our system works properly. In this last part in this series, we examine the role of validation in the overall design methodology as well as the nuts-and-bolts techniques for creating a good set of tests for a given program.
The first question we must ask ourselves is how much testing is
enough. Clearly, we cannot test the program for every possible
combination of inputs. Because we cannot implement an infinite number
of tests, we naturally ask ourselves what a reasonable standard of
thoroughness is.
One of the major contributions of software testing is to provide us with standards of thoroughness that make sense. Following these standards does not guarantee that we will find all bugs. But by breaking the testing problem into subproblems and analyzing each subproblem, we can identify testing methods that provide reasonable amounts of testing while keeping the testing time within reasonable bounds.
The two major types of testing strategies follow:
In this concluding part in this series, we cover both types of
tests, which complement each other by exercising programs in very
different ways.
Clear-Box Testing
The control/data flow graph extracted from a program's source code is
an important tool in developing clear-box tests for the program. To
adequately test the program, we must exercise both its control and data
operations.
In order to execute and evaluate these tests, we must be able to control variables in the program and observe the results of computations, much as in manufacturing testing. In general, we may need to modify the program to make it more testable. By adding new inputs and outputs, we can usually substantially reduce the effort required to find and execute the test.
Example 5-11 below illustrates the importance of observability and controllability in software testing.
No matter what we are testing, we must accomplish the following three things in a test:
Example 5-11 Controlling
and Observing Programs
Let's first consider controllability by examining the following
FIR filter with a limiter:
firout = 0.0; /* initialize filter output */
/* compute buff*c in bottom part of circular buffer */
for (j = curr, k = 0; j < N; j++, k++)
firout += buff[j] * c[k];
/* compute buff*c in top part of circular buffer */
for (j = 0; j < curr; j++, k++)
firout += buff[j] * c[k];
/* limit output value */
if (firout 100.0) firout = 100.0;
if (firout < -100.0) firout = -100.0;
The above code computes the output of an FIR filter from a circular
buffer of values and then limits the maximum filter output (much as an overloaded speaker will hit a
range limit).
If we want to test whether the limiting code works, we must be able to generate two out-of-range values for firout: positive and negative. To do that, we must fill the FIR filter's circular buffer with N values in the proper range. Although there are many sets of values that will work, it will still take time for us to properly set up the filter output for each test.
This code also illustrates an observability problem. If we want to test the FIR filter itself, we look at the value of firout before the limiting code executes. We could use a debugger such as dbx to set breakpoints in the code, but this is an awkward way to perform a large number of tests. If we want to test the FIR code independent of the limiting code, we would have to add a mechanism for observing firout independently.
Being able to perform this process for a large number of tests entails some amount of drudgery, but that drudgery can be alleviated with good program design that simplifies controllability and observability.
The next task is to determine the set of tests to be performed. We
need to perform many different types of tests to be confident that we
have identified a large fraction of the existing bugs. Even if we
thoroughly test the program using one criterion, that criterion ignores
other aspects of the program. Over the next few pages we will describe
several very different criteria for program testing.
Execution Paths
The most fundamental concept in clear-box testing is the path of
execution through a program. Previously, we considered paths for
performance analysis; we are now concerned with making sure that a path
is covered and determining how to ensure that the path is in fact
executed.
We want to test the program by forcing the program to execute along chosen paths. We force the execution of a path by giving it inputs that cause it to take the appropriate branches. Execution of a path exercises both the control and data aspects of the program. The control is exercised as we take branches; both the computations leading up to the branch decision and other computations performed along the path exercise the data aspects.
Is it possible to execute every complete path in an arbitrary
program? The answer is no, since the program may contain a while loop
that is not guaranteed to terminate. The same is true for any program
that operates on a continuous stream of data, since we cannot
arbitrarily define the beginning and end of the data stream.
If the program always terminates, then there are indeed a finite number of complete paths that can be enumerated from the path graph. This leads us to the next question: Does it make sense to exercise every path?
The answer to this question is no for most programs, since the number of paths, especially for any program with a loop, is extremely large. However, the choice of an appropriate subset of paths to test requires some thought. Example 5-12 below illustrates the consequences of two different choices of testing strategies.
Example 5-12 Choosing
the Paths to Test
Two reasonable choices for a set of paths to test follow:
These conditions are equivalent for structured programming languages without gotos, but are not the same for unstructured code. Most assembly language is unstructured, and state machines may be coded in high-level languages with gotos.
To understand the difference between statement and branch coverage, consider the CDFG below. We can execute every statement at least once by executing the program along two distinct paths.

However, this leaves branch a out of the lower conditional uncovered. To ensure that we have executed along every edge in the CDFG, we must execute a third path through the program. This path does not test any new statements, but it does cause a to be exercised.
How do we choose a set of paths that adequately covers the program's
behavior? Intuition tells us that a relatively small number of paths
should be able to cover most practical programs. Graph theory helps us
get a quantitative handle on the different paths required.
In an undirected graph, we can form any path through the graph from
combinations of basis paths. (Unfortunately, this property does not
strictly hold for directed graphs such as CDFGs, but this formulation
still helps us understand the nature of selecting a set of covering
paths through a program.) The term "basis set" comes from linear
algebra.
Figure 5-29 below shows how
to evaluate the basis set of a graph. The graph is represented as an incidence
matrix. Each row and column represents a node; a 1 is entered
for each node pair connected by an edge.
We can use standard linear
algebra techniques to identify the basis set of the graph. Each vector
in the basis set represents a primitive path. We can form new paths by
adding the vectors modulo 2. Generally, there is more than one basis
set for a graph.
The basis set property provides a metric for test coverage. If we cover all the basis paths, we can consider the control flow adequately covered. Although the basis set measure is not entirely accurate since the directed edges of the CDFG may make some combinations of paths infeasible, it does provide a reasonable and justifiable measure of test coverage.
![]() |
| Figure 5-29. The matrix representation of a graph and its basis set. |
There is a simple measure, cyclomatic complexity [McC76], which allows us to measure the control complexity of a program. Cyclomatic complexity is an upper bound on the size of the basis set earlier. If e is the number of edges in the flow graph, n the number of nodes, and p the number of components in the graph, then the cyclomatic complexity is given by
M = e - n +2p. (EQ 5-3)
For a structured program, M can be computed by counting the number of binary decisions in the flow graph and adding 1. If the CDFG has higher-order branch nodes, add b ' 1 for each b-way branch. In the example of Figure 5-30 below, the cyclomatic complexity evaluates to 4. Because there are actually only three distinct paths in the graph, cyclomatic complexity in this case is an overly conservative bound.
Another way of looking at control flow"oriented testing is to analyze the conditions that control the conditional statements. Consider the following if statement:
if ((a == b) || (c =d)) { ... }
![]() |
| Figure 5-30. Cyclomatic complexity. |
This complex condition can be exercised in several different ways. If we want to truly exercise the paths through this condition, it is prudent to exercise the conditional's elements in ways related to their own structure, not just the structure of the paths through them. A simple condition testing strategy is known as branch testing [Mye79]. This strategy requires the true and false branches of a conditional and every simple condition in the conditional's expression to be tested at least once. Example 5-13 below illustrates branch testing.
Example 5-13. Condition
Testing with the Branch Testing Strategy
Assume that the code below is what we meant to write.
if (a || (b = c)) { printf("OK\n"); }
The code that we mistakenly wrote instead follows:
if (a && (b=c)) {printf("OK\n");}
If we apply branch testing to the code we wrote, one of the tests will use these values: a = 0, b = 3, c = 2 (making a false and b = c true). In this case, the code should print theOK term [0 || (3 = 2) is true] but instead doesn't print [0 && (3 = 2) evaluates to false]. That test picks up the error.
Let's consider another more subtle error that is nonetheless all too common in C. The code we meant to write follows:
if ((x == good_pointer) && (x-field1 == 3)) { printf("got the value\n"); }
Here is the bad code we actually wrote:
if ((x = good_pointer) && (x-field1 == 3)) { printf("got the value\n"); }
The problem here is that we typed = rather than ==, creating an assignment rather than a test. The code x = good_pointer first assigns the value good_pointer to x and then, because assignments are also expressions in C, returns good_pointer as the result of evaluating this expression.
If we apply the principles of branch testing, one of the tests we
want to use will contain x != good_pointer and x-field1 == 3. Whether
this test catches the error depends on the state of the record pointed
to by good_pointer. If it is equal to 3 at the time of the test, the
message will be printed erroneously.
Although this test is not guaranteed to uncover the bug, it has a
reasonable chance of success. One of the reasons to use many different
types of tests is to maximize the chance that supposedly unrelated
elements will cooperate to reveal the error in a particular situation.
Another more sophisticated strategy for testing conditionals is
known as domain testing [How82], illustrated in Figure 5-31 below. Domain testing
concentrates on linear inequalities. In the figure, the inequality the
program should use for the test is
j <= i + 1.
We test the inequality with three test points - two on the boundary of the valid region and a third outside the region but between the i values of the other two points. When we make some common mistakes in typing the inequality, these three tests are sufficient to uncover them, as shown in the figure.
A potential problem with path coverage is that the paths chosen to cover the CDFG may not have any important relationship to the program's function. Another testing strategy known as data flow testing makes use of def-use analysis (short for definition-use analysis). It selects paths that have some relationship to the program's function.
![]() |
| Figure 5-31. Domain testing for a pair of variables. |
The terms def and use come from compilers, which use def-use
analysis for optimization [Aho86].
A variable's value is defined when an assignment
is
made to the variable; it is used when it appears
on
the right side of an assignment (sometimes called a c-use
for computation use) or in a conditional expression (sometimes called p-use
for predicate use).
A def-use pair is a definition of a variable's
value and a use of that value. Figure
5-32 below shows a code fragment and
all the def-use pairs for the first assignment to a. Def-use analysis
can be performed on a program using iterative algorithms. Data flow
testing chooses tests that exercise chosen def-use pairs.
The test first causes a certain value to be assigned at the
definition and then observes the result at the use point to be sure
that the desired value arrived there. Frankl and Weyuker [Fra88] have defined criteria for
choosing which def-use pairs to exercise to satisfy a well-behaved
adequacy criterion.
![]() |
| Figure 5-32. Definitions and uses of variables. |
for (i = 0; i < terminate(); i++)
proc(i,array);
It would be too expensive to evaluate the above loop for all possible termination conditions. However, there are several important cases that we should try at a minimum. These cases are summarized below.
We can also have nested loops such as the following:
for (i = 0; i < terminate1(); i++)
for (j = 0; j < terminate2(); j++)
for (k = 0; k < terminate3(); k++)
proc(i,j,k,array);
There are many possible strategies for testing nested loops. One
thing to keep in mind is which loops have fixed versus variable numbers
of iterations. Beizer [Bei90] suggests an inside-out strategy for
testing loops with multiple variable iteration bounds. First,
concentrate on testing the innermost loop as above - the outer loops
should be controlled to their minimum numbers of iterations. After the
inner loop has been thoroughly tested, the next outer loop can be
tested more thoroughly, with the inner loop executing a typical number
of iterations. This strategy can be repeated until the entire loop nest
has been tested. Clearly, nested loops can require a large number of
tests. It may be worthwhile to insert testing code to allow greater
control over the loop nest for testing.
Black-Box Testing
Black-box tests are generated without knowledge of the code being
tested. When used alone, black-box tests have a low probability of
finding all the bugs in a program. But when used in conjunction with
clear-box tests they help provide a well-rounded test set, since
black-box tests are likely to uncover errors that are unlikely to be
found by tests extracted from the code structure.
Blackbox tests can really work. For instance, when asked to test an instrument whose front panel was run by a microcontroller, one acquaintance of the author used his hand to depress all the buttons simultaneously. The front panel immediately locked up. This situation could occur in practice if the instrument were placed face-down on a table, but discovery of this bug would be very unlikely via clear-box tests.
One important technique is to take tests directly from the specification for the code under design. The specification should state which outputs are expected for certain inputs. Tests should be created that provide specified outputs and evaluate whether the results also satisfy the inputs.
We can't test every possible input combination, but some rules of
thumb help us select reasonable sets of inputs. When an input can range
across a set of values, it is a very good idea to test at the ends of
the range.
For example, if an input must be between 1 and 10, 0, 1, 10, and 11 are all important values to test. We should be sure to consider tests both within and outside the range, such as, testing values within the range and outside the range. We may want to consider tests well outside the valid range as well as boundary-condition tests.
Random tests form one category of black-box test. Random values are generated with a given distribution. The expected values are computed independently of the system, and then the test inputs are applied. A large number of tests must be applied for the results to be statistically significant, but the tests are easy to generate.
Another scenario is to test certain types of data values. For example, integer-valued inputs can be generated at interesting values such as 0, 1, and values near the maximum end of the data range. Illegal values can be tested as well.
Regression tests form an extremely important
category of tests. When tests are created during earlier stages in the
system design or for previous versions of the system, those tests
should be saved to apply to the later versions of the system. Clearly,
unless the system specification changed, the new system should be able
to pass old tests.
In some cases old bugs can creep back into systems, such as when an old version of a software module is inadvertently installed. In other cases regression tests simply exercise the code in different ways than would be done for the current version of the code and therefore possibly exercise different bugs.
Some embedded systems, particularly digital signal processing systems, lend themselves to numerical analysis. Signal processing algorithms are frequently implemented with limited-range arithmetic to save hardware costs. Aggressive data sets can be generated to stress the numerical accuracy of the system. These tests can often be generated from the original formulas without reference to the source
Evaluating Function Tests
How much testing is enough? Horgan and Mathur [Hor96] evaluated the coverage of
two well-known programs, TeX and awk. They used functional tests for
these programs that had been developed over several years of extensive
testing. Upon applying those functional tests to the programs, they
obtained the code coverage statistics shown in Figure 5-33 below.
The columns refer to various types of test coverage: block refers to basic blocks, decision to conditionals, puse to a use of a variable in a predicate (decision), and c-use to variable use in a nonpredicate computation. These results are at least suggestive that functional testing does not fully exercise the code and that techniques that explicitly generate tests for various pieces of code are necessary to obtain adequate levels of code coverage.
![]() |
| Figure 5-33. Code coverage of functional tests for TeX and awk (after Horgan and Mathur [Hor96]) |
Methodological techniques are important for understanding the
quality of your tests. For example, if you keep track of the number of
bugs tested each day, the data you collect over time should show you
some trends on the number of errors per page of code to expect on the
average, how many bugs are caught by certain kinds of tests, and so on.
One interesting method for analyzing the coverage of your tests is
error injection. First, take your existing code and add bugs to it,
keeping track of where the bugs were added. Then run your existing
tests on the modified program. By counting the number of added bugs
your tests found, you can get an idea of how effective the tests are in
uncovering the bugs you haven't yet found.
This method assumes that you can deliberately inject bugs that are
of similar varieties to those created naturally by programming errors.
If the bugs are too easy or too difficult to find or simply require
different types of tests, then bug injection's results will not be
relevant. Of course, it is essential that you finally use the correct
code, not the code with added bugs.
Performance Testing
Because embedded systems often have real-time deadlines, we must
concern ourselves with testing for performance, not just functionality.
Performance testing determines whether the required result was
generated within a certain amount of time. In many cases, we are
interested in the worst-case execution time, although in some cases we
may want to verify the best-case or averagecase execution time.
Performance analysis is very important here. Performance analysis can tell us what path causes the worst-case (or other case) execution time. From there we can determine the inputs required and the expected outputs. Real-time code with deadlines must always terminate—we know that the code must finish for any possible set of inputs. Functional testing can help us determine whether the program actually terminates.
How do we measure program performance? Functional testing can be performed on any computer in most cases so long as we can compile the code on that platform. Performance testing is not easy. We may be able to use a simulator to measure performance. However, CPU simulators vary widely in the amount of timing information they provide.
Performance can also be measured directly on the hardware. We can use a timer in the system to count execution time. The program resets the timer at the start and checks the timer's value at the end of execution. We can also use logic analyzers as long as the start and end points of the program are visible on the bus.
To obtain truly accurate performance measurements, we must run the
program in the environment in which it will operate. For example, the
CPU used must have the same size cache as the target platform will
provide. The program should also be run along with other programs that
operate concurrently. When several programs (or programs and interrupt
drivers) operate concurrently, they can interfere with each other in
the cache causing severe performance problems.
Conclusion
The program is a very fundamental unit of embedded systems design and
it usually contains tightly interacting code. Because we care about
more than just functionality, we need to understand how programs are
created. Because today's compilers do not take directions such as
"compile this to run in less than 1 microsecon," we have to be able to
optimize programs ourselves for speed, power and space.
Our earlier understanding of computer architecture is critical to
our ability to perform these optimizations. We also need to test
programs to make sure they do what we want. Some of our testing
technicques can also be useful in exercising the programs for
performance optimization.
To read Part 1, go to "Program
design and analysis."
To read Part 2 , go to "Models
of program, assemblers and linkers."
To read Part 3, go to "Basic
Compilation Techniques"
To read Part 4, go to "The creation of
procedures"
To read Part 5, go to "Register
allocation and scheduling"
To read Part 6, go to "Analysis and
optimization of execution time"
To read Part 7, go to "Trace-Driven
Performance Analysis"
To read Part 8, go to "Analysis
and optimization of energy and
power."
Used with the permission of the publisher, Newnes/Elsevier, this series of nine articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased on line.
Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing and of Design Automation for Embedded Systems.
References:
[Aho86] Alfred VAho, et. al. "Compilers: Principles, Techniques
and Tools. Addison Wesley,1986.
[Hor96] Joseph
R. Horgan and Aditya Mathur, "Software testing and reliability"
Chapter 13 in Hand book of Software Reliability Engineering," IEEE
Computer Society Press/ McGraw Hill 1996.
[How82]
W.E. Howden, "Weak
mutation testing and the
completeness of test cases," IEEE Transactions on Software Engineering.
[McC76]
T.J. McCabe, "A
complexity measure,"
IEEE Transactions on Software Engineering.