The basics of embedded programming: Part 9 – Software Test and Validation

Complex systems need testing to ensure that they work as they are intended. But bugs can be subtle, particularly in embedded systems, where specialized hardware and real-time responsiveness make programming more challenging.

Fortunately, there are many available techniques for software testing that can help us generate a comprehensive set of tests to ensure that our system works properly. In this last part in the series, we examine the role of validation in the overall design methodology as well as the nuts-and-bolts techniques for creating a good set of tests for a given program.

The first question we must ask ourselves is how much testing is enough. Clearly, we cannot test the program for every possible combination of inputs. Because we cannot implement an infinite number of tests, we naturally ask ourselves what a reasonable standard of thoroughness is.

One of the major contributions of software testing is to provide us with standards of thoroughness that make sense. Following these standards does not guarantee that we will find all bugs. But by breaking the testing problem into subproblems and analyzing each subproblem, we can identify testing methods that provide reasonable amounts of testing while keeping the testing time within reasonable bounds.

The two major types of testing strategies follow:

  • Black-box methods generate tests without looking at the internal structure of the program.
  • Clear-box (also known as white-box) methods generate tests based on the program structure.

In this concluding part in this series, we cover both types of tests, which complement each other by exercising programs in very different ways.

Clear-Box Testing
The control/data flow graph extracted from a program's source code is an important tool in developing clear-box tests for the program. To adequately test the program, we must exercise both its control and data operations.

In order to execute and evaluate these tests, we must be able to control variables in the program and observe the results of computations, much as in manufacturing testing. In general, we may need to modify the program to make it more testable. By adding new inputs and outputs, we can usually substantially reduce the effort required to find and execute the test.

Example 5-11 below illustrates the importance of observability and controllability in software testing.

No matter what we are testing, we must accomplish the following three things in a test:

  • Provide the program with inputs that exercise the test we are interested in.
  • Execute the program to perform the test.
  • Examine the outputs to determine whether the test was successful.

Example 5-11 Controlling and Observing Programs
Let's first consider controllability by examining the following FIR filter with a limiter:

firout = 0.0; /* initialize filter output */
/* compute buff*c in bottom part of circular buffer */
for (j = curr, k = 0; j < N; j++, k++)
    firout += buff[j] * c[k];
/* compute buff*c in top part of circular buffer */
for (j = 0; j < curr; j++, k++)
    firout += buff[j] * c[k];
/* limit output value */
if (firout > 100.0) firout = 100.0;
if (firout < -100.0) firout = -100.0;

The above code computes the output of an FIR filter from a circular buffer of values and then limits the maximum filter output (much as an overloaded speaker will hit a range limit).

If we want to test whether the limiting code works, we must be able to generate two out-of-range values for firout: positive and negative. To do that, we must fill the FIR filter's circular buffer with N values in the proper range. Although there are many sets of values that will work, it will still take time for us to properly set up the filter output for each test.

This code also illustrates an observability problem. If we want to test the FIR filter itself, we look at the value of firout before the limiting code executes. We could use a debugger such as dbx to set breakpoints in the code, but this is an awkward way to perform a large number of tests. If we want to test the FIR code independent of the limiting code, we would have to add a mechanism for observing firout independently.
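One way to ease both problems is to restructure the code so that the filter and the limiter become separately callable functions. The sketch below is not the book's code; the function names, types, and the buff/c/curr/N parameters are assumptions based on the fragment above.

/* A minimal sketch of a more testable structure (names and types are
   assumptions). The FIR computation becomes observable through
   fir_compute()'s return value, and the limiter becomes controllable
   because limit() can be fed arbitrary values directly. */
float fir_compute(const float buff[], const float c[], int curr, int N)
{
    float firout = 0.0f;
    int j, k;
    for (j = curr, k = 0; j < N; j++, k++)  /* bottom of circular buffer */
        firout += buff[j] * c[k];
    for (j = 0; j < curr; j++, k++)         /* top of circular buffer */
        firout += buff[j] * c[k];
    return firout;                          /* observable before limiting */
}

float limit(float firout)
{
    if (firout > 100.0f)  firout = 100.0f;
    if (firout < -100.0f) firout = -100.0f;
    return firout;
}

Production code would call limit(fir_compute(buff, c, curr, N)); a test of the limiter can then simply call limit(150.0f) and limit(-150.0f) without filling the circular buffer at all.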

Being able to perform this process for a large number of tests entails some amount of drudgery, but that drudgery can be alleviated with good program design that simplifies controllability and observability.

The next task is to determine the set of tests to be performed. We need to perform many different types of tests to be confident that we have identified a large fraction of the existing bugs. Even if we thoroughly test the program using one criterion, that criterion ignores other aspects of the program. Over the next few pages we will describe several very different criteria for program testing.

Execution Paths
The most fundamental concept in clear-box testing is the path of execution through a program. Previously, we considered paths for performance analysis; we are now concerned with making sure that a path is covered and determining how to ensure that the path is in fact executed.

We want to test the program by forcing the program to execute along chosen paths. We force the execution of a path by giving it inputs that cause it to take the appropriate branches. Execution of a path exercises both the control and data aspects of the program. The control is exercised as we take branches; both the computations leading up to the branch decision and other computations performed along the path exercise the data aspects.

Is it possible to execute every complete path in an arbitrary program? The answer is no, since the program may contain a while loop that is not guaranteed to terminate. The same is true for any program that operates on a continuous stream of data, since we cannot arbitrarily define the beginning and end of the data stream.

If the program always terminates, then there are indeed a finite number of complete paths that can be enumerated from the path graph. This leads us to the next question: Does it make sense to exercise every path?

The answer to this question is no for most programs, since the number of paths, especially for any program with a loop, is extremely large. However, the choice of an appropriate subset of paths to test requires some thought. Example 5-12 below illustrates the consequences of two different choices of testing strategies.

Example 5-12 Choosing the Paths to Test
Two reasonable choices for a set of paths to test follow:

  • execute every statement at least once
  • execute every direction of a branch at least once

These conditions are equivalent for structured programming languages without gotos, but are not the same for unstructured code. Most assembly language is unstructured, and state machines may be coded in high-level languages with gotos.

To understand the difference between statement and branch coverage, consider the CDFG below. We can execute every statement at least once by executing the program along two distinct paths.

However, this leaves branch a out of the lower conditional uncovered. To ensure that we have executed along every edge in the CDFG, we must execute a third path through the program. This path does not test any new statements, but it does cause a to be exercised.

How do we choose a set of paths that adequately covers the program's behavior? Intuition tells us that a relatively small number of paths should be able to cover most practical programs. Graph theory helps us get a quantitative handle on the different paths required.

In an undirected graph, we can form any path through the graph from combinations of basis paths. (Unfortunately, this property does not strictly hold for directed graphs such as CDFGs, but this formulation still helps us understand the nature of selecting a set of covering paths through a program.) The term "basis set" comes from linear algebra.

Figure 5-29 below shows how to evaluate the basis set of a graph. The graph is represented as an incidence matrix. Each row and column represents a node; a 1 is entered for each node pair connected by an edge.

We can use standard linear algebra techniques to identify the basis set of the graph. Each vector in the basis set represents a primitive path. We can form new paths by adding the vectors modulo 2. Generally, there is more than one basis set for a graph.
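As a concrete illustration, adding two paths modulo 2 amounts to a bitwise exclusive OR when each path is represented as a bit vector over the graph's edges. The edge numbering below is hypothetical, chosen only to show the mechanics:

#include <stdio.h>

int main(void)
{
    /* Each bit position stands for one edge of the graph (hypothetical
       numbering). A path is the set of edges it traverses. */
    unsigned basis1 = 0x5;               /* edges {0, 2} */
    unsigned basis2 = 0x6;               /* edges {1, 2} */
    unsigned combined = basis1 ^ basis2; /* modulo-2 sum: edges {0, 1} */
    printf("combined path uses edges 0x%x\n", combined);
    return 0;
}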

The basis set property provides a metric for test coverage. If we cover all the basis paths, we can consider the control flow adequately covered. Although the basis set measure is not entirely accurate since the directed edges of the CDFG may make some combinations of paths infeasible, it does provide a reasonable and justifiable measure of test coverage.

Figure 5-29. The matrix representation of a graph and its basis set.

There is a simple measure, cyclomatic complexity [McC76], which allows us to measure the control complexity of a program. Cyclomatic complexity is an upper bound on the size of the basis set described earlier. If e is the number of edges in the flow graph, n the number of nodes, and p the number of components in the graph, then the cyclomatic complexity is given by

M = e − n + 2p. (EQ 5-3)

For a structured program, M can be computed by counting the number of binary decisions in the flow graph and adding 1. If the CDFG has higher-order branch nodes, add b − 1 for each b-way branch. In the example of Figure 5-30 below, the cyclomatic complexity evaluates to 4. Because there are actually only three distinct paths in the graph, cyclomatic complexity in this case is an overly conservative bound.
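EQ 5-3 is easy to check by hand; the sketch below codes it up, with edge and node counts that are illustrative rather than taken from Figure 5-30:

#include <stdio.h>

/* Cyclomatic complexity M = e - n + 2p (EQ 5-3). */
static int cyclomatic(int e, int n, int p)
{
    return e - n + 2 * p;
}

int main(void)
{
    /* A hypothetical single-component flow graph with 9 edges and
       7 nodes gives M = 9 - 7 + 2 = 4, the value quoted in the text
       for Figure 5-30. */
    printf("M = %d\n", cyclomatic(9, 7, 1));

    /* Structured-program shortcut: 3 binary decisions => M = 3 + 1 = 4. */
    return 0;
}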

Another way of looking at control-flow-oriented testing is to analyze the conditions that control the conditional statements. Consider the following if statement:

if ((a == b) || (c >= d)) { ... }

Figure 5-30. Cyclomatic complexity.

This complex condition can be exercised in several different ways. If we want to truly exercise the paths through this condition, it is prudent to exercise the conditional's elements in ways related to their own structure, not just the structure of the paths through them. A simple condition testing strategy is known as branch testing [Mye79]. This strategy requires that the true and false branches of a conditional, and every simple condition in the conditional's expression, be tested at least once. Example 5-13 below illustrates branch testing.

Example 5-13 Condition Testing with the Branch Testing Strategy
Assume that the code below is what we meant to write.

if (a || (b >= c)) { printf("OK\n"); }

The code that we mistakenly wrote instead follows:

if (a && (b >= c)) { printf("OK\n"); }

If we apply branch testing to the code we wrote, one of the tests will use these values: a = 0, b = 3, c = 2 (making a false and b >= c true). In this case, the code should print the OK term [0 || (3 >= 2) is true] but instead doesn't print [0 && (3 >= 2) evaluates to false]. That test picks up the error.

Let's consider another more subtle error that is nonetheless all too common in C. The code we meant to write follows:

if ((x == good_pointer) && (x->field1 == 3)) { printf("got the value\n"); }

Here is the bad code we actually wrote:

if ((x = good_pointer) && (x->field1 == 3)) { printf("got the value\n"); }

The problem here is that we typed = rather than ==, creating an assignment rather than a test. The code x = good_pointer first assigns the value good_pointer to x and then, because assignments are also expressions in C, returns good_pointer as the result of evaluating this expression.

If we apply the principles of branch testing, one of the tests we want to use will contain x != good_pointer and x->field1 == 3. Whether this test catches the error depends on the state of the record pointed to by good_pointer. If its field1 is equal to 3 at the time of the test, the message will be printed erroneously.

Although this test is not guaranteed to uncover the bug, it has a reasonable chance of success. One of the reasons to use many different types of tests is to maximize the chance that supposedly unrelated elements will cooperate to reveal the error in a particular situation.

Another more sophisticated strategy for testing conditionals is known as domain testing [How82], illustrated in Figure 5-31 below. Domain testing concentrates on linear inequalities. In the figure, the inequality the program should use for the test is

j <= i + 1.

We test the inequality with three test points: two on the boundary of the valid region and a third outside the region but between the i values of the other two points. When we make some common mistakes in typing the inequality, these three tests are sufficient to uncover them, as shown in the figure.
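A sketch of this strategy follows. The predicate under test implements j <= i + 1, and the three test points are illustrative choices consistent with the rule (two on the boundary, one outside and between them), not necessarily the points shown in the figure:

#include <stdio.h>

static int predicate(int i, int j)   /* code under test */
{
    return j <= i + 1;
}

int main(void)
{
    struct { int i, j, expect; } tests[] = {
        { 0, 1, 1 },  /* on the boundary j = i + 1 */
        { 2, 3, 1 },  /* on the boundary j = i + 1 */
        { 1, 3, 0 },  /* outside the region, i between the other two */
    };
    for (int t = 0; t < 3; t++) {
        int got = predicate(tests[t].i, tests[t].j);
        printf("(%d, %d): %s\n", tests[t].i, tests[t].j,
               got == tests[t].expect ? "pass" : "FAIL");
    }
    return 0;
}

A mistyped operator (say, j < i + 1 or j <= i - 1) shifts or shrinks the accepted region, so at least one of the boundary points changes its outcome and the test fails.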

A potential problem with path coverage is that the paths chosen to cover the CDFG may not have any important relationship to the program's function. Another testing strategy known as data flow testing makes use of def-use analysis (short for definition-use analysis). It selects paths that have some relationship to the program's function.

Figure 5-31. Domain testing for a pair of variables.

The terms def and use come from compilers, which use def-use analysis for optimization [Aho86]. A variable's value is defined when an assignment is made to the variable; it is used when it appears on the right side of an assignment (sometimes called a c-use for computation use) or in a conditional expression (sometimes called a p-use for predicate use).

A def-use pair is a definition of a variable's value and a use of that value. Figure 5-32 below shows a code fragment and all the def-use pairs for the first assignment to a. Def-use analysis can be performed on a program using iterative algorithms. Data flow testing chooses tests that exercise chosen def-use pairs.

The test first causes a certain value to be assigned at the definition and then observes the result at the use point to be sure that the desired value arrived there. Frankl and Weyuker [Fra88] have defined criteria for choosing which def-use pairs to exercise to satisfy a well-behaved adequacy criterion.

Figure 5-32. Definitions and uses of variables.
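Since the figure itself is not reproduced here, the fragment below (our own illustration, not the book's) shows the terminology: one definition of a followed by one p-use and two c-uses, giving three def-use pairs for that definition.

int example(int x, int y)
{
    int a, b;
    a = x + y;       /* def of a */
    if (a > 10)      /* p-use of a (predicate use) */
        b = a - 1;   /* c-use of a (computation use) */
    else
        b = 2 * a;   /* c-use of a */
    return b;
}

A data flow test for the (def, p-use) pair would choose x and y so that a known value reaches the if condition and then check which branch was taken.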

Testing Loops
We can write some specialized tests for loops. Since loops are common and often perform important steps in the program, it is worth developing loop-centric testing methods. If the number of iterations is fixed, then testing is relatively simple. However, many loops have bounds that are computed at run time. Consider first the case of a single loop as follows:

for (i = 0; i < terminate(); i++)
proc(i,array);

It would be too expensive to evaluate the above loop for all possible termination conditions. However, there are several important cases that we should try at a minimum; these cases are summarized below, followed by a sketch of a test harness that exercises them.

  1. Skipping the loop entirely [if possible, such as when terminate() returns 0 on its first call].
  2. One loop iteration.
  3. Two loop iterations.
  4. If there is an upper bound n on the number of loop iterations (which may come from the maximum size of an array), a value that is significantly below that maximum number of iterations.
  5. Tests near the upper bound on the number of loop iterations, that is, n − 1, n, and n + 1.
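A harness that walks through these cases might look like the sketch below. The terminate_hook() variable, the proc() body, and the bound N_MAX are all hypothetical stand-ins for the real loop's machinery:

#include <stdio.h>

#define N_MAX 16                 /* assumed upper bound on iterations */

static int array[N_MAX];
static int term_count;           /* test hook: pins the iteration count */

static int terminate_hook(void) { return term_count; }

static void proc(int i, int *a)  /* stand-in for the real loop body */
{
    a[i % N_MAX] = i;
}

static void run_loop_test(int iterations)
{
    term_count = iterations;
    for (int i = 0; i < terminate_hook(); i++)
        proc(i, array);
    printf("ran with %d iteration(s)\n", iterations);
    /* check postconditions on array here */
}

int main(void)
{
    /* cases 1-5: zero, one, two, well below the bound, and n-1, n, n+1 */
    int cases[] = { 0, 1, 2, N_MAX / 2, N_MAX - 1, N_MAX, N_MAX + 1 };
    for (int k = 0; k < 7; k++)
        run_loop_test(cases[k]);
    return 0;
}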

We can also have nested loops such as the following:

for (i = 0; i < terminate1(); i++)
for (j = 0; j < terminate2(); j++)
for (k = 0; k < terminate3(); k++)
proc(i,j,k,array);

There are many possible strategies for testing nested loops. One thing to keep in mind is which loops have fixed versus variable numbers of iterations. Beizer [Bei90] suggests an inside-out strategy for testing loops with multiple variable iteration bounds. First, concentrate on testing the innermost loop as above; the outer loops should be controlled to their minimum numbers of iterations. After the inner loop has been thoroughly tested, the next outer loop can be tested more thoroughly, with the inner loop executing a typical number of iterations. This strategy can be repeated until the entire loop nest has been tested. Clearly, nested loops can require a large number of tests. It may be worthwhile to insert testing code to allow greater control over the loop nest for testing.
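The sketch below shows what that inside-out schedule might look like when the three bounds are pinned through hypothetical test hooks (term1 through term3); the bounds and the typical count of 3 are illustrative:

#include <stdio.h>

static int term1, term2, term3;  /* test hooks pinning each loop bound */

static void proc(int i, int j, int k) { (void)i; (void)j; (void)k; }

static void run_nest(void)
{
    for (int i = 0; i < term1; i++)
        for (int j = 0; j < term2; j++)
            for (int k = 0; k < term3; k++)
                proc(i, j, k);
}

int main(void)
{
    /* Step 1: outer loops at their minimum, vary the innermost bound. */
    term1 = 1; term2 = 1;
    for (int t = 0; t <= 5; t++) { term3 = t; run_nest(); }

    /* Step 2: inner loop at a typical count, vary the middle bound. */
    term3 = 3;
    for (int t = 0; t <= 5; t++) { term2 = t; run_nest(); }

    /* Step 3: repeat outward for the outermost bound. */
    term2 = 3;
    for (int t = 0; t <= 5; t++) { term1 = t; run_nest(); }

    printf("loop-nest tests complete\n");
    return 0;
}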

Black-Box Testing
Black-box tests are generated without knowledge of the code being tested. When used alone, black-box tests have a low probability of finding all the bugs in a program. But when used in conjunction with clear-box tests they help provide a well-rounded test set, since black-box tests are likely to uncover errors that are unlikely to be found by tests extracted from the code structure.

Black-box tests can really work. For instance, when asked to test an instrument whose front panel was run by a microcontroller, one acquaintance of the author used his hand to depress all the buttons simultaneously. The front panel immediately locked up. This situation could occur in practice if the instrument were placed face-down on a table, but discovery of this bug would be very unlikely via clear-box tests.

One important technique is to take tests directly from the specification for the code under design. The specification should state which outputs are expected for certain inputs. Tests should be created that apply those inputs and evaluate whether the outputs match the specification.

We can't test every possible input combination, but some rules of thumb help us select reasonable sets of inputs. When an input can range across a set of values, it is a very good idea to test at the ends of the range.

For example, if an input must be between 1 and 10, then 0, 1, 10, and 11 are all important values to test. We should be sure to consider tests both within and just outside the valid range, and we may also want tests well outside the range in addition to these boundary-condition tests.
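For the 1-to-10 example, a boundary-value test table might look like this sketch (the in_range() function is a hypothetical stand-in for the code under test):

#include <stdio.h>

static int in_range(int x)       /* hypothetical code under test */
{
    return x >= 1 && x <= 10;
}

int main(void)
{
    struct { int input; int expect; } tests[] = {
        { 0, 0 },     /* just below the range */
        { 1, 1 },     /* lower boundary */
        { 10, 1 },    /* upper boundary */
        { 11, 0 },    /* just above the range */
        { -1000, 0 }, /* well below the range */
        { 1000, 0 },  /* well above the range */
    };
    for (int t = 0; t < 6; t++) {
        int got = in_range(tests[t].input);
        printf("input %5d: %s\n", tests[t].input,
               got == tests[t].expect ? "pass" : "FAIL");
    }
    return 0;
}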

Random tests form one category of black-box test. Random values are generated with a given distribution. The expected values are computed independently of the system, and then the test inputs are applied. A large number of tests must be applied for the results to be statistically significant, but the tests are easy to generate.
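The sketch below applies this idea to a clamp function like the limiter from Example 5-11; the reference model supplying the expected values is written independently of the code under test. All names are hypothetical:

#include <stdio.h>
#include <stdlib.h>

static int limit(int x)          /* code under test */
{
    if (x > 100) x = 100;
    if (x < -100) x = -100;
    return x;
}

static int limit_ref(int x)      /* independent reference model */
{
    return x > 100 ? 100 : (x < -100 ? -100 : x);
}

int main(void)
{
    srand(42);                   /* fixed seed keeps the run reproducible */
    int failures = 0;
    for (int t = 0; t < 100000; t++) {
        int x = (rand() % 1001) - 500;   /* inputs drawn from [-500, 500] */
        if (limit(x) != limit_ref(x))
            failures++;
    }
    printf("%d failure(s) in 100000 random tests\n", failures);
    return 0;
}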

Another scenario is to test certain types of data values. For example, integer-valued inputs can be generated at interesting values such as 0, 1, and values near the maximum end of the data range. Illegal values can be tested as well.

Regression tests form an extremely important category of tests. When tests are created during earlier stages in the system design or for previous versions of the system, those tests should be saved to apply to the later versions of the system. Clearly, unless the system specification changed, the new system should be able to pass old tests.

In some cases old bugs can creep back into systems, such as when an old version of a software module is inadvertently installed. In other cases regression tests simply exercise the code in different ways than would be done for the current version of the code and therefore possibly exercise different bugs.

Some embedded systems, particularly digital signal processing systems, lend themselves to numerical analysis. Signal processing algorithms are frequently implemented with limited-range arithmetic to save hardware costs. Aggressive data sets can be generated to stress the numerical accuracy of the system. These tests can often be generated from the original formulas without reference to the source code.

Evaluating Function Tests
How much testing is enough? Horgan and Mathur [Hor96] evaluated the coverage of two well-known programs, TeX and awk. They used functional tests for these programs that had been developed over several years of extensive testing. Upon applying those functional tests to the programs, they obtained the code coverage statistics shown in Figure 5-33 below.

The columns refer to various types of test coverage: block refers to basic blocks, decision to conditionals, p-use to a use of a variable in a predicate (decision), and c-use to variable use in a nonpredicate computation. These results are at least suggestive that functional testing does not fully exercise the code and that techniques that explicitly generate tests for various pieces of code are necessary to obtain adequate levels of code coverage.

Figure 5-33. Code coverage of functional tests for TeX and awk (after Horgan and Mathur [Hor96]).

Methodological techniques are important for understanding the quality of your tests. For example, if you keep track of the number of bugs found each day, the data you collect over time should show you some trends on the number of errors per page of code to expect on average, how many bugs are caught by certain kinds of tests, and so on.

One interesting method for analyzing the coverage of your tests is error injection. First, take your existing code and add bugs to it, keeping track of where the bugs were added. Then run your existing tests on the modified program. By counting the number of added bugs your tests found, you can get an idea of how effective the tests are in uncovering the bugs you haven't yet found.

This method assumes that you can deliberately inject bugs that are of similar varieties to those created naturally by programming errors. If the bugs are too easy or too difficult to find or simply require different types of tests, then bug injection's results will not be relevant. Of course, it is essential that you finally use the correct code, not the code with added bugs.
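One lightweight way to manage injected bugs is to guard each one with a compile-time flag so that the shipping build is guaranteed to exclude them. The function and flag names below are hypothetical:

/* Build with -DINJECT_BUG_07 to enable this seeded bug, run the test
   suite, and record whether any test fails. The default build compiles
   only the correct code. Assumes n > 0. */
int average(const int *buf, int n)
{
    int sum = 0;
#ifdef INJECT_BUG_07
    for (int i = 0; i < n - 1; i++)  /* seeded off-by-one in the bound */
        sum += buf[i];
#else
    for (int i = 0; i < n; i++)      /* correct loop */
        sum += buf[i];
#endif
    return sum / n;
}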

Performance Testing
Because embedded systems often have real-time deadlines, we must concern ourselves with testing for performance, not just functionality. Performance testing determines whether the required result was generated within a certain amount of time. In many cases, we are interested in the worst-case execution time, although in some cases we may want to verify the best-case or average-case execution time.

Performance analysis is very important here. Performance analysis can tell us what path causes the worst-case (or other case) execution time. From there we can determine the inputs required and the expected outputs. Real-time code with deadlines must always terminate; we know that the code must finish for any possible set of inputs. Functional testing can help us determine whether the program actually terminates.

How do we measure program performance? Functional testing can be performed on any computer in most cases, so long as we can compile the code on that platform. Performance testing is not easy. We may be able to use a simulator to measure performance. However, CPU simulators vary widely in the amount of timing information they provide.

Performance can also be measured directly on the hardware. We can use a timer in the system to count execution time. The program resets the timer at the start and checks the timer's value at the end of execution. We can also use logic analyzers as long as the start and end points of the program are visible on the bus.
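A sketch of the timer approach follows. On a real target we would read a hardware timer register; here the standard clock() function stands in so the sketch runs on a desktop host, and the workload is a placeholder:

#include <stdio.h>
#include <time.h>

static volatile long sink;       /* defeats optimization of the workload */

static void code_under_test(void)    /* placeholder workload */
{
    for (long i = 0; i < 10000000L; i++)
        sink += i;
}

int main(void)
{
    clock_t start = clock();     /* "reset" the timer at the start */
    code_under_test();
    clock_t end = clock();       /* read the timer at the end */
    printf("elapsed: %.6f s\n", (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}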

To obtain truly accurate performance measurements, we must run the program in the environment in which it will operate. For example, the CPU used must have the same size cache as the target platform will provide. The program should also be run along with other programs that operate concurrently. When several programs (or programs and interrupt drivers) operate concurrently, they can interfere with each other in the cache, causing severe performance problems.

Conclusion
The program is a very fundamental unit of embedded systems design, and it usually contains tightly interacting code. Because we care about more than just functionality, we need to understand how programs are created. Because today's compilers do not take directions such as "compile this to run in less than 1 microsecond," we have to be able to optimize programs ourselves for speed, power, and space.

Our earlier understanding of computer architecture is critical to our ability to perform these optimizations. We also need to test programs to make sure they do what we want. Some of our testing techniques can also be useful in exercising the programs for performance optimization.

To read Part 1, go to "Program design and analysis."
To read Part 2, go to "Models of programs, assemblers and linkers."
To read Part 3, go to "Basic Compilation Techniques."
To read Part 4, go to "The creation of procedures."
To read Part 5, go to "Register allocation and scheduling."
To read Part 6, go to "Analysis and optimization of execution time."
To read Part 7, go to "Trace-Driven Performance Analysis."
To read Part 8, go to "Analysis and optimization of energy and power."

Used with the permission of the publisher, Newnes/Elsevier, this series of nine articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased online.

Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing and of Design Automation for Embedded Systems.

References:
[Aho86] Alfred V. Aho et al., Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986.
[Hor96] Joseph R. Horgan and Aditya Mathur, "Software testing and reliability," Chapter 13 in Handbook of Software Reliability Engineering, IEEE Computer Society Press/McGraw-Hill, 1996.
[How82] W. E. Howden, "Weak mutation testing and the completeness of test cases," IEEE Transactions on Software Engineering, 1982.
[McC76] T. J. McCabe, "A complexity measure," IEEE Transactions on Software Engineering, 1976.
