Verifying embedded software functionality: fault localization, metrics and directed testing
In our running example, the inputs Climb = 1 and Up = 100 lead to an unexpected output. The set of statements executed in this failed execution run is {1, 2, 5, 7, 8, 9, 13, 14}. Furthermore, the inputs Climb = 0 and Up = 0 lead to an expected output. The set of statements executed in this successful execution run is {1, 3, 4, 5, 7, 8, 9, 13, 14}. So the bug report is the difference between these two sets:

{1, 2, 5, 7, 8, 9, 13, 14} - {1, 3, 4, 5, 7, 8, 9, 13, 14} = {2}

Once this line is pointed out to the developer, he or she should be able to locate the error in line 2.

Note here that the choice of successful execution run is crucial. Consider an execution of the program in Figure 5.9 with the inputs Climb = 1 and Up = 200. When executed with these inputs, the program prints "Upward," which is what the developer expects, so the execution run for these inputs is deemed a successful run. What would have happened if we had chosen this execution run to compare with our failed execution run (the one resulting from the inputs Climb = 1 and Up = 100)? The set of statements executed for the inputs Climb = 1 and Up = 200 is {1, 2, 5, 6, 9, 10, 11}. In this case the bug report (the difference in executed statements between the failed run and the chosen successful run) would have been

{1, 2, 5, 7, 8, 9, 13, 14} - {1, 2, 5, 6, 9, 10, 11} = {7, 8, 13, 14}

The bug report in this case consists of more statements. Moreover, the statements do not pinpoint the actual error cause in the program; they are only manifestations of the error cause. This simple example should demonstrate to the reader that the choice of successful run is crucial for the usefulness of the bug report generated by fault localization methods.
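Computing such a bug report is straightforward. The following C sketch illustrates the computation for the two runs above. The profile representation, the helper names, and the hard-coded statement sets are assumptions made purely for this illustration; in practice the executed-statement sets would come from an instrumented build or a coverage tool.

#include <stdbool.h>
#include <stdio.h>

#define MAX_LINE 64   /* highest source line number we track (assumption) */

/* Which source lines were executed in one run. */
typedef struct {
    bool executed[MAX_LINE + 1];
} StmtProfile;

static void mark(StmtProfile *p, const int *lines, int n)
{
    for (int i = 0; i < n; i++)
        p->executed[lines[i]] = true;
}

/* Bug report: lines executed in the failed run but not in the successful run. */
static void report_difference(const StmtProfile *failed, const StmtProfile *successful)
{
    printf("bug report:");
    for (int line = 1; line <= MAX_LINE; line++)
        if (failed->executed[line] && !successful->executed[line])
            printf(" %d", line);
    printf("\n");
}

int main(void)
{
    /* Statement sets from the running example in Figure 5.9. */
    int failed_lines[]     = {1, 2, 5, 7, 8, 9, 13, 14};    /* Climb = 1, Up = 100 */
    int successful_lines[] = {1, 3, 4, 5, 7, 8, 9, 13, 14}; /* Climb = 0, Up = 0   */

    StmtProfile failed = {{false}}, successful = {{false}};
    mark(&failed, failed_lines, 8);
    mark(&successful, successful_lines, 9);

    report_difference(&failed, &successful);   /* prints: bug report: 2 */
    return 0;
}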
Figure 5.10 below summarizes the class of trace-based debugging methods we are presenting here, commonly called fault localization. The aim of these methods is to find the error cause in a failed execution run (a run with unexpected behavior) of a given program. This is done by comparing the failed execution run with a successful execution run. The successful execution run is chosen from a test pool, possibly constructed by coverage-based testing. The chosen successful run is then compared with the failed execution run, and the difference is reported back to the developer as a bug report. Of course, there are many metrics by which to compare the execution traces, such as:

Sets of statements executed,
Sequence of branch statements executed,
Sequence of statements executed,
Sequence of variable values taken (for selected variables), and so on.

Trace Comparison Metric Based on Statement Sets
The simplest trace comparison metric (which is very efficient to compute) is simply: the set of statements executed in the failed run minus the set of statements executed in the successful run. Given a failed run, we choose a successful run that minimizes this quantity. However, this trace comparison metric does not distinguish between runs that execute exactly the same statements but in a different order. Consider the schematic program fragment below, consisting of an if-then-else within a loop:
for (...) {
    if (...) S1 else S2;
}
Consider two runs of the program, S1, S2, S1, S2 and S2, S1, S2, S1. These two runs evaluate the conditional branch in the if-then-else statement differently in every iteration of the loop. Yet a trace comparison metric based on sets of statements executed cannot distinguish between these two runs. Given a failed run, say S1, S2, S1, S2, suppose we choose S2, S1, S2, S1 as the successful run with which to compare our failed run. The result is a null bug report, which is not only useless for debugging but also misleading to the developer.
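The following small C sketch makes this concrete for the two runs above; encoding the branch outcome of each iteration as 0 or 1 is an assumption made purely for illustration. Both runs execute the same set of statements (the loop header, the branch, S1, and S2), so the statement-set difference is empty, even though the branch is evaluated differently in every iteration.

#include <stdio.h>

int main(void)
{
    /* Branch outcome per loop iteration: 1 = then-branch (S1), 0 = else-branch (S2). */
    int failed[4]     = {1, 0, 1, 0};   /* run S1, S2, S1, S2 */
    int successful[4] = {0, 1, 0, 1};   /* run S2, S1, S2, S1 */

    /* Set view: both runs execute the loop header, the branch, S1 and S2,
       so the statement-set difference (the bug report) is empty. */

    /* Sequence view: count iterations where the branch outcome differs. */
    int differing = 0;
    for (int i = 0; i < 4; i++)
        if (failed[i] != successful[i])
            differing++;
    printf("iterations with differing branch outcome: %d of 4\n", differing);
    return 0;
}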
Trace Comparison Metric Based on Branch Alignments

For the reasons mentioned in the preceding, the software debugging community has studied trace comparison metrics that compare traces of a program by finding out which branches are evaluated differently. Such a difference metric measures the difference between two execution runs π and π′ of a program by comparing the behaviors of "corresponding" branch statement instances from π and π′. The branch statement instances of π whose outcomes differ in π′ are captured in diff(π, π′), the difference between execution run π and execution run π′. In order to find the "corresponding" branch instances, a notion of alignment is defined to relate statement instances of the two execution runs. Typically, such a branch alignment can be based on dynamic control dependence. Here we illustrate the distance metric with an example. Consider the program fragment in Figure 5.11 below, taken from the Siemens benchmark suite. This piece of code changes all substrings s1 in string lin matching a pattern to another substring s2.
Here variable i represents the index to the first unprocessed character in string lin, variable m represents the index to the end of a matched substring s1 in string lin, and variable lastm records the value of m in the last loop iteration. The bug in the code lies in the fact that the branch condition in line 3 should be if ((m >= 0) && (lastm != m)). At the i-th iteration, if variable m is not changed at line 2, line 3 is wrongly evaluated to true, and substring s2 is wrongly returned as output, deemed by the programmer as an observable "error."

In Figure 5.12 below, we show some traces from the program in Figure 5.11. The difference between execution runs π and π′ is diff(π, π′) = {3_3, 7_14}, as indicated in Figure 5.12 below. This is because the branch instances 3_3 and 7_14 (instances of the branches at lines 3 and 7) are aligned in runs π and π′ and their outcomes differ in π and π′. If these branch instances are evaluated differently, we get π′ from π. Similarly, the difference between execution runs π′ and π″ is diff(π′, π″) = {7_6, 7_14}.
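As a rough illustration of how diff(π, π′) can be collected once an alignment is in hand, consider the following C sketch. It assumes the alignment between the two runs has already been established (e.g., through dynamic control dependence) and is handed over as an array of aligned branch instances; the type, helper name, and data below are invented for illustration and do not come from Figure 5.12.

#include <stdio.h>

/* One aligned pair of branch instances: the source line of the branch and
   its outcome (1 = true, 0 = false) in each of the two runs. */
typedef struct {
    int line;
    int outcome_failed;      /* outcome in the failing run pi     */
    int outcome_successful;  /* outcome in the successful run pi' */
} AlignedBranch;

/* Report the aligned branch instances whose outcomes differ -- a simplified
   stand-in for diff(pi, pi'). The alignment itself is assumed to be given. */
static int diff_branches(const AlignedBranch *aligned, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (aligned[i].outcome_failed != aligned[i].outcome_successful) {
            printf("aligned branch instance %d (line %d) evaluates differently\n",
                   i, aligned[i].line);
            count++;
        }
    }
    return count;
}

int main(void)
{
    /* Invented alignment data, purely for illustration. */
    AlignedBranch aligned[] = {
        {2, 1, 1}, {3, 0, 0}, {3, 1, 0}, {7, 0, 1}
    };
    int n = (int)(sizeof aligned / sizeof aligned[0]);
    printf("branch instances in diff: %d\n", diff_branches(aligned, n));
    return 0;
}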
Why do we capture the branch event occurrences of π that evaluate differently in π′ in the difference diff(π, π′)? Recall that we want to choose a successful run for purposes of fault localization. If π is the failing run and π′ is a successful run, then diff(π, π′) tells us which branches in the failing run π need to be evaluated differently to produce the successful run π′. Clearly, if we have a choice of successful runs, we would like to make minimal changes to the failing run to produce a successful run. Thus, given a failing run π and two successful runs π′ and π″, we choose π′ over π″ if diff(π, π′) < diff(π, π″). This requires us to compare differences; how we do so is elaborated in the following. Given a failing run π and two successful runs π′ and π″, we can say that diff(π, π′) < diff(π, π″) based on a combination of the following criteria:

1. Fewer branches of π need to be evaluated differently to get π′ than need to be evaluated differently to get π″.
2. The branches of π that need to be evaluated differently to get π′ appear closer to the end of π (where the error is observed) than the branches of π that need to be evaluated differently to get π″.
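To make the combination of these two criteria concrete, here is a hypothetical C sketch of a comparator over differences. Representing a difference as a sorted array of positions in the failing run, and breaking ties on criterion 1 by looking only at the latest position, are simplifying assumptions for illustration rather than the exact ordering used in the literature.

#include <stddef.h>
#include <stdio.h>

/* diff(pi, pi') represented, for illustration, as the positions (instance
   indices in the failing run pi) of the branch instances that must be
   evaluated differently, sorted in ascending order. */
typedef struct {
    size_t count;          /* criterion 1: how many branch instances differ */
    const int *positions;  /* criterion 2: where in pi they occur           */
} Diff;

/* Return a negative value if da is the smaller difference (i.e., the run
   behind da is the preferred successful run), positive if db is smaller,
   and 0 if this ordering cannot tell them apart. */
static int compare_diffs(const Diff *da, const Diff *db)
{
    /* Criterion 1: fewer branch instances needing a different outcome wins. */
    if (da->count != db->count)
        return (da->count < db->count) ? -1 : 1;
    if (da->count == 0)
        return 0;   /* both empty: identical branch behavior */

    /* Criterion 2: on a tie, prefer the difference whose branch instances
       lie closer to the end of the failing run, where the error shows up
       (simplified here to comparing only the latest position). */
    int last_a = da->positions[da->count - 1];
    int last_b = db->positions[db->count - 1];
    if (last_a != last_b)
        return (last_a > last_b) ? -1 : 1;
    return 0;
}

int main(void)
{
    /* Two invented candidate differences against the same failing run. */
    const int pos1[] = {3, 14};   /* diff(pi, pi')  */
    const int pos2[] = {6, 12};   /* diff(pi, pi'') */
    Diff d1 = {2, pos1}, d2 = {2, pos2};

    int cmp = compare_diffs(&d1, &d2);
    printf("prefer %s\n", cmp < 0 ? "pi'" : (cmp > 0 ? "pi''" : "either"));
    return 0;
}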

