Finding defects using Holzmann’s “Power of 10” rules for writing safety critical code

In safety-critical applications, bugs in software are not just costly distractions—they can put lives at risk. Consequently, safety-critical software developers go to great lengths to detect and fix bugs before they can make it into fielded systems.

Although there are some well-known cases where software defects have caused disastrous failures, the record is mostly fairly good—if the software controlling medical devices or flight-control systems were as buggy as most software, the headlines would be truly dreadful.

The methods that safety-critical developers use are undeniably effective at reducing risk, so there are lessons to be learned for developers who do not write safety-critical code. Two techniques stand out as being most responsible: advanced static analysis and rigorous testing.

Static analysis tools have been used for decades. Their appeal is clear: they can find problems in software without actually executing it. This contrasts with dynamic analysis techniques (i.e. traditional testing), which usually rely on running the code against a large set of test cases. The first generation of static-analysis tools, of which lint is the most widely known example, was quite limited in capability and suffered from serious usability problems.

However, recently a new generation of advanced static-analysis tools has emerged. These are capable of finding serious software errors such as buffer overruns, race conditions, null pointer dereferences and resource leaks.

They can also find subtle inconsistencies such as redundant conditions, useless assignments and unreachable code. These correlate well with real bugs, as they often indicate that the programmer misunderstood some important aspect of the code.

The tenth rule
Using advanced static analysis tools is quickly becoming best practice: rule ten of the “Ten Rules for Writing Safety Critical Code” from NASA/JPL's Gerard Holzmann specifies that advanced static analysis tools should be used aggressively all through the development process.

The other important technique is systematic testing. The importance of highly rigorous testing has been recognized by some regulatory agencies. For flight-control software, the Federal Aviation Administration is very specific about the level of testing required.

The developer must demonstrate that they have test cases that achieve full coverage of the code. Developing such test cases can be very expensive. Advanced static-analysis tools can help reduce this cost by pointing out parts of the code that make it difficult or even impossible to achieve full coverage.

Benefits of advanced static analysis
Testing has traditionally been the most effective way to find defects in code. The best test cases feed in as many combinations of inputs and conditions as possible such that all parts of the code are exercised thoroughly. Statement coverage tools can help you develop a test suite that makes sure that every line of code is executed at least once.

But as all programmers know, just because a statement executes correctly once does not mean it will always do so—it may trigger an error only under a very unusual set of circumstances. There are tools that will measure condition coverage and even path coverage, and these are all helpful for exercising these corner cases, but achieving full coverage for non-trivial programs is extraordinarily time-consuming and expensive.

This is where advanced static analysis shines. The tools examine paths and consider conditions and program states in the abstract. By doing so, they can achieve much higher coverage of your code than is usually feasible with testing. Best of all, they do all this without requiring you to write any test cases.

This is the most significant way in which static analysis reduces the cost of testing. The cheapest bug is the one you find earliest. Because static analysis is a compile-time process, it can find bugs before you even finish writing the program. This is usually less expensive than finding them by writing a test case or debugging a crash. The rest of this article describes how these tools work, and then shows how they can also reduce the cost of creating test cases.

How advanced static analysis works
All static analysis tools operate in roughly the same way. First, they create a model of the subject program by parsing its source code. The model includes a set of intermediate representations (IRs) of the program, some of which are very similar to what a compiler might create. Once the IR is ready, an analysis phase is started. A diagram showing the architecture is in Figure 1 below.

Figure 1: The architecture of a static analysis tool.

The intermediate representations include the program's symbol tables, its abstract syntax tree, its control-flow graph and its call graph.

The analysis phase is of course the most interesting, as that is where defects are detected. Different kinds of defects can be detected by looking at different aspects of the IR. For example, a rule that prohibits calls to the famously insecure gets() function can be checked by inspecting the symbol table. A rule that disallows the use of goto statements is easily checked by searching the abstract syntax tree.
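For instance, both kinds of check would fire on an intentionally flawed fragment like the following (a hypothetical sketch, not code from any real project):

#include <stdio.h>

void read_command(void)
{
    char buf[64];

    gets(buf);            /* flagged via the symbol table: any call to gets() is reported */

    if (buf[0] == 'q')
        goto done;        /* flagged via the abstract syntax tree: use of goto */

    printf("command: %s\n", buf);

done:
    return;
}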

The more sophisticated tools operate by performing an abstract execution of the program. As this execution progresses, the tools track variable values and how they relate to each other. If anomalies are detected, warnings are issued.

This can be thought of as a “pretend” or abstract execution of your program. Instead of variables containing actual concrete values, they contain abstract symbols. Input values are given symbolic names, too.

The analysis then pretends to execute the program by following paths through the code and using these symbols instead of concrete values. As this execution proceeds, the analysis may learn facts about the variables and how they relate to each other.

For example, because of the conditions on the path leading to a statement, the analysis may determine that if variable x is greater than ten, pointer p may be NULL. As this execution continues, the analysis looks for anomalies.

For example, if p is dereferenced at that statement, the analysis will report that a null pointer dereference may occur. A good tool will explain the reasoning it used to arrive at that conclusion, which is useful for helping you determine whether the report is a true or false positive.
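In code, the scenario just described might look something like this (a minimal hypothetical sketch; the function and variable names are invented):

#include <stddef.h>

void process(int x, char *p)
{
    if (x > 10)
        p = NULL;      /* the analysis records: on this path, x > 10 and p == NULL */

    if (x > 12)
        *p = 'a';      /* any execution reaching here with x > 12 also had x > 10,
                          so p may be NULL: a possible null pointer dereference is reported */
}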

This is the essential difference between testing and static analysis: testing uses real inputs and concrete values, whereas static analysis uses symbolic inputs and abstract values. The appeal of the latter is that, because each abstract value represents a wide range of possible concrete values, static analysis can take into account many possible program states simultaneously and so achieve much greater coverage than testing.

Static analysis identifies places where violations of the fundamental rules of the language or libraries might lead to an error. The following illustrate some of the most important classes.

The first class is the most serious: bugs that either cause the program to terminate abnormally or result in highly unpredictable behavior. These include buffer overrun and underrun, null pointer dereference, division by zero and use of uninitialized variables. Memory allocation errors result from the misuse of allocators such as malloc() or new.

These can be very tricky to debug because the consequent symptom may only show up long after the event that caused the error. Example errors include double free, use after free and memory leak. Concurrency bugs may be caused by misuse of the threads library. Double locks or unlocks, race conditions and futile attempts to lock are among the kinds of concurrency checks that are available.
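A small hypothetical fragment containing two of the allocation errors just mentioned:

#include <stdlib.h>
#include <string.h>

void build_message(void)
{
    char *msg = malloc(32);
    if (msg == NULL)
        return;

    strcpy(msg, "hello");
    free(msg);

    strcpy(msg, "world");   /* use after free: the symptom may appear long after this point */
    free(msg);              /* double free */
}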

A second class of check is for inconsistencies or redundancies. These are not bugs per se, but are very often indicators that a programmer has misunderstood something. For example, if you check the return value of a function 99 percent of the time, that 1 percent where you don't check it may indicate such a problem. This class includes redundant conditions, useless assignments, and checking if a pointer is null after it has already been dereferenced.
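The last of these, checking a pointer against null only after it has been dereferenced, typically looks something like this (a hypothetical sketch):

struct packet { int len; };

int get_length(struct packet *pkt)
{
    int len = pkt->len;    /* pkt is dereferenced here... */

    if (pkt == NULL)       /* ...so this check is redundant; it suggests the programmer
                              believed pkt could be NULL, in which case the dereference
                              above is the real bug */
        return -1;

    return len;
}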

Finally, some tools allow you to author your own checks. Note that static analysis can't check all properties of your program. If your code contains a logic error that causes your program to produce the wrong result, then static analysis will usually be of little help finding that bug. That is what testing is for.

Open source example
To illustrate the kinds of flaws that advanced static analysis is capable of detecting, an example is shown in Figure 2 below. Note this is a real flaw found in an open-source application.

Figure 2: A screenshot from CodeSonar showing a warning that there is a buffer overrun. In this case, the error is on line 2200, where the programmer has a misplaced parenthesis.
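The code itself is in the screenshot, but as a generic illustration (not the code from Figure 2), a misplaced parenthesis can produce a heap buffer overrun like this:

#include <stdlib.h>
#include <string.h>

char *copy_string(const char *s)
{
    char *p = malloc(strlen(s + 1));   /* intended strlen(s) + 1; as written, for a
                                          non-empty string this allocates strlen(s) - 1 bytes */
    if (p != NULL)
        strcpy(p, s);                  /* strcpy writes strlen(s) + 1 bytes: buffer overrun */
    return p;
}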


Management of results
Because static analysis works on an abstraction of the code, false positive results are unavoidable in practice. These tools can also have false negatives—they do not guarantee to find all bugs in your code. Too many false positives means that you may spend too much time sifting through the chaff looking for the real bugs.

This robs your development effort of resources that might be better spent finding bugs through other methods. A high false positive rate has a subtle psychological implication too: as it increases, users are less likely to trust the results, and are more likely to erroneously tag a true positive as a false positive.

The real measure of the effectiveness of a static analysis tool is how well it simultaneously balances the false positive rate with the false negative rate.

Different kinds of bug merit different levels of effort to find, depending on the amount of risk they carry. A buffer overrun in a medical device may be life-threatening, whereas a leak in a game controller that means it must be reset once a day is very low risk.

The amount of risk determines the false positive rate that users are prepared to accept. In practice, we have found that for the serious class of flaws, such as buffer overruns and null pointer dereferences, users are often prepared to accept a false positive rate of as much as 75 percent to 90 percent. For less risky classes, a false positive rate of more than 50 percent is usually considered unacceptable.

An important aspect of these tools is their usability. Once a false positive has been identified and labeled, the user will not want to see that warning again. If the warning is a true positive, tools should allow users to annotate the warning with comments or assign it to a user. Warnings are typically saved in a database so managers can create views of high-level properties, such as which modules are most risky or how the code quality is changing over time.

Relationship with systematic testing
As mentioned previously, safety-critical developers are required to demonstrate coverage. There are of course different kinds of coverage, and the risk the code carries dictates which kind of coverage is required.

In the DO-178B standard for aviation, the riskiest code is required to be tested to 100 percent modified condition/decision coverage (MCDC). The next two most risky classes must be tested to 100 percent decision coverage and statement coverage respectively. The least risky code, such as the in-flight entertainment system, has no coverage requirements at all.

Most developers will be familiar with statement coverage, and anyone who has tried to test with 100 percent statement coverage will know exactly how difficult it is to achieve (Table 1, below). Decision coverage is more stringent: it requires that all branches in the control flow are taken, so you must write test cases that force each of your conditional expressions to evaluate to both true and false.

Table 1: The number of test cases required to achieve full coverage for each coverage criterion.

MCDC coverage is stronger still: every sub-expression in a conditional must be evaluated to both true and false, and must be shown to independently affect the outcome of the decision. Consider what this means in terms of the number of test cases for the following code snippet:

if (a || b || c)
    foo();

Table 1 above shows the number of test cases required to achieve full coverage for each coverage criterion, along with example values required of the inputs. Clearly, achieving full coverage is non-trivial. It can, of course, be fundamentally impossible to achieve.
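As a rough illustration of why the counts grow (this is not a reproduction of Table 1), and assuming short-circuit evaluation, minimal test sets for the snippet above work out as follows:

/* if (a || b || c) foo();
 *
 * Statement coverage: 1 test, e.g. (a=1, b=0, c=0), so that foo() executes.
 * Decision coverage:  2 tests; add (a=0, b=0, c=0) so the condition also evaluates false.
 * MCDC:               4 tests (n + 1 for n sub-expressions), for example:
 *                       (1, 0, 0) -> true    a alone flips the outcome versus (0, 0, 0)
 *                       (0, 1, 0) -> true    b alone flips the outcome versus (0, 0, 0)
 *                       (0, 0, 1) -> true    c alone flips the outcome versus (0, 0, 0)
 *                       (0, 0, 0) -> false
 */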

If your program contains unreachable code, you cannot even get full statement coverage, never mind decision or MCDC coverage. Similarly, if your code contains redundant conditions (conditions that are either always true or always false), then you might achieve statement coverage, but not decision coverage. A developer may spend hours trying to develop a test case before it becomes evident that it is impossible. If this had been known in advance, much frustration and wasted time would have been saved. See, for example, line 475 in Figure 3 below.

Figure 3: A redundant condition warning from CodeSonar.


The comment in the leftmost column indicates that the condition on that line is always true. In this case, there is an obvious redundancy: the third sub-expression is the same as the second sub-expression.
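In outline, the pattern being described looks something like this (a hypothetical reconstruction, not the code in the figure):

struct header { int magic; int major; int minor; };

int header_ok(const struct header *hdr)
{
    /* The third sub-expression repeats the second, so whenever it is evaluated it
       is always true.  Perhaps hdr->minor == 1 was intended. */
    return hdr->magic == 0x4D5A && hdr->major == 1 && hdr->major == 1;
}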

If the second sub-expression is true, the third will be evaluated, but that will always be true also. It is difficult to divine the real intent of the programmer here. Perhaps there was a third field whose value should have been checked. These kinds of issues correlate well with bugs too. Consider the example shown in Figure 4 below.

Figure 4: A fragment of a report from CodeSonar illustrating a redundant condition.


The column on the left shows the condition that cannot be satisfied: pb->type can never be equal to 3. This value comes from a call to get_type(), whose definition is shown in Figure 5 below.

Figure 5: The body of get_type(). There is an error on line 30.

Tip: The error is on line 30 and is clearly a typo: the programmer missed the final “X”.

Techniques for reducing the risk of bugs in software for safety-critical systems can work to reduce bugs in non-safety-critical systems. Advanced static-analysis tools can help by finding real errors automatically and reducing testing costs.

Paul Anderson is vice president of engineering at GrammaTech, a spin-off of Cornell University that specializes in static analysis. He received his B.Sc. from Kings College, University of London and his Ph.D. in computer science from City University London. Paul manages GrammaTech's engineering team and is the architect of the company's CodeSonar static analysis tools. A significant portion of his work has involved applying program analysis to improve security. He can be reached at paul@grammatech.com.
