# Using formal methods for sophisticated static code analysis

* In this Product How-To design article, Jay Abraham of Mathworks uses the company’s Polyspace code verifier to explain the use formal methods-based static code analysis to ensure high quality and verifiable embedded software. *

Software underlies applications in a multitude of industries today. Aircraft, automobiles, industrial machinery, and medical devices all contain specialized software known as embedded software. This software is directly responsible for a variety of critical tasks. This software must be of high quality and must be thoroughly tested to verify it performs as expected.

In such critical systems, even simple operations performed with software can be fraught with risk. For example, consider an algorithm that requires the addition operation. If the underlying 32-bit microcontroller does not have a floating point unit, you need to be careful to avoid overflow conditions. Consider the operation in this code:

**Click on image to enlarge.**

The operation` c = a + b ` is prone to risk because of the potential for an overflow. If the summation produces a result greater than 2

^{32}-1, the result will not fit in the 32-bit variable

**c**, resulting in an incorrect summation. The microcontroller will not throw a fault or alarm causing the software to behave improperly.

Software developers and verification engineers are ultimately responsible for the quality of the software they produce. Although developers can use manual techniques to quantify the quality of the software, these techniques are not necessarily exhaustive. Automation tools offer more consistent and in some cases more powerful techniques for producing high-quality software, including an automation technique known as static code analysis.

**Static Code Analysis**

Static code analysis, also known as source code analysis or static analysis, is a software verification activity for analyzing source code for quality and reliability. This analysis enables software developers and testers to identify and diagnose errors such as overflows, divide-by-zero, and illegally dereferenced pointers.

Metrics produced by static code analysis provide a means for measuring and improving software quality. In contrast to other verification techniques, static code analysis can be performed without executing the program, developing test cases, or compiling the software program.

You can view static code analysis as an ultra-powerful code review process where code execution paths are analyzed, variable ranges are understood, and concurrent access paths to variables are known *a priori* .

Static code analysis tools can be developed by various means, ranging from simple checks, such as code style checks, to sophisticated formal methods. Simple static code analysis techniques can produce metrics that are indicative of the quality of software, or attempt to identify errors in the software program. These simple techniques are very fast, producing results quickly. However, they can also produce false negative and false positives.

With false negatives, the static code analyzer misses a real error in the source code. False positives are incorrectly identified errors. Both situations are problematic to software developers and verification engineers. False negatives produce a false sense of comfort and include bugs in the released product. False positives delay production of the software and result in unnecessary rework that may degrade performance or memory footprint of the software.

The following example shows the application of a simple static code analysis technique. Consider the following code fragment:

**Click on image to enlarge.**

Simple static code analysis techniques will quickly spot the divide-by-zero error in the operation **b = 12/a.** Compilers will identify this error and flag it during compile time. Now consider the following complex code.

**Click on image to enlarge.**

Let’s assume that `function1` and `function2` perform complex algorithmic and mathematical computation. In this situation, a divide-by-zero condition is possible for the operation ** c = b/a ,** but simple static code analysis techniques cannot definitely identify if this condition will occur and under what conditions the divide-by-zero might occur. At best they can flag that a divide-by-zero condition is possible, but they will not be able to conclusively state the conditions under which this might occur. If indeed a divide-by-zero will not occur, this would be known as a false positive.

**Introduction to Formal Methods**

Modern static code analysis couples formal methods with error-finding techniques. Tools employing this methodology try to find errors in the source code, but they also try to prove the absence of certain critical errors. Formal methods apply theoretical computer science fundamentals to solve difficult problems in software. An example of a formal method technique is abstract interpretation, a mathematically rigorous approach to prove the absence of detectable run-time errors of software.

Abstract interpretation relies on a broad base of mathematical theorems that define rules for analyzing complex dynamic systems. Instead of analyzing each state of a program, abstract interpretation represents these states in a more general form and provides rules to manipulate them. To produce a mathematical abstraction, you need to analyze a software program's semantics.

While this idea is not new, its origins come from theory developed in the 1970s. The first implementations resulted in algorithms that grew exponentially with the size of the code. Newer algorithms address this exponential growth, and when run on today’s processors make the application of formal methods practical.

The trigonometric operation in the equation below helps explain abstract interpretation. Consider the equation in radians where **x** is a real number:

`y = sin(2.14) + cos(0.53) + X ^{2} `

For a given value of **x** , it is not easy to solve this equation manually without the aid of a calculator. However, if asked about the sign of the computation, you can apply the rules of trigonometry and algebra and know that the sign of the result is positive. This is true because the **sin(x) >0** for **x** and that **cos(x) > 0** for **x** . Therefore, the equation will always evaluate to a positive real number, and the result will not be negative or zero.

Using the rules of trigonometry, you can precisely determine some properties of the equation. All the details are not required to make a factual and correct statement about the result.

Using the principles of abstraction as described above, you can use abstract interpretation with static code analysis to identify and diagnose run-time errors such as overflows, divide-by-zeros, and out-of-bound pointers. Metrics produced by this process provide a means to measure and improve software quality.

When applied to the detection of run-time errors, abstract interpretation completely and comprehensively provides a diagnostic of “proven,” “failed,” “unreachable,” or “unproven” for each operation.

**Example of Formal Methods Analysis**

An example of a static code analysis tool that uses formal methods with abstract interpretation is the Polyspace code verifier from MathWorks. This code verification tool consumes source files written in the C, C++, or Ada programming language to identify where certain run-time errors occur and to prove their absence.

The verification process starts by first identifying places in source files that might fail with a run-time error. Then using abstract interpretation, the tool classifies every error as unproven, and subsequently attempts to prove that the code will fail, is unreachable, or is proven not to fail. Using a color-coding scheme as shown in **Figure 1 below** , the tool identifies the status of operations in the code.

**Clickon image to enlarge.**

**Figure 1: Polyspace color-coding showing the status of operations in code and a tool tip showing variable range information.**

Code elements identified in green are proven to be free of certain run-time errors such as overflow, divide-by-zero, illegally dereferenced pointers, and other run-time errors. Operations marked in red show places where Polyspace has detected run-time errors. Those operations that might fail under some conditions are marked in orange. Code elements or conditional branches that cannot be executed (dead code or unreachable code) are marked in gray.

Next, consider how Polyspace analyzes the results of the code fragment shown earlier. As shown in **Figure 2 below** , Polyspace marks the divide operation in green, indicating that this code operation will not result in a divide by zero (because of the ranges of the variable **a** is 43 .. 727 and never zero).

**Clickon image to enlarge.**

**Figure 2: Polyspace results.**Detailed interprocedural static code analysis using abstract interpretation determines that this variable will never be equal to zero.

**Conclusion**

Static code analysis that is augmented with formal methods (abstract interpretation) can be an important tool for improving the quality of embedded software used in high-integrity software systems. A software development process that includes these types of static code analysis tools can contribute to attaining a good quality process. Using this approach helps software developers and verification engineers assess the quality and safety of their production software.

**Jay Abraham** is a product marketing manager with responsibility for Polyspace code verification products at Mathworks . His area of expertise is in software tools for the verification of critical embedded applications. His 20 years of experience include engineering and design positions at hardware, software tools, and embedded operating systems companies including Magma Design Automation and Wind River Systems; he began his career as a microprocessor designer at IBM. Jay has an M.S. in computer engineering from Syracuse University and a B.S. in electrical engineering from Boston University.