Editor’s Note: In this article, excerpted from Embedded Systems Security by David and Mike Kleidermacher, the authors evaluate the strengths and weaknesses of static and dynamic code analysis in the development of secure C or C++ code.
Use of static analysis should be a required part of every security-conscious software organization’s development process. Which static analyzer should an organization use? The best answer to this question is that a development organization should use multiple tools from different vendors.
Empirical use within government software safety and security evaluation teams has demonstrated that a surprising majority of software flaws caught by one static analyzer will not be caught by another tool, and vice versa. Many forms of full-program static analysis are inherently intractable, so each tool depends on its own carefully tuned heuristic algorithms to provide high-quality results.
The best coverage for software flaw detection via static analysis therefore requires multiple tools from multiple vendors to be used in concert. Static analyzers differ not only in accuracy but also, often greatly, in execution time.
Static Source Code Analysis basics
Static source code analyzers attempt to find code sequences that, when executed, could result in buffer overflows, resource leaks, or many other security and reliability problems. Source code analyzers are effective at locating a significant class of flaws that are not detected by compilers during standard builds and often go undetected during runtime testing as well.
Most static source code analyzers use the same type of compiler front end that is used to compile code. Ideally, a static source code analyzer should be integrated with the everyday compiler to maximize use and reduce the complexity of the tool chain. Integrated checking also means the source code is parsed only once instead of twice. Using a compiler front end is natural because the analyzer can take advantage of preexisting compiler dataflow algorithms to perform its bug-finding mission.
A typical compiler will issue warnings and errors for some basic potential code problems, such as violations of the language standard or use of implementation-defined constructs. In contrast, a static source code analyzer performs a full-program analysis, finding bugs caused by complex interactions between pieces of code that may not even be in the same source file (Figure 3.1 below).
The analyzer determines potential execution paths through code, including paths into and across subroutine calls, and how the values of program objects (such as standalone variables or fields within aggregates) could change across these paths. The objects could reside in memory or in machine registers.
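To make the cross-subroutine case concrete, here is a minimal sketch; the sensor table and both function names are hypothetical, not from the source. Each function is legal C on its own, and a compiler typically issues no warning for either, but an analyzer that follows the path into `find_sensor` sees that it can return NULL and checks whether every caller handles that path. In a real code base the two functions could live in different source files.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type and lookup table, for illustration only. */
struct sensor {
    const char *name;
    int reading;
};

static struct sensor table[] = {
    { "temp", 21 },
    { "pressure", 101 },
};

/* Returns NULL when no sensor matches: this is the path an analyzer
 * tracks into every caller. */
struct sensor *find_sensor(const char *name)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}

/* Writing `return find_sensor(name)->reading;` would compile cleanly, but
 * an analyzer following the NULL-return path would report a potential NULL
 * pointer dereference. This version handles that path explicitly. */
int read_sensor(const char *name, int fallback)
{
    const struct sensor *s = find_sensor(name);
    return (s != NULL) ? s->reading : fallback;
}
```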
The analyzer looks for many types of flaws, including bugs that would normally compile without error or warning. The following are some of the more common errors that a modern static source code analyzer will detect:
- Potential NULL pointer dereferences
- Access beyond an allocated area, otherwise known as a buffer overflow
- Writes to potentially read-only memory
- Reads of potentially uninitialized objects
- Resource leaks (e.g., memory leaks and file descriptor leaks)
- Use of memory that has already been deallocated
- Out-of-scope memory usage (e.g., returning the address of an automatic variable from a subroutine)
- Failure to set a return value from a subroutine
- Buffer and array underﬂows
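The following sketch (the function name is hypothetical, not from the source) annotates a small C routine with several of the flaw classes listed above. Each comment marks what an analyzer would be likely to report if that check were omitted; the routine itself is intended to be correct.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of at most max_len characters of
 * src. The caller owns the result and must free() it; forgetting to do so
 * is a resource (memory) leak. */
char *copy_prefix(const char *src, size_t max_len)
{
    if (src == NULL)            /* potential NULL pointer dereference */
        return NULL;

    size_t len = strlen(src);
    if (len > max_len)          /* otherwise: access beyond the allocated area */
        len = max_len;

    char *copy = malloc(len + 1);
    if (copy == NULL)           /* malloc may fail: another NULL path */
        return NULL;

    memcpy(copy, src, len);
    copy[len] = '\0';           /* without it, string reads run into uninitialized memory */

    /* Returning a local array instead, e.g. `char buf[16]; ... return buf;`,
     * would be the out-of-scope memory usage described above. */
    return copy;
}
```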
The static analyzer also knows how many standard runtime library functions behave. For example, the analyzer knows that subroutines such as free should be passed pointers to memory allocated by subroutines such as malloc. The analyzer uses this information to detect errors in code that calls these functions or uses the results of such calls. The analyzer can also be taught about properties of user-defined subroutines.
For example, if a custom memory allocation system is used, the analyzer can be taught to look for misuses of this system. By teaching the analyzer about properties of subroutines, users can also reduce the number of false positives. A false positive is a potential flaw identified by the analyzer that could not actually occur during program execution. Of course, one of the major design goals of a static source code analyzer is to minimize the number of false positives so that developers spend as little time as possible investigating them.
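How an analyzer is taught depends on the tool, and the source does not name a mechanism. As one concrete, tool-specific example, GCC 11 and later accept a `malloc (deallocator, index)` function attribute that tells the compiler which function must release memory obtained from a custom allocator; GCC's `-Wmismatched-dealloc` warning and its `-fanalyzer` pass can use this to report mismatched deallocation and leaks. The `pool_alloc`/`pool_free` pair below is hypothetical, and the attribute is applied only under GCC 11+ so the sketch still builds with other compilers.

```c
#include <stdlib.h>

/* Hypothetical custom allocator; in this sketch it simply wraps malloc/free. */
void pool_free(void *p);

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
/* GCC-specific: pool_alloc returns fresh memory that must be released by
 * passing it as argument 1 of pool_free. */
#define POOL_ALLOCATOR __attribute__((malloc, malloc(pool_free, 1)))
#else
#define POOL_ALLOCATOR
#endif

void *pool_alloc(size_t n) POOL_ALLOCATOR;

void *pool_alloc(size_t n)
{
    return malloc(n);
}

void pool_free(void *p)
{
    free(p);
}
```

With the attribute in place, passing a `pool_alloc` result to plain `free`, or never releasing it, becomes something GCC can diagnose. Other analyzers use their own annotation or configuration formats for the same purpose.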
If an analyzer generates too many false positives, it will become irrelevant because engineers will ignore its output. A modern static source code analyzer is much better at limiting false positives than traditional UNIX analyzers such as lint. However, since a static analyzer cannot understand complete program semantics, it is not possible to eliminate false positives entirely. In some cases, a flaw found by the analyzer may not result in a fatal program fault but could point to a questionable construct that should be fixed to improve code clarity. A good example is a write to a variable that is never subsequently read.
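A minimal sketch of that last case, using a hypothetical clamp function: the first version is functionally correct but contains a "dead store", a value written and then overwritten before it is ever read, which an analyzer may report as a questionable construct. The second version has the same behavior without the redundant write.

```c
/* Contains a dead store: the first value written to `result` is always
 * overwritten before it is read. Correct, but likely to be flagged. */
int clamp_flagged(int x, int lo, int hi)
{
    int result;
    result = x;                 /* dead store: never read */
    result = (x < lo) ? lo : (x > hi) ? hi : x;
    return result;
}

/* Same behavior with the redundant write removed. */
int clamp_clean(int x, int lo, int hi)
{
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}
```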
Using static techniques for secure code
A recommended approach is to run at least one runtime-efficient analysis pass during the everyday software builds executed by individual developers, and to relegate the remaining tools to offline execution that asynchronously notifies the development team of discovered flaws.
Since some compilers include built-in full program static analysis, development teams should consider enabling this feature as a compile option for all builds.
The U.S. Food and Drug Administration’s Center for Device and Radiological Health (CDRH) uses static source code analyzers as a forensics tool to help locate causes of medical device failures.
In some cases, several different static analyzers are used in concert. Similarly, the U.S. National Security Agency (NSA) uses multiple static analyzers to help perform security vulnerability assessments on software.
Development organizations should consider evaluating numerous products on the same code base for both execution efficiency and quality of output. Pick a combination of tools that provides excellent flaw detection coverage while remaining efficient enough for developers to run at least one of them on every compile.
It is important to automate enforcement of the coding standard as much as possible; if the developers’ everyday tool chain always enforces the coding standard, then the software security techniques will become assimilated into the minds of all engineers and managers.
In the same way that a professional golfer relies on muscle memory to create a repeatable swing, embedded software developers must think about software security every day to ensure that lapses do not occur.
Unfortunately, the task of writing a coding standard that requires all these great ideas and turning on these software analyzers in the tool chain is not as simple or straightforward as it may sound.
For example, enabling a static source code analyzer for the first time in a large code base is almost certain to identify hundreds, if not thousands, of problematic code sequences. Some of these will undoubtedly be false positives that must be worked around by modifying the code to mollify the analyzer.
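A minimal sketch of such a workaround, with hypothetical names: callers of `channel_gain` are assumed to pass only channel numbers validated at configuration time, so an out-of-range index cannot occur in practice, yet an analyzer that cannot see that invariant may still report a possible out-of-bounds read. The explicit check is the kind of small change typically made to silence the report; note that even a change this small is a modification to existing code.

```c
#include <stddef.h>

#define NUM_CHANNELS 8   /* hypothetical table size */

static int gains[NUM_CHANNELS] = { 1, 2, 4, 8, 16, 32, 64, 128 };

/* Assumed invariant (hypothetical): channel < NUM_CHANNELS was checked
 * when the system was configured. The analyzer cannot see that check, so
 * the bound is restated here, which also documents the invariant for
 * human readers. */
int channel_gain(size_t channel)
{
    if (channel >= NUM_CHANNELS)
        return 0;   /* unreachable if the invariant holds */
    return gains[channel];
}
```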
Numerous real flaws will be discovered, and the organization will be encouraged by their eradication, even though the review and correction of the identified flaws may require significant resource investment.
Nevertheless, experience has shown that once the code base has been made “coding standard clean,” keeping it that way will become routine. Developers must take extreme care when modifying software that has been around a long time (and possibly fielded for a long time).
Some studies have shown that static analyzers and checkers can indirectly harm software security: developers may introduce new flaws while correcting the problems the tools identify.
This risk is especially high when retrofitting a new coding standard or new analyzer tool to an existing code base. In some cases, it may be prudent to disable a check for a particular piece of software rather than take the risk of modifying it.
For example, let’s suppose management decides to retrofit a new rule limiting all functions to a maximum McCabe complexity value of 20. A large code base may include dozens or even hundreds of functions that initially fail to meet this metric.
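For readers unfamiliar with the metric, McCabe’s cyclomatic complexity is commonly computed from a function’s control-flow graph as M = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components. The sketch below (all names hypothetical) shows how a coding-standard checker might compute M and apply the limit of 20 from the example. Real tools usually derive M directly from source, roughly as one plus the number of decision points, and their counting conventions differ.

```c
#include <stdbool.h>

/* A directed edge in a control-flow graph with nodes numbered 0..n-1. */
struct edge {
    int from;
    int to;
};

/* Union-find root lookup with path halving. */
static int find_root(int *parent, int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* M = E - N + 2P, with P counted by union-find (edge direction ignored).
 * `parent` is caller-supplied scratch space of at least n ints. */
int cyclomatic_complexity(const struct edge *edges, int e, int n, int *parent)
{
    for (int i = 0; i < n; i++)
        parent[i] = i;

    int components = n;
    for (int i = 0; i < e; i++) {
        int a = find_root(parent, edges[i].from);
        int b = find_root(parent, edges[i].to);
        if (a != b) {
            parent[a] = b;      /* merge two components */
            components--;
        }
    }
    return e - n + 2 * components;
}

/* Hypothetical coding-standard rule using the limit of 20 from the text. */
bool exceeds_mccabe_limit(int m)
{
    return m > 20;
}
```

For an if/else (condition, then, else, and exit nodes joined by four edges) this yields 4 - 4 + 2 = 2, and a simple while loop also yields 2.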
Some of the offending subroutines may be straightforward to refactor. Others may be difficult and risky. In fact, some of the worst offenders that are begging for a rewrite may be the exact wrong ones to change, especially if they are providing a security-critical function.
If the software includes a hand-coded AES cryptographic algorithm that has been painstakingly developed to be side-channel attack-resistant and has been through FIPS certification, perhaps the best approach is to leave this function alone as a documented exception to the coding standard.
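An AES implementation is far too large to show here, but a small, hypothetical example illustrates why side-channel-hardened code can look like a refactoring target. The comparison below deliberately avoids an early exit, so its running time does not depend on where the first mismatch occurs. A well-meaning cleanup to an early-return loop, or to `memcmp` (which is not guaranteed to run in constant time), would reintroduce a timing leak.

```c
#include <stddef.h>

/* Hypothetical constant-time equality check for secret data such as MACs.
 * Every byte is examined regardless of where a mismatch occurs.
 * Returns 1 if the buffers are equal, 0 otherwise. */
int ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);

    /* Maps diff == 0 to 1 and any nonzero diff (at most 255) to 0 without
     * a data-dependent branch in the source. Compilers can still transform
     * such code, so hardened implementations verify the generated output. */
    return (int)(1u & ((diff - 1u) >> 8));
}
```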
Next in Part 2: Using dynamic code analysis.
David Kleidermacher, Chief Technology Officer of Green Hills Software, joined the company in 1991 and is responsible for technology strategy, platform planning, and solutions design. He is an authority in systems software and security, including secure operating systems, virtualization technology, and the application of high robustness security engineering principles to solve computing infrastructure problems. Mr. Kleidermacher earned his bachelor of science in computer science from Cornell University.
This article is excerpted from Embedded Systems Security by David and Mike Kleidermacher, used with permission from Newnes, a division of Elsevier. Copyright 2012. All rights reserved. For more information on this title and other similar books, visit www.newnespress.com.