Can't get no Boolean satisfaction?
Since its introduction, static source-code analysis has had a mixed reputation with development teams due to long analysis times, excessive noise, or an unacceptable rate of false-positive results. Excessive false-positive results are the main reason why many source-code analysis products quickly become shelfware after a few uses. Despite these early shortcomings, static analysis has remained of interest to developers because the technology promises to find bugs before software is run, improving code quality and dramatically accelerating the availability of new applications. Although static analysis has historically struggled to deliver on this promise, a relatively new use of Boolean satisfiability (SAT) in the field is poised to help it reach its potential.
Before the first software application was released, the first software defects had been found and eliminated. Whether you call them bugs, errors, or failures, software defects have existed as long as software itself. As early applications evolved to become more robust and more complex, the remaining defects became more difficult to corral. Simply stated, the more lines of code necessary to create an application, the more defects one would expect to encounter during development.
Consider a theoretical average ratio of one defect per 1000 lines of code (likely a gross underestimate). As a code base expands from thousands of lines, to tens of thousands, to hundreds of thousands, the number of defects implied by this ratio becomes overwhelming for developers who rely exclusively on manual techniques to find and fix them.
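To make the arithmetic concrete, here is a minimal sketch that applies the ratio above to three code-base sizes. The sizes and the helper name expected_defects are illustrative assumptions, not figures from any real project.

```python
# Illustrative only: expected defect counts at the article's theoretical
# ratio of one defect per 1000 lines of code (likely an underestimate).
DEFECTS_PER_KLOC = 1


def expected_defects(lines_of_code, defects_per_kloc=DEFECTS_PER_KLOC):
    """Return the expected number of defects for a code base of the given size."""
    return lines_of_code / 1000 * defects_per_kloc


# Hypothetical code-base sizes, one per order of magnitude.
for loc in (5_000, 50_000, 500_000):
    print(f"{loc:>9,} lines -> about {expected_defects(loc):.0f} expected defects")
```

Even at this optimistic ratio, each tenfold growth in code size means ten times as many defects to find by hand.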
With applications assuming more critical functions for business and industry, the consequences of defects in the field now mandate that software meet specific quality standards prior to release. It's at this intersection of opportunity and risk that developers turned to software itself to try to eliminate software defects earlier in the development cycle. Static analysis, the automated review of code prior to run-time with the intention of identifying defects, was an obvious solution to this fundamental challenge of ensuring code quality.
First-generation static analysis
The first static-analysis tools appeared in the late 1970s. One of the most commonly cited tools from that era was Lint, which can be regarded as the first generation of commercially viable static analysis. Lint held great promise for developers when it was initially released because, for the first time, it allowed them to automate the detection of software defects early in the application lifecycle, when they were easiest to correct. The key innovation behind Lint was the use of compiler technology to do more than simply report compile warnings and errors. By extending the scope of what the compiler looked for in the code, Lint was able to uncover some real defects in software systems, thus enabling it to become the first viable static source-code analysis solution.
In reality, Lint wasn't designed with the goal of identifying defects that cause run-time problems. Rather, its purpose was to flag suspicious or non-portable constructs in the code to help developers code in a more consistent format. By "suspicious code," I mean code that, while technically correct from the perspective of the source-code language (e.g., C, C++), is structured so that it could execute in ways not intended by the developer. The problem with flagging suspicious code is that, like code that triggers compiler warnings, it could, and often would, work correctly. Because of this, and because of Lint's limited analysis capabilities, noise rates were extremely high, often exceeding a 10:1 ratio between noise and real defects.
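The noise problem can be illustrated with a toy checker. The sketch below is not how Lint works; it is a crude, regex-based stand-in that flags one classic suspicious construct in C, an assignment inside an if condition. The pattern, the function name flag_suspicious, and the C fragment are all hypothetical.

```python
import re

# Toy pattern: after "if (", look anywhere on the line for a lone '=' that is
# not part of ==, !=, <=, or >=, followed later by a ')'. A real tool would
# work on parsed code; this regex is deliberately crude.
ASSIGN_IN_IF = re.compile(r"\bif\s*\(.*[^=!<>]=[^=].*\)")

# Hypothetical C fragment, held as text so the checker can scan it.
C_SOURCE = """\
if (x = 0) { reset(); }
if ((fp = fopen(path, "r")) == NULL) { return -1; }
if (count == limit) { flush(); }
"""


def flag_suspicious(source):
    """Return the 1-based numbers of lines that match the toy pattern."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if ASSIGN_IN_IF.search(line)
    ]


print(flag_suspicious(C_SOURCE))
```

Line 1 is probably a real mistake (the developer likely meant ==), but line 2 is a deliberate and common C idiom that works as intended. A checker that only recognizes the construct reports both, and the developer is left to sort the defect from the noise by hand.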
Finding the real defects amid Lint's voluminous reports required developers to conduct time-consuming manual reviews of the results, compounding the exact problem that static analysis was supposed to, in its ideal state, eliminate. Hence, Lint was never widely adopted as a defect-detection tool. But as a testament to the quality of Lint's underlying technology, many different versions of the product remain available.
Second-generation static analysis
For nearly two decades, static analysis remained more fiction than fact as a commercially viable production tool for identifying defects. In the early 2000s, a second generation of tools (e.g., Stanford Checker) emerged that offered enough value to become commercially viable. By leveraging new technology that expanded the capabilities of first-generation tools past simple pattern matching to also focus on path coverage, second-generation static analysis was able to uncover more defects with real run-time implications. Again, the observation was that the compiler could do more than simply turn source code into object files. But this time, the requirement was to identify not just suspicious code but violations of the many ad hoc, system-specific rules that modern-day software systems must obey.
These tools could also analyze entire code bases, not just one file. By shifting focus from "suspicious constructs" to "run-time defects," developers of these new static-analysis technologies recognized the need to understand more of the inner workings of code bases. This meant combining sophisticated path analysis with inter-procedural analysis to understand what happened when the control flow passed from one function to another within a given software system.
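As a rough sketch of the inter-procedural idea only (path analysis is left out), the example below propagates a "may return NULL" fact across function boundaries and then flags callers that dereference such a result unchecked. The hand-encoded PROGRAM table, its field names, and the function names are hypothetical; a real tool would derive this information from source code and would not be limited to one defect type.

```python
# Hypothetical mini-program, encoded by hand. For each function we record:
#   returns_null      - it can return NULL directly
#   returns_result_of - the callee whose result it passes back, if any
#   derefs_result_of  - callees whose results it dereferences without a check
PROGRAM = {
    "find_user":   {"returns_null": True,  "returns_result_of": None,        "derefs_result_of": []},
    "lookup":      {"returns_null": False, "returns_result_of": "find_user", "derefs_result_of": []},
    "make_buffer": {"returns_null": False, "returns_result_of": None,        "derefs_result_of": []},
    "print_name":  {"returns_null": False, "returns_result_of": None,
                    "derefs_result_of": ["lookup", "make_buffer"]},
}


def may_return_null(program):
    """Propagate 'may return NULL' summaries across calls until nothing changes."""
    summary = {name: info["returns_null"] for name, info in program.items()}
    changed = True
    while changed:
        changed = False
        for name, info in program.items():
            callee = info["returns_result_of"]
            if callee is not None and summary[callee] and not summary[name]:
                summary[name] = True
                changed = True
    return summary


def find_null_derefs(program):
    """Return (caller, callee) pairs where a possibly-NULL result is dereferenced."""
    summary = may_return_null(program)
    return [
        (caller, callee)
        for caller, info in program.items()
        for callee in info["derefs_result_of"]
        if summary[callee]
    ]


print(find_null_derefs(PROGRAM))
```

Looking at print_name or lookup in isolation reveals nothing wrong. The defect only appears once the fact about find_user is carried through lookup to its caller, which is why analyzing one file or one function at a time was not enough.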
Despite being adopted and used by organizations, second-generation static-analysis tools still had difficulty finding the sweet spot between accuracy and scalability. Some solutions were accurate for a small set of defect types, but couldn't scale to analyze millions of lines of code.
The struggle to find that elusive sweet spot between accuracy and scalability led to a false-positive problem. Just as noise prevented first-generation tools from delivering on the promise of static analysis, false positives slowed the adoption of second-generation tools.