Expressive vs. permissive languages: Is that the question?

Yannick Moy

April 07, 2010

Yannick MoyApril 07, 2010

Static analysis is becoming mainstream, with mature bug-finding tools for C and Java, including products such as Coverity Prevent, Grammatech CodeSonar, and Fortify SCA. These products limit the level of "noise" (false warnings) inherent to such tools to a minimum. However, by carefully selecting those cases for which they report a problem, these bug-finders hide the fact that they are largely uncertain about the overall correctness of the program.

To see this, it is usually sufficient to lower the ranking threshold below which problems are not reported, which immediately returns an extremely long list of possible problems, representing only a fraction of all the potential problems. By "problem" here, I mean any software bug, whether it's an error possibly detected at runtime (leading to an exception) or a mismatch between the programmer's intent and the obtained behavior.

As a developer and user of such tools, I have tried many times to answer a recurring question that most users have: how can I make my program analyzable? To tell the truth, beyond a few tips on features to avoid and idioms to prefer, usually not much can be done because so many decisions are outside the control of the programmer and instead depend upon the features of the programming language itself.

Therefore, the real question is: which language should I choose to make my program analyzable? This can be rephrased as: what does a given programming language offer me "for free"? By free, I mean information available from the source code without further analysis-information that, although not strictly required to express functional behavior, nonetheless captures part of the programmer's intention. Because software analysis is always short of information about the programmer's intent, the information that comes "for free" in a programming language is the most valuable asset available to an analyzer. By this criterion, C ranks quite below Java, which ranks quite below Ada.

Of course, tools for all languages strive to recover missing information, but as layers of missing information pile up, tools struggle harder to recover higher layers. For example, tools that analyze programs in assembly language are largely limited to recovering only declared types and control-flow, whereas tools that analyze C programs more easily recover higher layers too, such as type-safety and memory-safety properties. Similarly, tools that analyze Java programs can often deduce integer ranges and non-nullity of pointers, in addition to the properties recovered from C programs. Tools that analyze Ada programs are best at also recovering "function contracts," the ultimate goal in software analysis, because so much of this free information is available.

A function contract is made up of a precondition that a caller of the function should satisfy prior to the call and a postcondition that the function should satisfy prior to returning. While user-defined contracts have appeared initially in some pioneering languages (especially Eiffel), they're now available in various widely recognized environments, for example in the .NET platform (Code Contracts) and the GNAT GCC compiler for Ada. Internally, many analyzers are based on generated contracts, also called function summaries, that they compute during the analysis.

I said contracts are the ultimate goal because complete contracts allow analyzing each function in isolation, much like function signatures allow separate compilation. However, only one programming language, SPARK, has gone as far as to require contracts from the programmer. So, in the general case, it's up to the analyzer to generate the contracts.

< Previous
Page 1 of 4
Next >

Loading comments...