You think your software works? Prove it!

Formal methods have been quietly moving from the research lab into mainstream software development, especially in high-assurance systems.  With modern programming languages and proof technology, developers can make use of formal methods at various levels, ranging from enforcement of safe language features to proof of compliance with formally specified requirements. Understanding the benefits and costs associated with each level can help developers decide whether and how to adopt and employ formal methods. And it’s not an “all or nothing” choice: formal methods can be mixed with traditional verification techniques (i.e., testing), introduced incrementally into an existing project, and combined with code written in languages such as C or C++. 

Program verification has traditionally relied on testing, but as the late computer scientist Edsger Dijkstra once remarked, “program testing can be used to show the presence of bugs but never to show their absence”. This drawback has been known since the dawn of programming, and one of the earliest research areas in the software field was formal verification: statically analyzing the source code for a program and proving specified dynamic properties with mathematics-based rigor.  For various reasons, however, this methodology did not make it into widespread practice.  The language subsets were overly restrictive, machines were slow, proof techniques were impractical for large programs, goals were overly ambitious (proof of functional correctness), the false alarm rate was too high, and formal verification didn’t integrate well with traditional development methods.

Well, that was then, and this is now.  Recent years have seen marked progress in proof technology, exponentially faster hardware, multicore for parallelism, better and more expressive programming languages, and approaches that can combine proof with testing.  And, not insignificantly, the increased attention being paid to software reliability, safety and security has led to the realization that confidence backed by mathematical reasoning is required for critical program properties.  

The SPARK programming language and analysis / proof toolset illustrates this trend, and experience with its usage has led to the formulation of practical implementation guidance: how to introduce formal methods into an enterprise based on the relevant assurance goals, and how to combine proof with testing.

What Is SPARK?

The SPARK language is a subset of Ada 2012 designed for modular static verification (thus omitting features that are difficult to verify or whose semantics is not well defined), augmented with aspects that facilitate analysis.  It includes Ada’s features for program structure (packages, subprograms, generics), most data types, object-oriented programming, and, most importantly, the contract-based features added in Ada 2012 (e.g., subprogram pre- and post-conditions).  It also includes a deterministic set of Ada tasking (concurrency) features known as the Ravenscar profile.  The most notable omissions are access types (pointers), side effects in functions, aliasing of names (e.g., referring to the same object through a global variable and a subprogram parameter), the goto statement, and exception handling.


Figure 1 shows an example of a fragment of a SPARK program. (Source: AdaCore)

The Increment procedure reads the global variable N and adds its value to the formal parameter X. The names Global, Depends, Pre, and Post are known as “aspects” and represent various properties (contracts) of the procedure Increment. The Global aspect shows that the procedure uses N as an input, the Depends aspect shows that the final value of X depends on its starting value as well as the value of N, the Pre aspect reflects the subprogram’s precondition (needed to ensure absence of arithmetic overflow), and the Post aspect reflects the functionality of the procedure (the final value of X is the sum of the starting value of X and the value of N).  The SPARK tools can statically verify the Global, Depends and Post contracts.  In the case of Post, the proof tool will show that, if the precondition is met and the procedure terminates, then the postcondition will be met.  The precondition is checked at each point where the procedure is called.

Since SPARK is a subset of Ada, the dynamic contracts such as Pre and Post can be treated with Ada semantics and checked at run time.  That allows a smooth transitioning from test-based verification to static (formal) verification, and also the ability to mix and match within the same program.  Some parts of the program can be indicated as being in SPARK mode (and subject to analysis and proof) while others are not in SPARK mode.  As a specific example, a subprogram specification may be in SPARK mode while its body is not.  This allows hybrid verification where, for example, calls of the subprogram are verified formally to make sure that the precondition is met, while the implementation of the subprogram body is verified via testing, for example because it uses features outside the SPARK subset. Indeed, combining SPARK not just with full Ada but with code written in languages such as C or C++ can be handled in an analogous fashion.  Formally verified SPARK code can call, or be called from, code written in these other languages.

Levels of Assurance

One of the impediments to the use of formal methods in the past was the perception that it was “all or nothing”: it was difficult to introduce into an existing project, and seemed to require a whole new mindset and approach for its application. To address this issue, AdaCore worked with one of its customers to see how to introduce formal methods, and SPARK in particular, in the context of ongoing Ada software development.  The result of this collaboration was a methodology expressed as guidance that describes five levels of software assurance; for each level the guidance presents the approach, its benefits and costs, and the impact on the software development and verification processes.

Stone Level

At this basic level, the approach is to implement as much code as possible in the SPARK subset but without necessarily trying to perform the associated data and flow analysis or proof.  The main benefit is that the code will be more portable and easier to maintain. The cost is the effort involved in inspecting non-compliant code and then either adapting it so that it is legal SPARK, or else marking the code as in non-SPARK mode so that it is not subject to formal analysis.  An example is the usage of pointers. If pointers are not heavily used then revising the code to use array indices or the formal containers library may be practical. Otherwise pointers could be defined through encapsulated types in Ada, with their representation as pointers supplied in non-SPARK sections.  Once the initial code base has been converted, the incremental effort in development and verification is similar to full Ada.

Bronze Level

The Bronze level consists of the Stone level plus performing flow analysis to verify intended data usage (global variables, formal parameters, local variables).  This analysis will detect variables that might not be initialized, as well as instances of variable aliasing.  The benefits include guarantees for a number or program properties (no reads of uninitialized variables, no interference between parameters and global objects, no race conditions on access to shared data) as well as improved readability from the contracts (Global and Depends) that show the programmer’s intent.  The analysis will also uncover suspect code that could be an error (for example an unused assignment or a statement that has no effect). The main cost is in the initial pass, which involves inspecting the check messages and either rewriting the code or explaining the false alarms.

Silver Level

The Silver level consists of the Bronze level plus ensuring the absence of run-time errors. This involves running the proof tool, inspecting each failure (i.e., each place where the prover could not guarantee the absence of an exception) and then either correcting the code, explaining why it is a false alarm, or adding appropriate contracts to facilitate proof. Subprogram pre- and post-conditions and loop invariants may be needed.  The benefits at the Silver level are substantial and include the assurance that run-time errors will not occur (thus no buffer overflow) and the ability to compile the final executable without exception checks.  The cost is the effort required in the initial pass, and the added effort in inspecting and resolving the check failures.  Ongoing maintenance is similar to the Bronze level.

Gold Level

The Gold level consists of the Silver level plus proving key integrity properties such as data invariants or safe transitions between program states. The approach is similar to the Silver level, and features such as subprogram pre- and postconditions and type predicates are especially useful.  Figure 2 illustrates a type predicate, which is an invariant that in this example ensures that arrays are always sorted.


Figure 2 shows an example of a type predicate in SPARK. (Source: AdaCore)

The predicate is enforced on assignments to objects of type Sorted_Array. The costs and benefits at the Gold level are similar to the Silver level. The verification effort depends on the complexity of the properties to be proved, and developing the contracts and interacting with the tool may require practice.  For large programs, tool execution may take several hours.

Platinum Level

The Platinum level consists of the Gold level plus proof of full functional correctness (ie, compliance with formally specified requirements).  This may be appropriate for small, critical components that demand the highest degree of confidence and whose developers have experience with the proof technology.

Industrial Experience

AdaCore worked with one of its customers, Thales (France), in introducing SPARK into several pilot projects, to investigate the costs and benefits of formal methods at several assurance levels. Thales is a global company in the Aerospace, Ground Transportation, Defense and Security domains.

Port to a new platform

The context was a radar system comprising around 300 K Lines of Code (KLOC). The target was Stone level. The effort entailed refactoring the code to comply with the SPARK subset, for example to deal with the usage of pointers. A large part of the effort was successfully performed in a few days, and the remainder is scheduled for completion.

Demonstrate compliance with low-level requirements

The context was a small numerical function, and the target was the Gold level to prove a specific property. This project showed the challenges of achieving the Gold level on numeric computation: the contracts ended up being as long and complex as the code that they were designed to prove. The property was not proved automatically, and the “lesson learned” was that for these kinds of programs a lower level of assurance is more realistic and traditional verification methods (testing) is more practical. 

Identify and fix weakness

The context was a code generator, and the target was the Gold level to prove a property related to memory bounds.  Two senior developers, assisted by a SPARK expert, took a half day to reach the Silver level and two days to reach Gold.

Guarantee safety properties

The context was a command and control component (7 KLOC) and the target was the Gold level to prove a property expressed as an automaton. Two senior developers, assisted by a SPARK expert, took one day to reach Silver and four days to reach Gold.

Conclusions

Based on experience from both the pilot projects noted above and other successful applications of formal methods in real-world applications, it’s safe to say that this technology is ready for prime time. Formal verification through SPARK allows the flexibility to fine-tune its gradual insertion into existing processes while mitigating its associated risks and costs. Developers with no previous experience in SPARK or formal methods can quickly become proficient in the technology.

Here are some recommendations concerning the various levels:

  • Use the Stone level as an intermediate stage during adoption

  • Aim for the Bronze level for as large a part of the code as possible, since the flow analysis will detect potential problems with initialization and aliasing and identify other suspicious constructs

  • Choose the Silver level as the default for critical software (subject to costs and limitations)

  • Go for the Gold level only for a subset of the code that is subject to key integrity (safety / security) properties

These recommendations are based on experience with the SPARK language and toolset; the same principles could be applied using other formal method technologies (such as Frama-C) although the specific costs and benefits will be different.

References / Resources

Textbook for students and software professionals: McCormick and P. Chapin, Building High Integrity Applications with SPARK ; Cambridge University Press; 2015

Booklet: Implementation Guidance for the Adoption of SPARK

SPARK blog and resources (User’s Guide)

Webinar on SPARK: Moy and R. Chapman, Building High-Assurance Software without Breaking the Bank

SPARK online training

SPARK toolset


Dr. Benjamin Brosgol  is a member of the senior technical staff of AdaCore. He has been involved with programming language design and implementation thoughout his career, concentrating on languages and technologies for high-assurance software. Dr. Brosgol was a member of the design team for Ada 95, and he has also served in the Expert Groups for several Java Specification Requests. He has presented papers and tutorials on safety and security certification on numerous occasions including ESC (Embedded Systems Conference), ICSE (IEEE/ACM International Conference on Software Engineering), STC (Software Technology Conference), ACM SIGAda, and Ada-Europe. Dr. Brosgol holds a BA in Mathematics from Amherst College, and MS and PhD degrees in Applied Mathematics from Harvard University.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.