The traditional definition of a safety critical program is one in whichhuman life depends on the correct operation of the program. If there isa bugin such a program, then death or serious injury can result.
Typical examples are signaling systems on trains, avionics controlsystems, medical instrumentation, and space applications. Since thefocus is onhuman safety, we apply requirements to such programs that essentiallyrequire that they be error free.
That's a strong requirement, especially given the common wisdom thatall large programs contain serious errors. But in our moderntechnological age we place our safety at the mercy of computer softwareprograms every time we board a train or plane, or enter a hospital, oreven drive a car.
We simply have to ensure the reliability of such programs, and as wewill see, it is in fact possible and practical to achieve the seeminglyvery difficult goal of writing essentially error-free completelyreliable software.
Although we are definitely focusing on safety-critical softwarehere, it isworth noting that in our modern complex world, more and more criticalfunctions depend on computers. For example, banks rely on programs forcontrolling the international financial system. People may not diedirectly as a consequence of major failures in this area, but there arestill awfulconsequences if such systems fail.
Similarly, much of our general infrastructure, including phones, theinternet, water and electricity supplies, and many other critical needsof modern life are dependent on software. If we can indeed devisemethods of writing completely reliable code, we can certainly find moreapplications for suchtechniques than traditional safety-critical programs encompass.
One of our important theses in this article is that all programmersshould be aware of these techniques so that some appropriate subset ofthem can beapplied much more widely. Almost no one will say that reliability isuninteresting for a planned software project. We may not be able toafford to make every application completely reliable, but for sure wecan do a better job in the future in reaching this goal, and we candefinitely extendthe notion of high integrity systems beyond the domain ofsafety-critical programming alone.
What we've learned aboutsafety-critical software
We now have over fifty years of experience in writing large programs.During that period we have developed many techniques which can berefined to play a part in the design and implementation ofsafety-critical software.
Perhaps the most important and fundamental requirement is thateveryoneinvolved in such a design effort must orient themselves to adisciplined view that is entirely quality-oriented.
I am an enthusiastic Ada supporter, and I should disclose that rightaway, though what I have to say here is certainly not Ada specific. ButI will say that one of the advantages of Ada in this area, apart fromsome important objective features, is that Ada was designed with thiskind of qualityorientation in mind, and the culture that surrounds Ada tends to havethis emphasis. Even if you are not using Ada in a critical application,you will do well to borrow this kind of culture.
This may seem like a trivial observation, but in my experience, theissue of culture and attitude is a critical one. If a team is totallydedicated to quality, it is far more likely to achieve its goal.Nevertheless, even the most dedicated team needs the tools andprocedures that will help ensure success,and we will now examine some of the critical aspects that help ensuresuccess in writing totally reliable programs.
In this article, we examine the influence of programming languagedesign on the production of safety-critical software. First, we shouldstart by noting that it is possible to do anything in any language. Onecan even prove that statement in some theoretical sense. However, weknow from a lot of experience that programming language design isdefinitely significant and can affect the ease of writing programs.
When it comes to writing error-free code, it is most certainly thecase that we want all the help we can get, and to the extent thatprogramming language design can help avoid errors, we definitely wantthat help. So let's look at which languages need to provide in thisarea.
As a starting point, we will note that C, and Java and C++ are notsuitable languages for writing safety-critical software. And before youstart thinking that these are peculiar statements from an Adaenthusiast, we rush to add Ada to the list of languages not suitablefor safety-critical software.
What do we mean by this rather outrageous statement? The point weare making is that all of these languages, in their entirety, are toocomplex to be used in this arena. We can't let programmers use the fullpower of anyof these languages; even C has too much functionality.
The extended functionality of modern programming languages does makeit easier to write code in the first place, but we have to worry aboutdemonstrating that the resulting code is error-free.
So what do we need to do?
The answer is that for any of these languages we need to subset thelanguage, so that we write our programs in a very well understoodsubset of the chosen language, which avoids unnecessarily complexsemantics.
For instance, in Ada, we most likely avoid using the full power ofAda tasking. In C, we exclude some of the C library routines which areunlikely to be available in a safety-critical environment. For C++ weavoid the complexuse of templates. For Java, we avoid the use of dynamic features thatallow the program to modify itself while it is running.
[Of course, there are issuesbesides size that can interfere with a language's ability to supportsafety-critical development. For example, C has a number of error-pronefeatures that can hinder a program's readability. If we see theconstruct “if (X=Y) …” are we sure that the programmer reallymeant to assign Y to X, and not compare the two for equality? ]
The exact choice of the set of features to be used is a challenginglanguage design task, and the base language may be more or less helpfulin this process. In the case of Ada for example, the built in notionsof pragma Restrictions and pragma Profile make it much easier tospecify and controlthe resulting profile. Two interesting examples of such language subsetdesigns are MISRA C and the SPARKAda subset, about which we will have more to saylater.
These are examples which show how a coherent subset can be designedthatmakes the use of the language more effective for safety criticalpurposes. The MISRA group is currently busy designing a similar subsetof C++ to be called MISRA C++.
So what features should we look for in a language to be used forsafety critical programming? Most obviously we want to favor compiletime checking that can find as many problems as possible at compiletime.
Ada is an example of a language that is designed with this criterionin mind. Programmers learning Ada for the first time often comment thatit is hard work to get the compiler to accept a program, but when itdoes, the program is far more likely to run correctly the first time.That characteristic may be a bit annoying for rushing out programsrapidly where reliability is not paramount, but for safety-criticalprogramming it is just what we want.
Ada achieves this partly by implementing a much more comprehensivetype system, in which for example there are multiple integer types, andthe compiler can check at compile time that you are not doing somethingthat makes no sense like adding a length to a time.
When we subset the language, we try to avoid aspects of the languagewhich violate this important criterion, and as MISRA C shows, even alanguage which starts out with almost the opposite criterion can beimproved considerably by careful subsetting.
Run time checking?
Another important issue is run time checking. Again, using Ada as anexample, the Ada language defines many run time checks that arerequired to raise exceptions if they fail. As any Ada programmer knows,these checks and resulting exceptions are enormously valuable infinding errors in the early stage of testing, rather than later on inthe development process.
The issue of whether such checks should be enabled in the finalproduct is an interesting one. On the one hand, we would prefer todemonstrate that a program is free of any possibility of run timeerrors. On the other hand, it provides an extra safety belt in case ofa problem sneaking through our careful procedures. But for sure suchrun time checking is invaluable during the testing process.
Though we have emphasized simplicity in language subset selection,we nevertheless have to recognize that safety-critical applications aregetting more complex, and we have to be able to accommodate theserequirements. We mentioned that the full tasking capabilities of Adaare probably not appropriate for safety-critical applications. However,support for multi-tasking is becoming more and more important.
One of the important additions to Ada 2005 (the latest version ofthe Ada language) is the Ravenscartasking profile which is specifically intended for safety-criticaluse. This provides an excellent introduction to this feature, with someuseful insights into the design criteria and usage. Another interestingeffort is the Java real-time development work. The original Java threadspecification is inadequate, and this work aims to correct that. Fordetails, see Adaand Java: real-time advantages.
The choice of language is always a hotly debated issue. We startedout by noting that any problem can be solved in any language, and thatis certainly true. Safety-critical applications have been written inmany differentlanguages. Nevertheless choice of language is important and it is noaccidentthat Ada finds its widest use and support in the context of largesafety-critical applications such as air traffic control.
Given that we want to demonstrate that a program is completelyreliable, a natural approach is to decide that we should prove theprogram correct in a mathematic sense. That way we won't have to relyon testing or any other subjective measures.
Some twenty years ago, the notion of proof of correctness was allthe rage in academic circles, and still today there are academiccomputer scientists who assume that this is the solution to the problemof writing reliable code.
The correct meaning of correct
What's wrong with this viewpoint? Well most significantly, we have tofigure out what correct means. The standard model is that we need aformal specification of the problem, and then we will prove that theprogram properly implements this formal specification. Unfortunatelythere is a huge hole in this approach.
How do we come up with the formal specification? For small academicproblems, like sorting an array of numbers, we can indeed write down aformal specification in an appropriate language and construct amathematical proof that a given program meets this specification. Inorder to actually have confidence in the proof, we need to verify theproof using a mechanical process, but that is also quite feasible forsmall cases.
But what of large applications? First of all there are aspects oflarge programs that are just not easy or even possible to formalize.For example, a pilot's cockpit must present a user-friendly interface.The notion of user-friendly is hardly a formal one. Another problem isthat for a large program, the specification is itself a huge document.Furthermore it is written in a formal speci- fication language that maybe harder for many people to read than a normal program in aconventional programming language.
How do we know the specification is right? The answer is we don't,and the problem of writing a reliable correct program has simply beentransformed to the problem of writing a reliable correct specification.
For these reasons, the notion of proving entire large applicationscorrect has largely disappeared from view. That's particularly true inthe United States, where typical academic programs are far more likelyto offer courses in Unix Tools, and Web Programming than in formallogic and proof of correctness.
So is this approach a dead end? Not at all! It is just that we haveto recognize that proof techniques and the use of formal methods toolsare not the only answer, but that does not mean they cannot play a veryimportant part. In England, there is much more emphasis on formalmethods. For example, the MoD standard for safety-critical programsrequires the use of formal methods (although it is not very specific onwhat this means).
<>So it is not surprising that to learn more about this approach,weshouldfind that a British company, Praxis High Integrity Systems, isone of the leading practitioners in the area. What Praxis is able toshow is that although total proof of correctness may not be feasible,it is still very useful to be able to prove specific properties holdfor a program.
As an example, in Ada, there is a very clear definition of a set ofrun time conditions that cause exceptions to be raised. An exception inAda generally corresponds to an error situation, and we certainly don'twant a critical program to contain such errors. The task of provingthat a program is free of any possibility of exceptions is a welldefined task, and has actually been achieved for non-trivial software.For details, go toIndustrial Strength Exception Freedom.
In order to construct such proofs, it is essential that the programbe written in a relatively simple very well defined language. For thispurpose, Praxis has designed SPARK, a subset of Ada, which is enhancedby the addition of static annotations, which for example say whatvariables can be accessed where. The SPARK Examiner tool verifies theseconditions, and other Praxis tools allow the proof of specificproperties of a program.
Many other companies are also working in this area, but Praxis isdefinitely a leader in this area, and what we can learn from this isthat the use of formal methods and proof tools is not just an academicexercise, but is useable in practice as an important tool in thearsenal of the safety-critical programmer.
Testing, testing, testing
If we cannot in practice prove all the properties that we need todemonstrate, how shall we ensure the safety of a program. One answer isgiven in the above title of this section. Now we are all taught thesimple observation that testing can never show the absence of bugs, itcan only show the presence of bugs. This is certainly true from atheoretical point of view, but still, we definitely trust a programthat has been tested more than one that has not, and the more thoroughthe testing, the more we trust it.
Can we in practice devise testing approaches that are sufficientlythorough that we are willing to literally risk our lives on theresulting demonstration that there are no known problems? That's anenormously significant question.
The DO-178B certification standard includes many aspects, but themost significant is a thorough testing approach that tries to answerthis question affirmatively. It does this in a two-pronged approach.
First, it specifies an approach for generating systematic functionaltests. These tests must test all functional aspects of the program, atall levels of abstraction. The tests are derived in general terms fromthe problem statement and specification, and at a more detailed levelfrom the actual code of the program to make sure that every detail ofthe logic works correctly.
Then we insist that full coverage testing be done. This means thatin our test suite every statement is executed at least once. Thatdoesn't guarantee anything, but we really don't have much confidence instatements that have never been executed.
This may seem like an obvious and simple observation andrequirement, but in practice, most large non-safety-critical programsare not tested in this way, even though tools are available for suchtesting. For example, the failure of the AT&T long lines system wasdue to the execution of error recovery software that had never beentested.
The DO-178B standard has a number of different levels, correspondingto different requirements for safety. For level A certification, thehighest level for the DO-178B standard and the one we associate withlife-critical systems, there is an additional requirement regardingflow of control. Consider:
if condition then
Now here simple coverage testing will only ensure that thestatements have been executed and it could be that the test suitealways has condition set to true. That's not really enough. We alsowant to know that if condition is false, it is safe to skip thestatements. A more complicated example is:
if condition1 and condition2 then
Now here we really want to test various combinations of conditionsto make sure that all possibilities are covered. But we don't need totest all possible conditions. In particular, if condition1 is falsethen we don't care about condition2, but we would like to test thefollowing three cases:
condition 1 true, condition 2 true
condition 1 true, condition 2 false
The testing regime that ensures this is called MC/DC (modifiedcondition/ decision coverage), and there are tools to enforce therequirement that the set of tests include all cases. For additionalinformation on this approach, see ModifiedCondition/Decision Coverage software testing criterion whichcontains a very thorough bibliography on this technique.
One interesting issue is whether to apply the coverage testing tothe source or the object code. We can't fully trust compilers becausethey are far too complex to be themselves fully certified (or”qualified as development tools”, in DO-178B parlance).
So what should we do? There are two approaches. We can either do alltesting at the object code level. This is for example, the approachused by the Verocel tools.. The otherapproach is to do coverage at the source program level, but in thiscase it is necessary to establish full traceability between the sourceprogram and object program. Both approaches have been usedsuccessfully, and both have their advocates (we have seem some fiercearguments between these two schools of thought in some of the projectswe have worked on).
DO-178B: Not just another mindlessset of rules
One important aspect of DO-178B is that it is not simply a mindless setof objective rules. At the heart of the process is a human exercisingjudgment.
These DERs (Designated Engineering Representatives) are independentauthorities whose job it is to make sure the rules have been followedto the letter and in spirit. They are the “building inspectors” of thecritical software engineering industry, and their extensive experiencehelps to make sure that the standard works in practice.
How well does the testing regime that DO-178B imposes work? Thepragmatic answer is that it is pretty successful. Remember that wedon't really require software to be 100% guaranteed to be totally errorfree. Rather we want to make sure that we can write software that isreliable enough so that it is not the weak link in the chain.
If we take commercial avionics as an example, many lives have beenlost due to various hardware failures on scheduled commercial flights,but no lives have been lost (as far as we can determine) as a result ofsoftware bugs in this arena. That's a pretty impressive record. Ofcourse there can always be a first time, but so far we have done prettywell.
That does not mean we are content with the current state of affairs.We are working hard on improving our understanding of formaltechniques, so we can use proof techniques more effectively. We areworking on improving our programming languages so that they make iteasier to write reliable programs, and we are improving our testingapproaches.
For example, the current work on DO-178C is addressing the issue ofobject oriented techniques, which we examine in more detail in the nextsection.
Still, we are doing pretty well. Often you will hear people say thatour software technology is terrible and all big programs containsserious bugs. Well that may be true of some areas, and for example weworry a lot about automobiles, where safety standards have not caughtup with the increasing use of computers in cars. But in areas where weapply strict certification standards, we have a string of successes. Ofcourse we can't cover the DO-178B standard in detail here, but we haveoutlined some of its important features. For further details on thisstandard, a good source is:www.software.org/quagmire/descriptions/rtcado- 178b.asp
There are many other resources on the net, that describe approachesthat various vendors have taken in meeting the requirements of thisstandard. When we are aiming at perfection, we need to take fulladvantage of all the tools at our disposal. We can achieve a lot bycareful reading of programs with software experts doing the reading,but often we do even better by using automated tools to help with thisprocess. There are many general categories of such tools.
Static analysis tools analyze the structure of a program to detecterrors and to provide information that will help find problems beforethey cause trouble. An example of such a tool is CodeSonar fromGrammatech.
This tool automatically finds errors such as buffer overruns in C++programs. There are many such tools from many suppliers. Of course notools of this kind can guarantee that your program is correct, buteverything we do helps to increase our confidence, and it is the sumtotal of this information and effort that leads us to be willing to geton the plane that will be deploying our software at the end of theprocess.
Choosing an appropriate set of tools and developing experience intheir use can be as important as language and compiler selection. Notethat the tool set is also likely to be language dependent. For example,in Ada we are less concerned with the buffer overflow problem, sincethe built-in exception mechanism will detect any such problems.
Compiler vendors often provide useful suites of such tools, andevaluation of the full tool suite should be an important part of theevaluation of languages and compilers. For example, our company,AdaCore provides a complete suite of tools. One such useful tool is astatic stack usage analyzer that addresses the one specific requirementthat a program does not overflow any stacks.
There are many different kinds of tools for analyzing suchproperties as schedulability, worst-case timing, run-time use ofstorage, freedom from race conditions, freedom from unwanted sideeffects, automated testing, metrics etc. An important part of thepreparation for a project aiming at a high-integrity product is toinvestigate and acquire a coherent set of tools.
The use of integrated development environments is often a useful wayof organizing such a set of tools. For example, the GNAT ProgrammingStudio (GPS) product from AdaCore provides a convenient way oforganizing a wide variety of tools in a coherent manner, and there aremany other such products.
Object oriented programming
Object oriented programming methods have become an important part ofthe arsenal of tools in the hands of a modern programmer, and a widevariety of languages support these notions (C++, Java, Ada and manyothers). In this section we will look at the special considerations ofusing these methods in a safety critical environment.
The notion of OO programming is a little ill-defined. On the onehand it refers to a design method in which objects communicate viamessage passing. Such a design method in and of itself poses no specialproblems or safety concerns.
On the other hand, it refers to the use of a set of features inprogramming languages, originally conceived to facilitate the use ofthe object oriented design approach, but more widely useable for manypurposes. This set of features typically comprises three importantcomponents:
1) The ability to extendexisting types by adding new data elements
2) Automatic inheritance ofexisting methods when types are extended, along with the ability tooverride such methods and/ or add new ones
3) Dynamic dispatching,allowing automatic choice of the right object
The first two features offer no special obstacles in a safetycritical environment, and it is worth noticing that the use of thesetwo features is helpful even without dynamic dispatching. For example,a type and associated functions (methods) may be defined in a library.The program can then import this type, extend it to specialize it for aparticular application, and then use the inherited operations on thistype.
In Ada, this useful set of capabilities is recognized by theprovision of a pragma No_Dispatch, whose purpose is precisely to check that a program does not use dynamicdispatching. An Ada compiler can recognize this pragma, and enforce therestriction, as well as improving the code knowing that thisrestriction is in place (for example, by eliminating dispatch tables).Similar switches or pragmas could be implemented in other languages,though they are not part of the standard.
Now let's look at dynamicdispatching.
This offers a challenge. The problem is two-fold. First a typicalimplementation is to have a table of pointers and then index into thistable. That's a bit worrisome. What if the table gets clobbered somehow? The dispatching operation can cause a wild jump. Now of course wedon't expect any such clobbering in a certified program (although thedemonstration of correctness of the dispatch table raises somenontrivial issues). Still, indirect calls make us nervous, since now toensure the integrity of the control flow we have to prove propertiesrelating to data access.
The second problem is more significant. In a sense dynamicdispatching is all about not having to know what routine you arecalling. But certification and coverage testing is all about knowingand checking the control flow of a program. What exactly are wesupposed to check when we see a dispatching call.
One possibility would be to treat each call as though it were a casestatement with branches going to every possible function thatcorresponds to the dispatching call. This seems like a completely fairtranslation, but the trouble is that in a real program, we might easilyfind that most calls can only go to a small subset of the possibletargets. Then we end up with a lot of deactivated code (code that cannever be executed), and we have trouble testing such cases, or provingthat they can never occur.
A simpler approach is to treat the dispatching call as a call to asingle routine that contains such a case statement. In this approachall calls to a given dispatching function will share a single casestatement.
On the positive side, we argue that we could have written theprogram this way in the first place, and traditional testing would havebeen fine, so it should be fine here. On the negative side, we worrythat our coverage testing is only showing that each method is usedsomewhere, and we are not really verifying the possible flows ofcontrol.
The fact that we could have written the program that way is notdecisive. The testing schemes we have described are not perfect, notesting scheme is perfect, but they work pretty well in practice.However, they can be subverted by a programmer concentrating on theletter of the standard, and ignoring its intent.
Here is a way of essentially removing all if statements from aprogram. Given:
if condition then
we replace this with:
Eval_If (condition,then-statementsproc'access ,else-statements-proc'access );
Where the second and third arguments are now pointers to functionsthat if called will execute the appropriate statements. Now Eval_If itselflooks like:
procedure Eval_If (Cond : Boolean; T, E
: access procedure) is