
Is it time for another look at how we build safety-critical embedded systems?

Early in March of this year, General Motors Company issued a recall of more than 10,000 Buick LaCrosse and Cadillac SRX vehicles. The issue was not one of the mechanical problems traditionally associated with such recalls.

In this case, a software defect disabled the driver’s ability to adjust the heating, ventilation, and air conditioning, which in turn could prevent the defrost system from clearing the windshield. This would reduce the driver’s visibility during inclement weather and could lead to an accident.

Most modern cars now contain about 100 million lines of code (LOC).

The amount of software in avionics systems isn’t far behind. The F-22 Raptor, the current U.S. Air Force frontline jet fighter, contains about 1.7 million LOC. A commercial aircraft like Boeing’s new 787 Dreamliner runs about 6.5 million LOC to operate its avionics and onboard support systems. According to the U.S. Food and Drug Administration (FDA), a modern infusion pump contains more than 100,000 LOC.

Beyond the sheer amount of software in these automotive, aerospace, and medical systems and devices, an important consideration is its nature: this software demands an incredibly high level of correct behavior.

In the GM recall, what makes the software defect safety-critical is the repercussion the “bug” might have: if the defrost system fails to clear the windshield, the result could be a fatal car accident.

A failure in an aircraft software system may have similarly devastating effects, and something “going wrong” with a medical device can have unimaginable consequences when the well-being of a patient is involved. For its users, and for the people who depend on it, such software simply must behave as expected.


Erroneous Code Examples

Software that does not behave as expected, fails under certain conditions, or does not handle error conditions appropriately is the result of defects in the code. Let’s take a step back and look at what some of these coding errors are and what impact they might have on the systems and devices they run on: unexpected behavior, failure under certain conditions, and incorrect handling of error conditions.

int foo() {
    int x = 0;
    x = x++;    // Defect
    return x;   // returns either 0 or 1
}

Example 1: Unexpected behavior

Some constructs of the programming language allow developers to write code whose behavior is left for the compiler, compiler version, and optimization settings to define. The code above demonstrates a side-effect ordering problem. The right-hand side of the assignment is evaluated before the assignment itself takes place.

However, the order of the side effects is unspecified, because the side effect associated with ‘++’ could happen before or after the side effect associated with the assignment. Different compilers return different values: for example, Microsoft Visual C++ 2008 for x86 on 32-bit Windows XP returns 1, while MinGW gcc 3.4.4 on 32-bit Windows XP returns 0.

In other cases, if a single memory location is written more than once, or both read and written, without an intervening sequence point, the program’s behavior is undefined. Such undefined behavior is unacceptable in software running on safety-critical systems and devices.
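
One straightforward way to avoid the problem, assuming the intent of the original code was simply to increment ‘x’, is to keep each side effect in its own statement. The sketch below (the name ‘foo_fixed’ is illustrative, not from the original example) contains only one side effect per statement, so its result no longer depends on the compiler:

int foo_fixed() {
    int x = 0;
    x++;          // single, well-defined side effect
    return x;     // always returns 1
}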

// fn() is assumed to be defined elsewhere; it is declared here so the
// example is self-contained.
extern int fn(void);

int forward_null_example(int *p) {
    int x;

    if ( p == NULL ) {
        x = 0;
    } else {
        x = *p;
    }

    x += fn();

    *p = x;   // Defect: p is potentially NULL

    return 0;
}

Example 2: Failure under certain conditions

Program crashes under certain conditions are another type of error that is unacceptable in safety-critical software. Using pointers in C and C++ is a fairly common practice, but when the code is not well tested, dereferencing a pointer that may be NULL or uninitialized can lead to segmentation violations and software failures.

In the code example above, the conditional check for ‘p == NULL’ indicates that the function expects an integer pointer that might be NULL. However, the programmer fails to account for that later in the function, when the value of ‘x’ is written through the pointer.

Software systems put together by combining various parts require that the interfaces between those parts be well understood. The assumptions made by one programmer may be the opposite of those made by his or her counterpart on a companion function, which means that a “safe” value passed from one function to the other may cause a serious or catastrophic program error.

In the example above, one programmer might assume that passing a NULL pointer to ‘forward_null_example()’ is acceptable, but that assumption is not necessarily shared by the programmer who wrote the function.
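
One possible fix, sketched below under the assumption that a NULL argument is a legitimate input, is to keep the NULL check consistent and write through the pointer only on the path where it is known to be valid (the name ‘forward_null_fixed’ is illustrative; fn() is assumed to be declared as in the original example):

int forward_null_fixed(int *p) {
    int x;

    if ( p == NULL ) {
        x = 0;
    } else {
        x = *p;
    }

    x += fn();

    if ( p != NULL ) {
        *p = x;   // dereference only when p is known to be non-NULL
    }

    return 0;
}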

class A {
    public:
        A() {}
        ~A() {
            throw 7;   // Not caught
        }
};

int main() {
    A a1;  // Destructor exception crashes the program
    return 0;
}

Example 3: Incorrect handling of error conditions

An area often neglected during testing and verification is the handling of error conditions; very often, the act of handling an error condition itself triggers a fault. In the example above, the destructor throws an exception that is never caught, which prevents normal program termination.

Errors in destructors or in exception-handling code are rarely exercised by traditional testing methods, because the error condition must be triggered first. Yet wherever the crash-causing defect lives, whether in the normal execution path or in the exception-handling code, it has the same severe impact on a safety-critical system.
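
A common convention, sketched below, is to catch any exception inside the destructor itself so that it can never propagate out of it. The class name ‘B’ and the handling strategy are illustrative only; note that in C++11 and later, letting an exception escape a destructor calls std::terminate().

class B {
    public:
        B() {}
        ~B() {
            try {
                throw 7;   // same error as before, but now contained
            } catch (...) {
                // Handle or log the failure locally instead of letting
                // the exception escape the destructor.
            }
        }
};

int main() {
    B b1;  // Destructor no longer crashes the program
    return 0;
}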

Standards Driving Software Integrity

Some developers of safety-critical systems understand the challenges and requirements of creating software for automotive, aerospace, and medical systems. They ensure there are processes in place and use innovative tools and techniques to test and verify the software created for these systems and devices.

In some cases, compliance with standards is mandated. For example, the FDA has recognized the critical role that software plays in medical devices such as infusion pumps, where software governs key aspects of the device such as the user interface and the pumping mechanism that maintains the prescribed infusion rate. In 2009, the FDA mandated the use of the following four verification methods to determine whether the code in these devices is “correct”:

1. Manual code reviews

2. Exhaustive testing of the system in which the software is used

3. Simulating the execution of the software on a computer

4. Static analysis to verify the correctness of the software

For military and aerospace systems, DO-178B validates that compliant processes are followed in developing the software used in these systems. Third-party military and aerospace contractors who develop software for these systems realize the importance and value of DO-178B. More importantly, they understand that DO-178B compliance does not necessarily mean bug-free.

What DO-178B assures is that system integrators and software developers are using all the tools and technology at their disposal to test and verify code before program execution. Part of their motivation is the understanding that even developers of non-safety-critical software have adopted static analysis as a baseline to ensure that their software does not crash or operate in an unpredictable manner.

The Motor Industry Software Reliability Association (MISRA) is an organization that produces guidelines for the software developed for electronic components used in the automotive industry. It is a collaboration between vehicle manufacturers, component suppliers, and engineering consultancies that aims to provide important advice to the automotive industry for the creation and application of safe, reliable software within vehicles.

MISRA guidelines are mainly intended to achieve the following: ensure safety, bring robustness and reliability to the software, ensure that human safety takes precedence when in conflict with security of property, and force developers to consider both random and systematic faults when designing systems.
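
As a purely illustrative sketch (the function names are hypothetical and the rule paraphrases are not quoted from the MISRA documents), guidelines of this kind typically constrain everyday C constructs, for example requiring that non-void return values be used and that every if / else-if chain end with a terminating else:

static int read_sensor(void) {   // hypothetical stand-in for a device read
    return 42;
}

void poll_sensor(void) {
    int value = read_sensor();   // return value is used, not silently discarded

    if ( value < 0 ) {
        /* report a sensor fault */
    } else if ( value > 100 ) {
        /* report an out-of-range reading */
    } else {
        /* terminating else: every remaining value is handled deliberately */
    }
}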

Conclusion

The nature of software in aerospace, automotive, and medical systems and devices requires, above all, that behavior be predictable and that systems do not crash unexpectedly. To meet these requirements, development teams must use the most current and advanced processes, tools, and technologies to guarantee the integrity of the software as it is created.

Validation and testing protocols at each design and development stage ensure that no stone is left unturned. If there is one thing that development teams in these verticals have in common, it is that testing begins shortly after project inception. Testing software earlier in the development stream ensures that no costly (or fatal) surprises pop up during program execution.

Fortunately, a growing list of standards and best practices is being put in place to ensure that testing starts earlier and happens more often in the software development process. To make safety-critical systems as bug-free as possible, so that they work right every time, all the time, developers must use a combination of tools and processes that allow only the highest-quality software to reach devices that people’s lives depend on.

Rutul Dave is Product Marketing Manager at Coverity. He received his Master’s in Computer Science, with a focus on networking and communication systems, from the University of Southern California. Nine months into graduate school, while learning about building high-performance networking and distributed systems, he found his passion creating bleeding-edge technology at Bay Area/Silicon Valley startups such as Procket Networks and Topspin Communications before moving to Cisco Systems. He has years of software development experience in embedded and real-time systems.

His focus these days is on creating tools and technology to enhance the software development process and to equip developers with the best resources, techniques, and practices to maximize the integrity of software. When not evangelizing about the benefits of software integrity, he scratches the coding itch by developing mobile apps and digging into the Linux kernel.
