In the mid-1990s, a formal investigation was conducted into a series of fatal accidents with the Therac-25 radiotherapy machine. Led by Nancy Leveson of the University of Washington, the investigation resulted in a set of recommendations on how to create safety-critical software solutions in an objective manner. Since then, industries as disparate as aerospace, automotive, and industrial control have encapsulated the practices and processes outlined in these recommendations into specific industry standards.
Although subtly different in wording and emphasis, the standards across industries follow a similar approach to ensuring the development of safe and/or secure systems. This common approach includes ten phases:
- Perform a system safety or security assessment
- Determine a target system failure rate
- Use the system target failure rate to determine the appropriate level of development rigor
- Use a formal requirements capture process
- Create software that adheres to an appropriate coding standard
- Trace all code back to its source requirements
- Develop all software and system test cases based on requirements
- Trace test cases to requirements
- Use coverage analysis to assess test completeness against both requirements and code
- For certification, collect and collate the process artifacts required to demonstrate that an appropriate level of rigor has been maintained.
Determining failure rates
When software was first introduced into control systems, writing it was more of an art than a science because the principles of software engineering had not yet been developed. As the role of software expanded to include safety-related applications, concern grew about how to prove that such systems were safe. At the time, the challenges of proving that software and its specifications were correct were not well understood, which compounded the problem.
The International Electrotechnical Commission (IEC) introduced the Functional Safety of Electrical / Electronic / Programmable Electronic Safety-Related Systems (IEC 61508) standard to help address these concerns. Ratified in 2000, the standard seeks to guide system designers and developers through what they need to do in order to claim that their systems are acceptably safe for their intended uses.
This article focuses on three main areas:
- The approach that IEC 61508 advocates for performing a system safety assessment and how that is then used to determine the target system failure rates;
- The similarities between the IEC 61508 concept of system safety and the concepts of system safety used within the avionics community; and
- The types of tools that can be used to capture the system safety objectives and failure rates and also to ensure that the objectives are followed throughout the software development process.
The roots of IEC 61508 are in industrial control systems. The concept of functional safety encapsulated within the standard is based on the idea that a safety-related system is independent of the Equipment under Control (EUC), including its control system. Where industrial control is concerned, this is reasonable; systems tend to be built of generic components from multiple vendors that are brought together to achieve a specific function.
For example, consider a label applicator tunnel on a conveyor belt as the EUC. The label applicator control system ensures that each label is applied to a carton passing through the tunnel via a robot arm. However, the control system lacks awareness of whether any person can get in harm’s way by putting an arm into the tunnel to clear a jam. The protection system in this case might be based on a light curtain that ensures that the label applicator or the conveyor is stopped whenever an operator’s arm breaks the curtain.
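The separation between the control system and the protection system can be sketched in a few lines of Python. This is only an illustration of the idea, not a certified design: the class names `Equipment` and `LightCurtainInterlock`, the sampling method, and the latching reset behavior are all assumptions introduced here and do not come from IEC 61508.

```python
from dataclasses import dataclass


@dataclass
class Equipment:
    """Hypothetical stand-in for the EUC: the label applicator and conveyor."""
    arm_running: bool = True
    conveyor_running: bool = True

    def stop_all(self) -> None:
        self.arm_running = False
        self.conveyor_running = False


class LightCurtainInterlock:
    """Safety function kept separate from the labeling control logic.

    It knows nothing about labels or cartons; it only watches the curtain
    and removes the hazard whenever the beam is broken.
    """

    def __init__(self, equipment: Equipment) -> None:
        self._equipment = equipment
        self.tripped = False

    def on_sample(self, beam_intact: bool) -> None:
        # Any broken-beam sample latches the trip (assumed behavior).
        if not beam_intact:
            self.tripped = True
        if self.tripped:
            self._equipment.stop_all()

    def reset(self, beam_intact: bool) -> bool:
        # A manual reset only succeeds once the curtain is clear again.
        if beam_intact:
            self.tripped = False
        return not self.tripped
```

Note that the interlock never asks the label applicator's control logic for permission to stop; it acts on the equipment directly, which is the independence the functional safety concept relies on.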
Like other safety-related industry standards, IEC 61508 does not attempt to eliminate risk; instead, it seeks to reduce risk to an acceptable level. As a result, every project starts with a risk assessment to determine the risks associated with the EUC. The probability of each risk occurring then defines the Risk Reduction Factor (RRF) required to bring that risk down to a tolerable level, from which Safety Integrity Levels (SILs) are then determined per the table below. The maximum RRF required determines the SIL for the whole system:
The SIL is then carried forward throughout the rest of the development process. Developers not only have to achieve the safety goals for their given SIL, they have to prove that the safety goals are met.
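The step from risk assessment to SIL can be illustrated with a short sketch. It assumes the RRF is the ratio of the unmitigated hazard rate to the tolerable rate, and it uses the RRF bands commonly cited for IEC 61508 low-demand operation (SIL 1 up to 100, SIL 2 up to 1,000, SIL 3 up to 10,000, SIL 4 up to 100,000). Those figures are supplied here for illustration and should be confirmed against the standard itself; the function names and the hazard figures are hypothetical.

```python
from __future__ import annotations

# Commonly cited IEC 61508 low-demand bands: (upper RRF bound, SIL).
# Assumed for illustration; confirm against the standard before use.
SIL_BANDS = [(100, 1), (1_000, 2), (10_000, 3), (100_000, 4)]


def required_rrf(unmitigated_rate: float, tolerable_rate: float) -> float:
    """Risk reduction needed to bring a hazard down to the tolerable rate."""
    if unmitigated_rate <= 0 or tolerable_rate <= 0:
        raise ValueError("rates must be positive")
    return unmitigated_rate / tolerable_rate


def sil_for_rrf(rrf: float) -> int:
    """Return the SIL for an RRF; 0 means no SIL-rated function is needed."""
    if rrf <= 10:
        return 0
    for upper, sil in SIL_BANDS:
        if rrf <= upper:
            return sil
    raise ValueError("RRF exceeds the SIL 4 band; redesign is needed")


def system_sil(hazards: dict[str, tuple[float, float]]) -> int:
    """The largest required RRF sets the SIL for the whole system."""
    worst = max(required_rrf(u, t) for u, t in hazards.values())
    return sil_for_rrf(worst)


# Hypothetical hazards: (unmitigated events/year, tolerable events/year).
hazards = {
    "arm in tunnel": (0.5, 0.001),
    "carton jam": (0.1, 0.02),
}
print(system_sil(hazards))
```

Here the "arm in tunnel" hazard needs a risk reduction of roughly 500, which falls in the SIL 2 band, so the whole system is treated as SIL 2 even though the jam hazard alone would need no SIL-rated function.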
Traditional risk reviews have not included network security, but this is changing. Many of the communications protocols on which industrial control system instrumentation relies are Ethernet based, and there is a growing list of associated security threats that create additional risk to those systems. For example, one of the most widely used architectures in industrial control is SCADA (Supervisory Control and Data Acquisition). Increasingly, the communication between SCADA endpoints is moving towards Ethernet-based solutions, and a quick search on the Internet for “SCADA Vulnerabilities” reveals how many known risks are associated with SCADA communications over Ethernet.
Other safety-critical industries have industrial standards in which the risk of failure is determined at the start of the project. The avionics community, for instance, uses the DO-178C document from the Radio Technical Commission for Aeronautics (RTCA) as the reference for developing safety-critical avionics systems. DO-178C requires that development start by determining a software level, from Level A to Level E, which reflects the impact that a system failure can have on the aircraft as a whole. The more likely a system is to cause a catastrophic condition for the aircraft, the higher the assigned software level. Systems whose failure can have a catastrophic impact on safety are assigned Level A, and those systems that have no impact on safety are assigned Level E. Similar to IEC 61508, target system failure rates are associated with each DO-178C software level.
Once an SIL is determined for a system, there is a level of development rigor required to ensure that the SIL objectives are met. Although many companies still attempt to manage this manually, the easiest way to monitor this is with life-cycle management tools that capture the SIL and trace the development rigor used throughout the development process. Auditors can use this verification traceability to audit compliance and grant certification.
There are two types of tool that shine in this arena. The first focuses on requirements traceability (Figure 1). These tools link requirements to all the other components in the development process from models to lower level requirements, source code, unit tests, system tests, and their associated results. With the ability to associate each phase of system development to specific individuals, these tools help to ensure that the system safety objectives flow through the entire software development life cycle.
Figure 1: TBmanager, a requirements traceability tool from LDRA, provides visibility into the linkages between high-level requirements and how they are traced through the development process to lower level requirements, source code, unit tests, system tests, and their associated results.
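The kind of check a requirements traceability tool automates can be approximated with a toy example. The sketch below is not how TBmanager or any other product works internally; the `TraceMatrix` class, its fields, and the requirement IDs are hypothetical and exist only to show what "tracing code and tests back to requirements" means in practice.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TraceMatrix:
    """Hypothetical in-memory trace data; real tools persist and version it."""
    requirements: set[str]
    code_to_reqs: dict[str, set[str]] = field(default_factory=dict)
    test_to_reqs: dict[str, set[str]] = field(default_factory=dict)

    @staticmethod
    def _covered(links: dict[str, set[str]]) -> set[str]:
        # Union of every requirement ID referenced by any artifact.
        covered: set[str] = set()
        for reqs in links.values():
            covered |= reqs
        return covered

    def unimplemented(self) -> set[str]:
        """Requirements that no code unit traces to."""
        return self.requirements - self._covered(self.code_to_reqs)

    def untested(self) -> set[str]:
        """Requirements that no test case traces to."""
        return self.requirements - self._covered(self.test_to_reqs)

    def orphan_code(self) -> set[str]:
        """Code units that trace to no known requirement."""
        return {unit for unit, reqs in self.code_to_reqs.items()
                if not (reqs & self.requirements)}


matrix = TraceMatrix(
    requirements={"REQ-1", "REQ-2", "REQ-3"},
    code_to_reqs={"stop_conveyor": {"REQ-1"}, "log_debug": set()},
    test_to_reqs={"test_stop": {"REQ-1"}, "test_label": {"REQ-2"}},
)
print(matrix.unimplemented(), matrix.untested(), matrix.orphan_code())
```

Each non-empty result is a gap an auditor would ask about: a requirement with no implementation, a requirement with no test, or code that exists for no stated reason.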
The second class of tool, which complements the requirements traceability tools, documents compliance (Figure 2). These tools provide detailed templates that document compliance with an industry standard, walking project managers through the activities required to gain approval for each stage of development.
Prior to the IEC 61508 standard, software designed for industrial control systems was created on a ‘best effort basis’, and it was not possible to measure system safety compliance levels. By introducing a development approach based on a system safety assessment, IEC 61508 provides an objective means of creating safety-critical systems, helping to eliminate the one-size-fits-all approach inherent to ad-hoc software development. This is good news for software project managers, because it provides an empirical means of determining the level of development rigor required for a given software project, ensuring that effort is not expended where it is not necessary.
Jay Thomas, a Technical Development Manager for LDRA Technology, has worked on embedded controls simulation, processor simulation, mission- and safety-critical flight software, and communications applications in the aerospace industry. His focus on embedded verification implementation ensures that LDRA clients in aerospace, medical, and industrial sectors are well grounded in safety-, mission-, and security-critical processes.