Software Standards Compliance 101: Using coverage analysis to assess test completeness

In the mid-1990s, a formal investigation was conducted into a series of fatal accidents involving the Therac-25 radiotherapy machine. Led by Nancy Leveson of the University of Washington, the investigation resulted in a set of recommendations on how to create safety-critical software in an objective manner. Since then, industries as disparate as aerospace, automotive and industrial control have encapsulated the practices and processes for creating safety- and/or security-critical systems into industry standards.

Although subtly different in wording and emphasis, the standards across industries follow a similar approach to ensuring the development of safe and/or secure systems. This common approach includes ten phases:

  1. Perform a system safety or security assessment
  2. Determine a target system failure rate
  3. Use the system target failure rate to determine the appropriate level of development rigor
  4. Use a formal requirements capture process
  5. Create software that adheres to an appropriate coding standard
  6. Trace all code back to its source requirements
  7. Develop all software and system test cases based on requirements
  8. Trace test cases to requirements
  9. Use coverage analysis to assess test completeness against both requirements and code
  10. For certification, collect and collate the process artifacts required to demonstrate that an appropriate level of rigor has been maintained.


Phase 9 is discussed in this article. One of the basic truisms of software development is that it is not possible to fully test a program, so the basic question becomes: how much testing is enough? In addition, for safety-critical systems, how can it be proved to the relevant authorities that enough testing has been performed on the software under development? The answer is software coverage analysis. While coverage analysis has proven to be a useful metric for assessing test completeness, it only serves as a meaningful measure of test effectiveness when used within the framework of a disciplined test environment.


When performing exhaustive testing on a piece of software, ensuring that every path through the code is executed at least once sounds like a reasonable place to start. However, examining the possible execution paths in even simple programs soon reveals how difficult it is to test software to completion. For example, in a 2006 lecture on software testing, Professor I.K. Lundquist of MIT described a simple flow chart containing five decision points (including a loop) and six functional blocks that, when analyzed, contained 10^14 possible execution paths. When we compare this number to the age of the universe (about 4 x 10^17 seconds), the difficulty of complete path analysis becomes clear. As a result, one of the persistent questions when it comes to developing safety-critical software is: When has enough software testing been performed to confirm that the system does what it is supposed to do?
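To see how quickly paths multiply, consider the following sketch of a hypothetical routine in C (the function and its logic are illustrative only, not taken from any real system). With three independent decisions inside a loop that runs 20 times, each iteration can follow any of 2^3 = 8 condition combinations, giving up to 8^20 (roughly 1.2 x 10^18) structurally distinct paths, some of which may be infeasible but none of which could realistically be exercised one by one.

```c
/* Hypothetical example: three independent decisions inside a fixed loop.
 * Each of the 20 iterations can take any of 2 x 2 x 2 = 8 branch
 * combinations, so the routine has up to 8^20 (about 1.2 x 10^18)
 * structurally distinct execution paths. */
int filter_samples(const int *samples)
{
    int accepted = 0;

    /* Assumes the caller supplies at least 20 samples. */
    for (int i = 0; i < 20; i++) {
        int s = samples[i];

        if (s < 0) {           /* decision 1 */
            s = -s;
        }
        if (s > 1000) {        /* decision 2 */
            s = 1000;
        }
        if ((s % 2) == 0) {    /* decision 3 */
            accepted++;
        }
    }
    return accepted;
}
```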

The avionics community has addressed this problem by adopting coverage analysis as the metric of choice for assessing test completeness. As Tom DeMarco, well-known software engineering author and teacher, says: “You can’t control what you can’t measure.”

This article describes and defines the different types of coverage analysis that are used by the avionics community to help assess how completely software has been tested, using the DO-178C standard for developing avionics software (Software Considerations in Airborne Systems and Equipment Certification) as a reference. The criteria used for selecting which coverage analysis metric(s) are appropriate for a new avionics project will also be discussed. And since coverage analysis metrics do not provide a meaningful assessment of test completeness on their own, this article will also describe how they are used to measure the effectiveness of requirements-based testing, as well as the techniques, methods and tools for performing coverage analysis measurements.

Coverage Analysis
At its most basic, software coverage analysis is a measure of the code structures executed by a test or set of tests. This can range from something as simple as measuring the lines of source code executed by a given set of tests, to more complex measurements such as the coverage of the object code produced by compiling the source code as it executes on the target system, including whether each branch point in the code has been exercised.
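As a rough illustration of how source-level coverage measurement can work (a simplified sketch, not a description of any particular tool), an instrumentation pass inserts a lightweight probe at each point of interest; at run time, each probe sets a bit in a small static table, which is read back after the tests finish to determine what was executed. The NUM_PROBES, COVERAGE_PROBE and clamp names below are all invented for this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified sketch of source-level instrumentation for statement
 * coverage. The probe count, macro and instrumented function are
 * illustrative only. */
#define NUM_PROBES 3U

static uint8_t coverage_bits[(NUM_PROBES + 7U) / 8U];   /* 1 bit per probe */

#define COVERAGE_PROBE(id) \
    (coverage_bits[(id) / 8U] |= (uint8_t)(1U << ((id) % 8U)))

/* Instrumented version of a trivial function: each probe records that
 * the statement following it was reached. */
int clamp(int value, int limit)
{
    COVERAGE_PROBE(0U);               /* entry / first statement */
    if (value > limit) {
        COVERAGE_PROBE(1U);           /* statement in the TRUE branch */
        value = limit;
    }
    COVERAGE_PROBE(2U);               /* statement after the decision */
    return value;
}

/* After the tests run, report generation reads coverage_bits to see
 * which probes (and therefore which statements) were never executed. */
bool probe_was_hit(unsigned int id)
{
    return (coverage_bits[id / 8U] & (uint8_t)(1U << (id % 8U))) != 0U;
}
```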

The DO-178C standard for developing avionics software specifies three different source code coverage analysis metrics that are used to measure software test effectiveness for avionics software, as described in Table 1 and illustrated in the example that follows it. In addition, object code coverage is required for the most safety-critical systems to ensure that all of the code generated by the compiler is tested.

Table 1: DO-178C Source Code Coverage Analysis Metrics

  • Statement Coverage (SC): Ensure that every statement in the program has been invoked or used at least once.

  • Decision Coverage (DC): Ensure that every entry and exit point in the program has been invoked at least once, and that each decision in the program has taken both the TRUE and FALSE outcomes at least once.

  • Modified Condition/Decision Coverage (MC/DC): Ensure that every condition within a decision has been shown to independently affect that decision's outcome.
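To make the distinction between these metrics concrete, consider a small, purely illustrative C function containing a single decision made up of two conditions (the function name and logic are hypothetical). For this function, statement and decision coverage can each be achieved with two tests, one taking the TRUE branch and one the FALSE branch, while MC/DC needs a third test to show that each condition independently affects the outcome.

```c
#include <stdbool.h>

/* Purely illustrative function: one decision made up of two conditions. */
bool deploy_gear(bool below_speed_limit, bool altitude_ok)
{
    if (below_speed_limit && altitude_ok) {   /* decision with two conditions */
        return true;                          /* TRUE outcome */
    }
    return false;                             /* FALSE outcome */
}

/*
 * Minimal test vectors (below_speed_limit, altitude_ok) for 100% of each
 * metric on this function:
 *
 *   Statement Coverage:   (T,T), (F,F)         both return statements run
 *   Decision Coverage:    (T,T), (F,F)         decision takes TRUE and FALSE
 *   MC/DC:                (T,T), (F,T), (T,F)  comparing (T,T) with (F,T)
 *                                              shows the first condition alone
 *                                              flips the outcome; (T,T) with
 *                                              (T,F) does the same for the
 *                                              second condition
 */
```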


Coverage analysis is normally reported as a percentage metric. For example, if coverage analysis has been used to verify that 50% of the executable statements in a piece of software have been executed at least once, then a coverage metric of 50% Statement Coverage is assessed against that code. DO-178C requires that 100% coverage be achieved for each of the coverage analysis objectives for the software under development.

Figure 1: LDRAcover displaying a flowgraph highlighted to indicate the paths that have been covered. Code coverage analysis results are shown alongside the system/file/function name, and color coding identifies the decisions, statements and loops that have been executed. (Source: LDRA)

When it comes to choosing which coverage analysis objective to use for a given system, one size does not fit all. For avionic systems, the actual coverage analysis used for a given software system is selected based on the target failure rate for the system. The more essential the software is to the safety of an aircraft, the more rigorous the testing—and therefore the coverage analysis criteria—needs to be.

In the same way that the IEC 62304 standard Medical Device Software – Software Lifecycle Processes defines three different software safety classifications, DO-178C defines five different system safety classifications to match the differing levels of system integrity required for avionics. A system safety hazard assessment is performed on each avionics system, and the impact of a system or software failure on the whole aircraft then determines the system classification and the overall system target failure rate. Needless to say, the most safety-critical systems are assigned the lowest possible failure rate. The classification of the system then determines what coverage level needs to be achieved, as described in Table 2 (and restated in code form after the table). In short, the more safety-critical a system is, the more in-depth the required coverage analysis is.

Table 2: DO-178C System Safety Classifications

  • Level A: Catastrophic failure condition; target failure rate 10^-9 failures per operating hour; required coverage: MC/DC, DC, SC and object code coverage.

  • Level B: Severe-major failure condition; target failure rate 10^-7 failures per operating hour; required coverage: SC, DC.

  • Level C: Major failure condition; target failure rate 10^-5 failures per operating hour; required coverage: SC.

  • Level D: Minor failure condition; target failure rate 10^-3 failures per operating hour; no coverage required.

  • Level E: No effect on the aircraft; no target failure rate; no coverage required.
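For projects that automate their verification checks, the mapping in Table 2 can also be captured directly in code. The sketch below is a hypothetical helper (the type and table names are invented here) that simply restates Table 2 so a build or reporting script can check measured coverage against the objectives for the assigned level.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical restatement of Table 2: which structural coverage
 * objectives apply at each DO-178C software level. */
typedef struct {
    char level;        /* DO-178C software level, 'A' through 'E' */
    bool statement;    /* Statement Coverage required */
    bool decision;     /* Decision Coverage required */
    bool mcdc;         /* MC/DC required */
    bool object_code;  /* object code coverage required */
} coverage_objectives_t;

static const coverage_objectives_t objectives[] = {
    { 'A', true,  true,  true,  true  },
    { 'B', true,  true,  false, false },
    { 'C', true,  false, false, false },
    { 'D', false, false, false, false },
    { 'E', false, false, false, false },
};

int main(void)
{
    for (size_t i = 0; i < sizeof objectives / sizeof objectives[0]; i++) {
        const coverage_objectives_t *o = &objectives[i];
        printf("Level %c: SC=%d DC=%d MC/DC=%d object code=%d\n",
               o->level, o->statement, o->decision, o->mcdc, o->object_code);
    }
    return 0;
}
```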

It is a relatively trivial matter to create a set of tests for a piece of software that will achieve the maximum possible coverage. However, testing designed only to achieve high levels of coverage does a very poor job of meeting the testing objectives of safety-critical systems—that is, proving that the system is fit for its intended purpose. As the intended purpose of the system is encapsulated in the system requirements, coverage analysis is only beneficial when used as a means of assessing the efficacy of requirements-based tests, performed at the system level, that are designed to ensure that the software meets its requirements. With appropriate requirements, it is also possible to supplement this testing with tests at the unit level.
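As an illustration of what requirements-based testing with traceability can look like in practice, the sketch below tags each test case with the requirement it verifies. The requirement ID (SRS-042), the test case IDs and the function under test are all hypothetical; real projects typically manage these links in a requirements management or test tool rather than in comments alone. Coverage is then measured as a by-product of running these tests, not as their goal.

```c
#include <assert.h>
#include <stdbool.h>

/* Function under test: the same illustrative function used earlier.
 * In a real project it would live in the unit under test, written
 * against hypothetical requirement SRS-042: "The gear shall deploy
 * only when both deployment conditions are satisfied." */
static bool deploy_gear(bool below_speed_limit, bool altitude_ok)
{
    return below_speed_limit && altitude_ok;
}

/* Test case TC-042-01, traces to requirement SRS-042.
 * Verifies the nominal deployment case. */
static void test_gear_deploys_when_conditions_met(void)
{
    assert(deploy_gear(true, true) == true);
}

/* Test case TC-042-02, traces to requirement SRS-042.
 * Verifies that deployment is refused when either condition fails. */
static void test_gear_refused_when_a_condition_fails(void)
{
    assert(deploy_gear(false, true) == false);
    assert(deploy_gear(true, false) == false);
}

int main(void)
{
    test_gear_deploys_when_conditions_met();
    test_gear_refused_when_a_condition_fails();
    return 0;   /* coverage of deploy_gear is then measured from this run */
}
```

Incidentally, the three input vectors exercised here are the same minimal MC/DC set shown earlier; the coverage report confirms that the requirements-based tests achieve the objective rather than being written to chase it.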

Figure 2: LDRA coverage results showing multiple coverage objectives: an example coverage analysis report containing a detailed breakdown of the code coverage metrics attained. (Source: LDRA)

Although DO-178C requires that 100% coverage be achieved, obtaining it purely from system-level testing is, in practice, neither always appropriate nor necessary. Achieving the maximum code coverage for a project is an iterative process. Using code coverage results as feedback, it is possible to identify deficiencies in the testing process such as missing requirements, missing test cases, and unreachable, unneeded and/or dead/deactivated code. Test cases can then be added, requirements addressed and code refactored to address any identified issues. Testing can then be updated and repeated until the project test effectiveness objectives have been met. This may include accounting for unused code (e.g., when only part of an open source component is used) or augmenting the system-level test results with results from a test harness or even code inspection.

Coverage Analysis Tools
When choosing a tool to help with coverage measurements, it is important to note that not all coverage analysis tools are created equal, and choosing the wrong tool can compromise coverage measurement accuracy or, worse, provide incorrect results. Here are some issues to consider when selecting a coverage analysis tool for an embedded system (the sketch after this list illustrates how the memory and run-time concerns can be addressed):

  • What’s the memory footprint of the coverage measurement implementation?
  • Are your embedded system’s hardware and RTOS supported by the tool?
  • What’s the memory footprint of the run-time data?
  • Does your system have enough memory to make meaningful measurements?
  • Will the instrumentation affect the system run-time behavior?
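One way to keep both the memory footprint and the run-time disturbance small on a constrained target is to record coverage in a compact in-RAM bitmap (as in the earlier instrumentation sketch) and defer all I/O until the test run has finished. The sketch below assumes a hypothetical uart_putc() byte-output routine from the board support package and the coverage_bits bitmap from before; it is illustrative only and does not describe how any particular coverage tool works.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical target-side helper: after the tests have run, stream the
 * coverage bitmap off the target in one pass, so no I/O happens while
 * the code under test is executing. uart_putc() stands in for whatever
 * byte-output primitive the board support package actually provides. */
extern void uart_putc(uint8_t byte);       /* assumed BSP primitive */
extern uint8_t coverage_bits[];            /* bitmap filled by the probes */
extern const size_t coverage_bits_len;     /* its size in bytes */

void dump_coverage(void)
{
    /* Simple framing: a 16-bit length followed by the raw bitmap. The
     * host-side tooling (not shown) maps each bit back to the source
     * construct it represents. */
    uart_putc((uint8_t)(coverage_bits_len & 0xFFU));
    uart_putc((uint8_t)((coverage_bits_len >> 8) & 0xFFU));

    for (size_t i = 0; i < coverage_bits_len; i++) {
        uart_putc(coverage_bits[i]);
    }
}
```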

DO-178C provides guidance on these decisions by requiring that any tool used for measuring code coverage be verified to produce accurate, reliable results in the target environment, so that the results it produces can be used with confidence and without further verification. This process, essentially a calibration of the coverage tool, is referred to as “qualification.”

Conclusion
The code quality of any software project can benefit from the application of a few simple guidelines from safety-critical standards such as DO-178C. To control test effectiveness, the impact of testing must be measured using code coverage, at a coverage level that is appropriate for the testing rigor required for the software. To ensure an appropriate level of testing rigor, all testing must be requirements-based and performed at the system level. Test, measure, repeat. The feedback, knowledge and understanding required to improve test effectiveness are simply not attainable without code coverage analysis. When choosing a coverage analysis tool, using a DO-178C-qualifiable tool ensures results are consistent and complete. By following these guidelines, any software project can achieve the levels of software quality normally expected of safety-critical systems.
