Building more secure embedded software with code coverage analysis

Editor’s Note: Excerpted from their book Embedded Systems Security, David and Mike Kleidermacher discuss how the use of code coverage techniques can improve the reliability and security of embedded software without necessarily increasing cost or development time.

A comprehensive test regimen, including functional, regression, performance, and coverage testing, is one of the best mechanisms to assure that software is reliable and secure. Indeed, testing is an important component of many high-assurance development standards and guidance documents, such as those promulgated by the U.S. Food and Drug Administration.

In addition, two approaches to testing are almost always required to ensure security. First, all software within security-critical components must be covered by some form of functional test: white-box, black-box, fault-based, error-based, and stress testing. Coverage is then verified using code coverage tools. Further, all security-critical software must be traceable to the software’s component requirements. Software that fails to trace back to a test and to a requirement is more likely to introduce latent security vulnerabilities.

Modified Condition/Decision Coverage
Because code coverage analysis is so important for security, it is worth examining the various levels of coverage testing that can be applied to embedded software. To aid this discussion, we consider the code coverage requirements across the five assurance levels specified in the standard that the U.S. Federal Aviation Administration (FAA) uses to perform safety certification of commercial aircraft.

This standard, published by RTCA, is titled Software Considerations in Airborne Systems and Equipment Certification, commonly referred to as DO-178B. In fact, DO-178B is the most commonly used software safety standard in the worldwide avionics industry. The five assurance levels of DO-178B, in increasing order of criticality, are as follows:

  • Level E: Failure has no impact on flight safety.
  • Level D: Failure impact is minor, noticeable but not critical to flight safety (e.g., passenger inconvenience).
  • Level C: Failure impact is major, safety-related but not severe (e.g., passenger discomfort but not injury).
  • Level B: Failure impact is severe (e.g., passenger injury).
  • Level A: Failure impact is catastrophic (e.g., aircraft crash).

The structural code coverage requirements corresponding to each assurance level are shown in the following table:

Table 1: DO-178B structural code coverage requirements by assurance level

  Level   Required structural coverage
  E       None
  D       None
  C       Statement coverage
  B       Statement and decision coverage
  A       Statement, decision, and MC/DC coverage

DO-178B Level C requires statement coverage: demonstrating that every program statement has been executed at least once (covered) by the verification test regimen. Statement coverage is what most developers equate with the more general term code coverage.

Level B augments statement coverage with decision coverage, a requirement that every decision point in the program has been executed with all possible outcomes. For example, a conditional branch’s comparison both succeeds (branch taken) and fails (branch not taken) at least once each.

Finally, modified condition/decision coverage (MC/DC) augments decision coverage with a specialized form of condition coverage in which each condition within a decision must be shown to have an independent effect on the outcome of that decision. We use a few simple code examples to illustrate the increasing rigor and security-enforcing quality of each coverage approach:

  if (a || b || c) {
   
  }

Statement coverage requires that the if statement is executed and that the code within the if block (executed on a true decision) is fully executed. As there are no statements corresponding to a false decision, statement coverage would not require any test cases that force the if block not to execute.

In contrast, decision coverage would require at least one test to execute the false decision path, even though there is no explicit code associated with that path. This extra coverage is desirable from a security perspective because it indicates that the developer has considered the impact of a false decision, which may have some other side effects. Let’s consider this slightly more detailed example:

  uint32_t divisor = 0;
  if (a || b || c) {
    divisor = a | b | c;
  }
  result /= divisor;  /* divides by zero when a, b, and c are all zero */

The final division statement will fail (divide by zero) on a false decision, but statement coverage testing may never activate this pathway. If an attacker were somehow able to control the decision (e.g., by controlling the values of a, b, and c), then the attacker could cause a denial of service (program crash). Decision coverage testing would have pointed out this problem before it could be fielded.
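
To make this concrete, here is a minimal, self-contained sketch (the harness is our illustration, not from the original text) showing that the false-decision test case mandated by decision coverage reaches the division with a zero divisor:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
    /* The false-decision vector that decision coverage forces us to add. */
    uint32_t a = 0, b = 0, c = 0;
    uint32_t divisor = 0;
    if (a || b || c) {
      divisor = a | b | c;
    }
    /* Check the divisor before dividing rather than crashing the test. */
    if (divisor == 0) {
      puts("FAIL: false decision path leaves divisor == 0 (divide by zero)");
      return 1;
    }
    return 0;
  }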

Condition coverage requires that each condition within the decision be tested with true and false values. The following two test cases will force each of the three conditions to take on both a true and a false value at least once: (a=1, b=1, c=1) and (a=0, b=0, c=0). While testing a decision’s constituent conditions may seem like an improvement over decision coverage, condition coverage is not a superset of decision coverage, as shown in the example below:

  if (a || !b) {
   
  } else {
   
  }

The two test cases, (a=0, b=0) and (a=1, b=1), satisfy condition coverage (both conditions executed with true and false inputs) but neglect to cover the false decision path. Clearly, using decision and condition coverage techniques in concert is desirable.
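
The gap is easy to demonstrate by running the two condition-coverage vectors through the decision; the small harness below (ours, for illustration only) records which decision outcomes are actually exercised:

  #include <stdbool.h>
  #include <stdio.h>

  int main(void)
  {
    /* The two condition-coverage test vectors from the text. */
    struct { bool a, b; } cases[2] = { { false, false }, { true, true } };
    bool saw_true = false, saw_false = false;

    for (int i = 0; i < 2; i++) {
      if (cases[i].a || !cases[i].b)
        saw_true = true;   /* true decision path */
      else
        saw_false = true;  /* false decision path */
    }
    /* Prints "true path: 1, false path: 0": the else branch is never taken. */
    printf("true path: %d, false path: %d\n", saw_true, saw_false);
    return 0;
  }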

Multiple condition coverage requires all combinations of conditions. In other words, every row of a decision’s truth table must have a corresponding test case. For the earlier example with conditions a, b, and c, the truth table is as follows:

Table 2: Truth table for the decision (a || b || c)

  a  b  c  |  a || b || c
  0  0  0  |  0
  0  0  1  |  1
  0  1  0  |  1
  0  1  1  |  1
  1  0  0  |  1
  1  0  1  |  1
  1  1  0  |  1
  1  1  1  |  1

Thus, multiple condition coverage requires 2^n tests, where n is the number of independent conditions. This approach is viewed as impractical; exhaustive condition testing would simply take too many test cases and too long to execute. Languages with short-circuiting Boolean operators (e.g., C, C++, Java) reduce the number of required test cases:

Table 3: Test cases required under short-circuit evaluation of (a || b || c)

  a  b  c  |  a || b || c
  1  -  -  |  1   (b and c not evaluated)
  0  1  -  |  1   (c not evaluated)
  0  0  1  |  1
  0  0  0  |  0

Nevertheless, compound Boolean expressions may yield an impractical explosion in test cases across realistic programs.

The MC/DC compromise
MC/DC is the selected compromise for most high-assurance safety and security standards. MC/DC includes both decision and condition coverage. However, in addition, MC/DC requires that each condition be demonstrated to have an independent effect on the decision.

This modified condition requirement is accomplished by varying a single condition while holding the remainder constant and verifying that the decision changes. Let’s consider the following example:

  if (a || b || c || d) {
   
  }

MC/DC requires the following test cases (shown in truth table form):

Table 4: MC/DC truth table for the decision (a || b || c || d)

  Test  a  b  c  d  |  a || b || c || d
  1     0  0  0  0  |  0
  2     1  0  0  0  |  1
  3     0  1  0  0  |  1
  4     0  0  1  0  |  1
  5     0  0  0  1  |  1

Pairing test 1 with each of tests 2 through 5 shows every condition taking on both true and false inputs and producing both a true and a false result while all other condition inputs are held constant. For uncoupled conditions, MC/DC requires N + 1 test cases, where N is the number of Boolean conditions. This linear growth in test cases makes MC/DC practical to implement, and its effectiveness in locating testing gaps as well as design flaws is well regarded and documented.
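
As an illustrative sketch (ours, not from the book), the independence requirement can be checked mechanically by flipping one condition at a time against the all-false baseline and confirming that the decision changes:

  #include <stdbool.h>
  #include <stdio.h>

  static bool decision(const bool v[4])
  {
    return v[0] || v[1] || v[2] || v[3];  /* a || b || c || d */
  }

  int main(void)
  {
    /* Test 1: the all-false baseline from Table 4. */
    bool base[4] = { false, false, false, false };
    bool base_out = decision(base);

    /* Tests 2-5: flip exactly one condition and verify that the
       decision outcome changes, proving its independent effect. */
    for (int i = 0; i < 4; i++) {
      bool v[4] = { base[0], base[1], base[2], base[3] };
      v[i] = !v[i];
      printf("condition %d has independent effect: %s\n",
             i, decision(v) != base_out ? "yes" : "no");
    }
    return 0;
  }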

We strongly recommend the use of MC/DC coverage testing for the most critical components of an embedded system: for example, the operating system kernel, network security protocols, and cryptographic components.

However, MC/DC may be overkill for some de-privileged applications. Conversely, system designers should not assume that MC/DC coverage implies perfect testing. Limitations of (and improvements to) the traditional MC/DC definition have been reported.

There are numerous aspects of a program to test for coverage beyond code execution flow. For example, if the program has an enumerated type with five possible values, it would be sensible to validate that all five values are at least used somewhere in the program, as sketched below.
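
Here is a minimal, hypothetical illustration (the state machine and all names are ours) of driving a handler with every value of an enumerated type:

  #include <stdio.h>

  typedef enum { STATE_IDLE, STATE_INIT, STATE_RUN,
                 STATE_HALT, STATE_FAULT, STATE_COUNT } state_t;

  static void handle_state(state_t s)
  {
    printf("handling state %d\n", (int)s);
  }

  int main(void)
  {
    /* Value coverage: exercise the handler with all five enumerators,
       not just the ones the common execution paths happen to use. */
    for (int s = 0; s < STATE_COUNT; s++)
      handle_state((state_t)s);
    return 0;
  }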

Another problem with code coverage testing is the loss of fidelity when translating from source to machine code. In some cases, it is preferable that code coverage testing be performed on the machine code. By doing so, we increase the assurance that malicious code is not inserted as part of the build process. In fact, machine code coverage is required by some high-assurance security and safety certifications. Let’s examine the following simple function:

  int foo(int a, int b, int *arr, int n)
  {
    int i;
    for (i = 0; i < n; i++) {
      arr[i] += a / b;  /* numerator and denominator are loop-invariant */
    }
    return i;
  }

The loop body contains a divide operation in which numerator and denominator are both loop-invariant. A compiler would like to hoist this divide outside the loop to improve performance:

  int foo(int a, int b, int *arr, int n)
  {
    int i;
    int tmp = a / b;  /* divide hoisted out of the loop */
    for (i = 0; i < n; i++) {
      arr[i] += tmp;
    }
    return i;
  }

However, this optimization is disallowed because it changes the function’s semantics. In the pre-optimized version, the divide may never be executed (if the loop itself is never executed, i.e., argument n is zero or negative). In the optimized version, the divide is unconditionally executed. If the argument b is zero, then the optimization could theoretically induce a program crash that might not otherwise occur. But compilers are smart! Most compilers will hoist the divide but introduce a guard against zero division, as shown below:

  int foo(int a, int b, int *arr, int n)
  {
    int i, tmp;
    if (b != 0)       /* compiler-inserted guard against division by zero */
      tmp = a / b;
    for (i = 0; i < n; i++) {
      arr[i] += tmp;
    }
    return i;
  }

The preceding code shows the compiler optimization visualized as source code changes. Of course, the compiler performs this optimization as it is generating the machine code. Therefore, the machine code contains a new decision, b != 0, that was not present in the source code. A code coverage tool that operates on source code alone would fail to cover this additional decision.

If a code coverage tool is capable of providing decision coverage but not MC/DC, the source can be modified to remove compound conditions. With singleton conditions throughout, MC/DC reduces to decision or branch coverage:

  if (a && b) {...}

modified to

  if (a)
    if (b) { ... }
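
An OR decision decomposes in the same spirit; in this sketch (ours, not from the original text), the if-body is factored into a hypothetical do_work() helper to avoid duplicating it:

  if (a)
    do_work();   /* do_work() stands in for the original if-body */
  else if (b)
    do_work();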

While this approach may seem heavy-handed, it can be acceptable because the amount of critical code requiring MC/DC-level testing is often relatively small. A lack of complete test coverage almost always points to a lack of other important validation (e.g., functional testing), design flaws, or simply latent code that has unknown and potentially security-relevant impact.

It is left as an exercise for readers to explore the various other types of testing and their relative advantages. Beyond the important example of coverage testing, our guidance is more concerned with the integration of testing methodology into the development process to maximize its value.

Organizations that do not follow a rigorous development process often resort to ad hoc testing that is frequently an afterthought, applied when most of the software has already been written. Organizations that follow a rigorous process often focus testing during a release process, again after much of the software has been written.

It is important to emphasize that the testing system should be running 24x7. If a testing system is run only on demand, occasionally, or only during a release process, then errors that can be detected by the testing system tend to go unnoticed for an unnecessarily long period of time.

When a flaw is discovered long after it was introduced, the developer has a much harder time remediating it than if the flaw had been introduced the previous day. In some cases, the developer may have moved on to another project, if not another company, leaving someone else to try to learn the code and fix the flaw.

Fixing flaws discovered by the testing system should be prioritized higher than anything other than emergency customer support issues. Keeping the system running cleanly at all times guarantees that test system failures are almost always new failures that have not been examined by anyone else and need immediate attention.

The testing system should run on actively developed products as well as currently shipping products. When a testing system is used throughout the development process, developers are forced to keep the product in a working state at all times. Software projects that move to rigorous test only after a code freeze are subjected to test phases that last longer overall because developers must wrestle with problems inserted throughout months of development time. When a product is always working, a code freeze leads directly to final quality assurance testing, saving time to market.

If a developer cannot develop code in a manner that prevents the product from failing, then a private branch can be used, as long as it is not allowed to live too long; integrating old code branches that have drifted far from the trunk often causes unforeseen conflicts that affect the efficiency of the entire development team.

The testing system should be able to effectively test a software project in less than one night. A testing system that takes too long to run tends to become underutilized if not completely ignored. Developers should be able to quickly validate a change overnight before committing it to the project. In addition, automated tests running 24x7 on dedicated testing compute farms can detect flaws very quickly so they can be corrected while the understanding of the recently added code is still fresh in the developer’s mind.

It is reasonable to have more tests than can run in one night; however, longer runs should compete at a lower priority for computing resources or be run only on demand or at longer intervals during the development process. The nightly test run must be good enough to detect almost all flaws added during development.

Tests should be written such that output is generated only when an error is detected. A clean test is one without any output. At worst, the output should be less than a page long. Too often, testing systems generate voluminous output, making it difficult for developers to quickly ascertain the status of the test run. Test output that is difficult to quickly evaluate tends to be ineffectual and ignored.
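
A minimal sketch of this style (the CHECK macro is our illustration, not from the book): the test emits nothing on success and a single diagnostic line on failure:

  #include <stdio.h>

  /* Print only when the condition fails; a clean run produces no output. */
  #define CHECK(cond)                                             \
      do {                                                        \
          if (!(cond))                                            \
              fprintf(stderr, "FAIL %s:%d: %s\n",                 \
                      __FILE__, __LINE__, #cond);                 \
      } while (0)

  int main(void)
  {
    CHECK(1 + 1 == 2);      /* passes silently */
    CHECK(sizeof(int) >= 2);
    return 0;
  }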

When a test fails, the exact state of the software system and any inputs or process that must be used to reproduce the discovered error should be clearly displayed within the test output. If the developer is unable to efficiently reproduce a test failure, the test system will tend to be ignored. Reproducibility is the key to maximizing the rate at which the developer can remediate flaws discovered by the testing system and bring a reliable software product to market faster.

David Kleidermacher, Chief Technology Officer of Green Hills Software, joined the company in 1991 and is responsible for technology strategy, platform planning, and solutions design. He is an authority in systems software and security, including secure operating systems, virtualization technology, and the application of high robustness security engineering principles to solve computing infrastructure problems. Mr. Kleidermacher earned his bachelor of science in computer science from Cornell University.

This article is excerpted from Embedded Systems Security, by David and Mike Kleidermacher, used with permission from Newnes, a division of Elsevier. Copyright 2012. All rights reserved. For more information on this title and other similar books, please visit www.newnespress.com.
