Dealing with automotive software complexity with virtual prototyping – Part 3: Embedded software testing

Editor’s Note: In Part 3 of a series excerpted from the book Better Software. Faster! Victor Reyes looks at how virtual HIL techniques can be applied to the important tasks of automotive software testing and verifying functional safety.

Code coverage measurement and fault injection testing are two very important activities to increase the quality of tests. However, the application of both activities during the embedded software integration and testing phases is typically very limited. On the one hand, code coverage is mainly applied to on-host software unit testing.

Although useful, on-host testing at the unit test level does not exercise big parts of the embedded software running on the device and therefore it is not sufficient to credibly guarantee that faulty software is not being deployed. On the other hand, conventional fault injection techniques that can be applied during software integration and test phases have multiple limitations in terms of intrusiveness and controllability and offer only a limited set of injection points that could help to reach a more extensive coverage.

In the following subsections we will describe how virtual prototypes can overcome some of these limitations, help increase the quality of the tests, and hence the quality of the embedded software.

The ISO 26262 functional safety standard
Functional safety is the part of the overall safety of a system or piece of equipment that depends on the system or equipment operating correctly in response to its inputs, including the safe management of likely operator errors, hardware failures and environmental changes. Its objective is to eliminate the risk of physical injury or other health damage to the end-product users. Any assessment of functional safety must examine the function of any component or subsystem in the context of whole-system behavior.

ISO 26262 [5] is a functional safety standard that replaces the older and more generic IEC 61508 for passenger vehicles. ISO 26262 addresses hazards caused by malfunctioning behavior of electric and electronic safety related systems. The standard focuses on the electrical and electronic programmable systems (EEPS) but requires assurance that functional safety extends to the parts of the system that the EEPS activates, controls or monitors. The standard provides:

  • An automotive-specific safety lifecycle: the safety lifecycle is composed of three phases: the concept phase, the product development phase and the after-start-of-production phase.
  • An automotive-specific risk-based approach based on Automotive Safety Integrity Levels (ASILs): three aspects determine the ASIL assigned to an item. Severity ranges from light injuries to fatal injuries in case of failure of the item under investigation. Exposure indicates the probability of being in an operational situation in which a failure of the item can lead to a hazard. Controllability indicates how difficult it is for the driver to control the effects of the hazard. ASIL D is the highest safety integrity level and ASIL A the lowest.
  • Requirements and recommended methods for the validation of the safety levels: the standard breaks down the requirements to verify and validate the item under consideration into different sub-parts (system level, hardware and software). For each of these sub-parts the standard describes a set of verification methods, with specific weights for the different ASILs: no recommendation, recommended and highly recommended. For example, section 4.7 (system design) [6] highly recommends simulation as a method to verify the system for ASIL C and D compliance.
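The ASIL ranking described in the list above can be illustrated with a small lookup. The sketch below assumes the class scales used in ISO 26262 Part 3 (severity S1 to S3, exposure E1 to E4, controllability C1 to C3) and relies on a commonly cited shorthand for the standard's ASIL table, in which the sum of the three class indices determines the level. The function name `asil` is hypothetical, and the table in the standard itself remains the normative source.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Return the ASIL for classes S1-S3, E1-E4 and C1-C3, given as integers.

    Assumption: uses the sum shorthand for the ISO 26262-3 table
    (sum 10 -> D, 9 -> C, 8 -> B, 7 -> A, anything lower -> QM).
    """
    if not (1 <= severity <= 3 and 1 <= exposure <= 4 and 1 <= controllability <= 3):
        raise ValueError("expected S1-S3, E1-E4, C1-C3")
    total = severity + exposure + controllability
    return {10: "D", 9: "C", 8: "B", 7: "A"}.get(total, "QM")


print(asil(3, 4, 3))  # D
print(asil(2, 4, 3))  # C
print(asil(1, 2, 3))  # QM
```

QM ("quality management") denotes items for which normal quality processes are considered sufficient and no ASIL is assigned.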

In general ISO 26262 highly recommends simulation and prototyping methods for system, hardware and software design and verification. When it comes to software integration and testing in particular, the standard highly recommends both fault injection testing and structural coverage metrics for the highest safety integrity levels (ASIL C/D). Fault-injection is more specifically mentioned for system, hardware and software integration and test activities. Fault-injection can be applied successfully to improve the test coverage of safety mechanisms at the system level, covering corner cases that are difficult to trigger during normal operation. It can also be applied whenever a hardware safety mechanism is defined to analyze its response to faults or where arbitrary faults that may corrupt software or hardware components must be injected to test the safety mechanisms.

Code coverage
Code coverage is a very important technique for achieving higher software quality and cost-effective testing, i.e. getting the same results with fewer tests. Code coverage analysis makes it clear when a new set of tests increases the quality of the results and when it does not. Many added tests can be redundant, going through exactly the same paths and conditions exercised by previous tests. Those tests add nothing but cost.
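As a minimal sketch of how coverage data exposes redundant tests, the snippet below assumes each test run yields the set of source locations it executed. The function name, test names and location labels are all hypothetical.

```python
def redundant_tests(coverage_by_test: dict[str, set[str]]) -> list[str]:
    """Return tests that cover nothing beyond what earlier tests covered."""
    seen: set[str] = set()
    redundant = []
    for name, covered in coverage_by_test.items():
        if covered <= seen:          # every location was already exercised
            redundant.append(name)
        seen |= covered
    return redundant


# Hypothetical coverage data: "file:line" locations executed by each test.
runs = {
    "test_idle": {"main:10", "main:11", "ctrl:40"},
    "test_accelerate": {"main:10", "ctrl:40", "ctrl:42"},
    "test_idle_again": {"main:10", "main:11"},
}
print(redundant_tests(runs))  # ['test_idle_again']
```

The result depends on the order in which tests are considered, so this is a simple first filter rather than a minimal test-set selection.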

Typical coverage metrics are:

  • Function coverage: has each function in the software been called?
  • Call coverage: has each different function call been encountered at least once?
  • Statement coverage: has each statement in the software been executed?
  • Branch coverage: has each branch of each control structure been executed?
  • Decision coverage: has every decision taken all possible outcomes at least once?
  • Condition coverage: has each boolean sub-expression been evaluated both for the true and false option?
  • Modified Condition/Decision coverage: has each decision been tried for every possible outcome? Has each condition in a decision taken every possible outcome? Has each condition in a decision shown to independently affect the outcome of the decision? Has each entry and exit point been invoked?
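The difference between decision coverage and Modified Condition/Decision coverage is easiest to see on a concrete decision. The sketch below uses a hypothetical three-condition decision and checks, for a given test set, which conditions have been shown to independently affect the outcome. It applies the "unique cause" interpretation, where two tests differ in exactly one condition and produce different outcomes.

```python
from itertools import combinations


def decision(a: bool, b: bool, c: bool) -> bool:
    """Hypothetical decision under test."""
    return a and (b or c)


def conditions_shown_independent(tests):
    """Return indices of conditions for which some pair of tests differs
    only in that condition and yields different decision outcomes."""
    shown = set()
    for t1, t2 in combinations(tests, 2):
        differing = [i for i in range(len(t1)) if t1[i] != t2[i]]
        if len(differing) == 1 and decision(*t1) != decision(*t2):
            shown.add(differing[0])
    return shown


tests = [
    (True, True, False),   # decision is True
    (False, True, False),  # decision is False; differs from test 1 only in a
    (True, False, False),  # decision is False; differs from test 1 only in b
    (True, False, True),   # decision is True; differs from test 3 only in c
]
print(conditions_shown_independent(tests[:2]))  # {0}
print(conditions_shown_independent(tests))      # {0, 1, 2}
```

The first two tests alone already give decision coverage, since the decision takes both outcomes, yet they only demonstrate the independent effect of the first condition. All four tests are needed before every condition is covered in the MC/DC sense.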

Code coverage has mainly been applied during software unit testing using on-host techniques. The main reasons are simplicity and the very quick “build-run-analyze” turnaround times that can be achieved. It is also useful when the target hardware is not yet available or when access to hardware is limited. The big problem with on-host code coverage is that the software is tested on an architecture that is not the target architecture and that its scope is limited to individual software components. Therefore, there exists a “credibility gap” when using only on-host techniques. To minimize this “credibility gap” and to provide better arguments to certification authorities about the quality of the tests, on-target code coverage testing is preferred. With on-target testing, the target architecture and the complete software stack (including hardware-dependent software) are used. The problem with on-target testing, however, is that in most cases extra hardware resources are required to handle the bookkeeping required for code coverage measurements. Moreover, the embedded software itself needs to be instrumented for the code coverage measurements to work.

Using virtual prototypes to perform code-coverage measurements removes some of these limitations while keeping the benefits of both techniques. Being a software tool, a virtual prototype has a good turnaround time. The exact software stack is compiled for the target architecture, and the code-coverage analysis does not consume any extra hardware resources since the analysis framework is completely orthogonal to the models representing the hardware. As a result virtual prototypes can be used to make testing more cost effective through code coverage measurements with the turnaround and scalability of host-based methods and with the confidence level of target-based methods.

Fault injection
Fault injection helps to determine whether the response of a system matches its specification despite the presence of faults. Faults can be categorized in two big buckets: hardware faults and software faults.

Hardware faults can be categorized by their duration as: permanent faults (triggered by component damage), transient faults (triggered by environmental conditions, also known as soft errors), and intermittent faults (triggered by unstable hardware). Software faults are always the consequence of an incorrect design either at the time of specification or coding.

Conventional fault injection methods are hardware-based, software-based or simulation-based:

  • Hardware-based fault injection is performed at the physical level. This is typically done by modifying the value of the pins of the Electronic Control Unit (ECU) (a.k.a. with contact fault injection) or by disturbing the hardware with electromagnetic interference or heavy ion radiation (a.k.a. without contact fault injection).
  • Software-based fault injection aims to reproduce errors introduced by hardware faults without physically modifying the hardware. Software-based fault injection techniques are limited in that they can only inject errors at locations accessible by the software, namely the memory and memory-mapped peripheral registers. Therefore, they are only able to model transient faults. The biggest problem with software-based fault injection techniques is their intrusiveness. They modify the software binary by inserting code to cause the errors, which could lead to different behavior compared to the production software running on the end product.
  • Finally, simulation-based fault injection uses an implementation model at gate or RTL level to perform the experiments. Simulation-based fault injection has the advantage of having full access to all hardware elements in the system. Without being intrusive, it has full observability and controllability and is fully deterministic. The downside is that simulation at this level is extremely slow. This makes it unusable for more complex fault scenarios where software must be taken into account.

Figure 8: Comparison of different fault injection methods

When used during the software integration and testing phase, a virtual prototype-based framework for fault injections exceeds the capabilities of conventional fault injection methods. First of all, it provides more visibility and fault-injection points than hardware-based fault injection with contact. Secondly, it is much more controllable and precise than hardware-based fault injection without contact to trigger soft-errors. Thirdly, unlike software-based fault injection, this framework is completely non-intrusive. And finally, it runs orders of magnitude faster than RTL/gate level simulators enabling more complex fault scenarios across the complete software stack.

Figure 9: A virtual prototype based framework for fault injection

A fault injection framework is typically composed of a fault injector, which injects faults from a library into the target system; a set of workload generation capabilities (to create the stimuli to test the scenarios); and a data collector and an analysis framework (to monitor and feed information back from the target system). The entire testing is orchestrated by a controller.

A fault injection framework using a virtual prototype benefits from the typical generic control and inspection interface exposed by the models and tool. This interface allows inspecting and modifying internal and boundary values of the target system (e.g. component registers, memory content, signals, etc.). This interface is also used to control the simulation execution. Based on this generic interface, a higher-level fault injection API can be defined to inject the faults. It basically needs to be able to define “triggers” and “faults”.

Using a virtual prototype, triggers can be software events (for example entering a specific software routine), hardware events (for example when an interrupt line goes active), time events (for example after 1 second of simulation time) or any combination of the above. Triggers can be concatenated and can dynamically enable other triggers based on the system status. This increases the precision with which the moment of fault injection can be chosen.

Faults can be injected at I/O pins, registers, internal signals and memory locations. The value can be set just once (transient) or can be forced permanently. Besides these two commands, model-dependent commands can be added for specific purposes. For instance, a memory can implement a specific command to flag an ECC error after a read/write access.
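The difference between setting a value once and forcing it permanently can be sketched with a toy register model. The class and its method names (`set_once`, `force`, `release`) are hypothetical stand-ins and do not represent the API of any particular virtual prototyping tool.

```python
class Register:
    """Toy register model with two hypothetical fault injection hooks."""

    def __init__(self, value=0):
        self._value = value
        self._forced = None

    def write(self, value):
        """Normal write performed by the simulated hardware or software."""
        self._value = value

    def read(self):
        return self._value if self._forced is None else self._forced

    def set_once(self, value):
        """Transient fault: the value lasts until the next normal write."""
        self._value = value

    def force(self, value):
        """Permanent fault: the value sticks regardless of later writes."""
        self._forced = value

    def release(self):
        """Remove a forced value so normal behavior resumes."""
        self._forced = None


reg = Register()
reg.set_once(0xFF)       # transient fault
reg.write(0x01)          # the next write overwrites it
print(hex(reg.read()))   # 0x1

reg.force(0xFF)          # permanent fault
reg.write(0x02)          # the write no longer changes what is read back
print(hex(reg.read()))   # 0xff
```

Because the hooks live in the model rather than in the embedded software, the binary under test stays unmodified.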

For workload generation the fault injection framework can also use the virtual prototype’s control and inspection interface to introduce stimuli on the target system. When more complex workload generation is required the framework can integrate external plant models or “rest bus” simulation and connect them to the virtual prototype-based target system. The virtual prototype-based framework provides built-in monitoring, tracing and analysis views of all hardware and software elements in the virtual target system. The user can drive scenarios through the tool GUI and console in an interactive manner. Alternatively, complete scenarios can be described using the scripting interface and replayed automatically as part of regression testing.

An example of injecting a soft error is explained below. In this scenario, data in the SRAM memory will be corrupted during software execution, resulting in the software going into an exception and jumping to an error routine. The data abort exception will be triggered by the mimicked ECC functionality on the SRAM model. The complete scenario follows three steps: 1) the MCU runs a software application for some time, 2) the framework waits until the processor core enters a standard interrupt service routine (triggered by the internal interrupt line), and 3) it finally waits for the next access to the SRAM memory, which returns an ECC error.

The scenario is described using the scripting capabilities of the framework and the fault injection API. The script is shown in Figure 10. Step 1 creates a trigger sensitive to a time event that will invoke a procedure named “trigger_on_timeout” after 10 ms. This procedure will dynamically create a second trigger linked to the processor core interrupt. This second trigger calls the procedure “trigger_on_ISR” when the processor receives an interrupt, which in turn invokes a special command implemented on the SRAM memory model to generate an ECC error on the next access. All three commands can be written down in a script of fewer than 10 lines of code.

Figure 10: Script example for an ECC fault scenario
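The chaining of the three steps can be approximated with a toy event replay. The sketch below is written in Python and is not the tool's actual script. Only the procedure names `trigger_on_timeout` and `trigger_on_ISR` come from the text; the `Simulation` and `SramModel` classes, their methods and the event list are hypothetical stand-ins for the virtual prototype's control interface.

```python
class SramModel:
    """Toy SRAM with a hypothetical 'ECC error on next access' command."""

    def __init__(self):
        self.ecc_armed = False

    def inject_ecc_on_next_access(self):
        self.ecc_armed = True

    def access(self, address):
        if self.ecc_armed:
            self.ecc_armed = False
            return "ecc_error"
        return "ok"


class Simulation:
    """Minimal stand-in for a virtual prototype's control interface."""

    def __init__(self):
        self.sram = SramModel()
        self.time_triggers = []   # list of (time_ms, callback)
        self.irq_triggers = []    # callbacks fired on the next interrupt
        self.log = []

    def on_time(self, time_ms, callback):
        self.time_triggers.append((time_ms, callback))

    def on_interrupt(self, callback):
        self.irq_triggers.append(callback)

    def run(self, events):
        """Replay (time_ms, kind) events; kind is 'irq' or 'sram_access'."""
        for time_ms, kind in events:
            due = [t for t in self.time_triggers if t[0] <= time_ms]
            self.time_triggers = [t for t in self.time_triggers if t[0] > time_ms]
            for _, callback in due:
                callback(self)
            if kind == "irq":
                pending, self.irq_triggers = self.irq_triggers, []
                for callback in pending:
                    callback(self)
            elif kind == "sram_access":
                self.log.append((time_ms, self.sram.access(0x0)))


def trigger_on_ISR(sim):
    sim.sram.inject_ecc_on_next_access()   # step 3: fault on the next access


def trigger_on_timeout(sim):
    sim.on_interrupt(trigger_on_ISR)       # step 2: wait for an interrupt


sim = Simulation()
sim.on_time(10, trigger_on_timeout)        # step 1: let the software run 10 ms
sim.run([(2, "irq"), (5, "sram_access"), (12, "irq"),
         (13, "sram_access"), (14, "sram_access")])
print(sim.log)  # [(5, 'ok'), (13, 'ecc_error'), (14, 'ok')]
```

The interrupt at 2 ms has no effect because the interrupt trigger only exists after the 10 ms timeout. This is the kind of precision that dynamically enabled triggers provide.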

The effect of this fault will be that the core enters a software exception routine that jumps to an error routine. This error routine will shut down the operating system in a controlled manner. The injection of the fault and the effects it causes on the software can be traced using the built-in monitoring and analysis capabilities of the virtual fault injection framework.

This is illustrated in Figure 11, where we can see that after 10 ms (step 1) and after the next processor interrupt is received (step 2), the next access to the SRAM memory (accessing the VTABLE) triggered an ECC error (step 3). The zoomed-in area shows more details of the sequence of functions and instructions that the software followed to shut down the OS in a controlled manner after the error is detected. This procedure typically includes sending diagnostic information, so that the cause of the failure can be communicated to a diagnostics tool, and perhaps executing a recovery function, such as resetting the system, to bring it back to a safe state.

Figure 11: Detailed trace of the ECC fault scenario

Parallel regression testing
At the time of publishing this book, automotive companies rely on Hardware-in-the-Loop (HIL) to perform the majority of their embedded software testing. HIL testing is a sequential process that is becoming a major bottleneck during the development of automotive software. Running the hundreds of thousands of tests required for a software “variant” takes days or even weeks to complete.

Although more and more of the testing is automated with the help of automotive-specific testing tools, there are still a lot of tests that have to be applied manually or that require human supervision. This is due to the difficulty of automating certain actions on real hardware or because some results can only be observed through special equipment, e.g. oscilloscopes. Scaling this approach to reduce the testing time is not easy since the HIL infrastructure is costly. Not only do the HIL boxes themselves cost a lot; the space to store them, the overall energy consumption to run the boxes and the maintenance cost associated with hardware labs drive up the bill further.

As discussed earlier in this series, moving from HIL testing to a full virtual environment based on virtual prototypes, what we have called vHIL, is the next natural step in the evolution of testing methods. By doing so, automotive companies are able to better scale their testing by applying massive parallelization, and therefore reduce the time to completion in a much more cost-effective way. Being simulation-based, a testing method like vHIL can be replicated as many times as the computer infrastructure supports. Nowadays, cloud computing has lowered the cost of computational power to a fraction of what it used to be.

The ability to complete hundreds of thousands of tests in mere hours instead of weeks will open new possibilities. For instance, it will allow applying massive fault-injection tests that will result in higher software quality. An example of this is the work done by Hitachi. They were able to run approximately 700,000 tests in one night (12 hours), using a parallel vHIL environment (consisting of 600 simulations), hosted on a public cloud computing infrastructure. [8][9]
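A quick back-of-the-envelope calculation puts these figures in perspective. The sketch below assumes, purely for illustration, that the tests were spread evenly over the 600 simulations, that each test takes the same time, and that scheduling overhead is negligible. None of these assumptions are stated in the cited work.

```python
total_tests = 700_000      # approximate figure quoted above
simulations = 600
window_hours = 12

tests_per_simulation = total_tests / simulations
seconds_per_test = window_hours * 3600 / tests_per_simulation
sequential_days = total_tests * seconds_per_test / 86_400

print(round(tests_per_simulation))  # 1167
print(round(seconds_per_test))      # 37
print(round(sequential_days))       # 300
```

Under these assumptions each test takes roughly 37 seconds, and running the same campaign sequentially on a single simulation would take about 300 days.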

Another opportunity with great potential is to link this type of massive parallel testing with software version control tools. Today, adding or modifying code in a stable software platform may take days until all dependencies are sorted out. Even then, there is a high risk that a code modification will introduce bugs in another “variant” that can only be detected later during HIL testing. At that point it will be almost impossible to trace back to the code modification that triggered the bug. All these problems would disappear if, after submitting new code, all tests for all variants could be run overnight. And this is not only valid for the application software, but also for the complete software stack that gets compiled for the target microcontroller and will eventually execute on the ECU.

Part 1: Virtual HIL development basics
Part 2: An AUTOSAR use case

Victor Reyes, Technical Marketing Manager, Synopsys Inc., is the author of a chapter in Better Software. Faster! from which this series of articles was excerpted. Edited by Tom De Schutter, the book was published by Synopsys Press and can be downloaded as a free eBook.

[5] ISO 26262 functional safety standard, 2011
[6] ISO 26262 Part 4: Product development at the system level
[7] Reyes, V.; “Virtualized Fault Injection Methods in the Context of the ISO 26262”, SAE International, 2012
[8] Nakata, Y.; Ito, Y.; Takeuchi, Y.; Sugure, Y.; Oho, S.; Kawaguchi, H.; Yoshimoto, M.; “Model-based Fault Injection for Large-Scale Failure Effect Analysis with 600-Node Cloud Computers”
