Transporting bugs with virtual checkpoints
Automatic Testing and Checkpointing Bugs
Finally, let us walk through a more complex scenario where we put bug reporting into a larger context. The flow is illustrated in Figure 2 below.
Figure 2: Distributing checkpoints virtually
We start with a platform team that creates the fundamental software that is used by the developer’s software. This team configures a virtual platform and loads the platform software on it, boots it, and takes a checkpoint after the boot is finished (P). This checkpoint is distributed to developers, testers, and other users.
The developer can use this checkpoint to load and test software. The reporter (who in this case resides in the testing department) takes the developer’s software, adds some testing components, and further configures the target hardware. For example, the reporter might add some application-specific boards and specific test driver software to the target.
Once this setup is complete, another checkpoint is saved (R0). This is then used as the starting point for several parallel test runs of the system (including all software and hardware configuration performed by the platform team and the reporter). While the tests are running, checkpoints are regularly saved and stimuli recorded. When test Q hits a bug, we send the checkpoint of the failing system (RQ) back to the developer.
Since checkpoints can be incremental, there can be quite a few checkpoints saved during the execution of test Q. To simplify the package to be sent back to the developer, we apply a checkpoint merge operation before sending the bug report.
The merge combines all the state changes in a chain of checkpoints into a single checkpoint. In this example, the combined checkpoint (R) would still depend on checkpoint (P), since the developer already has that from the platform team and there is no need to duplicate information.
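To make the merge step concrete, here is a minimal sketch in Python. It is a toy model, not the Simics checkpoint format or API: the `Checkpoint` class, the `merge_chain` function, and the state keys are all hypothetical names chosen for illustration. Each incremental checkpoint stores only the state that changed since its parent, and the merge folds a chain of such deltas into one checkpoint that still depends on the base.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Checkpoint:
    """Toy incremental checkpoint: holds only state changed since `parent`."""
    name: str
    delta: Dict[str, int]
    parent: Optional["Checkpoint"] = None

    def full_state(self) -> Dict[str, int]:
        """Resolve the complete state by applying deltas from the root down."""
        state = self.parent.full_state() if self.parent else {}
        state.update(self.delta)
        return state


def merge_chain(tip: Checkpoint, base: Checkpoint, name: str) -> Checkpoint:
    """Fold every delta after `base` up to and including `tip` into a
    single checkpoint whose parent is `base`."""
    chain = []
    node = tip
    while node is not base:
        if node is None:
            raise ValueError("base is not an ancestor of tip")
        chain.append(node)
        node = node.parent
    merged: Dict[str, int] = {}
    for cp in reversed(chain):  # oldest first, so later changes win
        merged.update(cp.delta)
    return Checkpoint(name, merged, parent=base)


# Hypothetical chain mirroring the scenario: P <- R0 <- Q1 <- RQ
P = Checkpoint("P", {"pc": 100, "booted": 1})
R0 = Checkpoint("R0", {"test_driver": 1}, parent=P)
Q1 = Checkpoint("Q1", {"pc": 200}, parent=R0)
RQ = Checkpoint("RQ", {"pc": 300, "err": 7}, parent=Q1)

R = merge_chain(RQ, P, "R")
```

In this sketch, `R` carries only the changes made after `P`, so the developer who already holds `P` needs just one additional checkpoint to reconstruct the failing state.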

Figure 3: Merging virtual checkpoint state changes
The final situation for the developer is shown in Figure 3 above. There is no need to redo the system boot or loading. The developer can replay the final steps of the system execution leading up to the bug and investigate the complete system state.
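The replay idea can be illustrated with a small, self-contained Python sketch. It assumes a fully deterministic toy system, so that restoring a saved state and reapplying the recorded stimuli reproduces the original run. `ToySystem`, `run_with_recording`, and `replay` are hypothetical names for this illustration and do not correspond to any simulator API.

```python
import copy
from typing import List


class ToySystem:
    """Deterministic toy target: its state depends only on the inputs seen."""

    def __init__(self) -> None:
        self.step_count = 0
        self.acc = 0

    def step(self, stimulus: int) -> None:
        self.step_count += 1
        self.acc = (self.acc * 31 + stimulus) % 1_000_003


def run_with_recording(system: ToySystem, stimuli: List[int],
                       checkpoint_every: int) -> List[ToySystem]:
    """Run the system on the stimuli, saving a snapshot at regular intervals."""
    checkpoints = []
    for s in stimuli:
        system.step(s)
        if system.step_count % checkpoint_every == 0:
            checkpoints.append(copy.deepcopy(system))
    return checkpoints


def replay(checkpoint: ToySystem, stimuli: List[int]) -> ToySystem:
    """Restore a snapshot and reapply only the stimuli recorded after it."""
    system = copy.deepcopy(checkpoint)
    for s in stimuli[system.step_count:]:
        system.step(s)
    return system


# Test side: run to the end of the recorded stimuli, checkpointing every 3 steps.
recorded = [3, 1, 4, 1, 5, 9, 2, 6]
original = ToySystem()
saved = run_with_recording(original, recorded, checkpoint_every=3)

# Developer side: start from the last snapshot, not from boot.
reproduced = replay(saved[-1], recorded)
```

Here the developer's run starts at step 6 rather than step 0, yet ends in the same state as the tester's run, which is the property that makes a checkpoint plus recorded stimuli sufficient as a bug report.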
Jakob Engblom is Technical Marketing Manager for Wind River Simics, a full system simulator used by software developers to simulate any target hardware from a single processor to large, complex, and connected electronic systems. He has been working on Simics since 2002, and today works on product planning and how to apply Simics to customer problems. He holds a PhD in real-time systems from Uppsala University and an MSc in computer science, also from Uppsala University. He has written and presented more than 100 articles and talks on a variety of embedded systems topics since 1997.