Imagine a world without a global notion of time. Now try to find out the flight direction of an airplane with the following information: there's an e-mail from Alice saying that she saw the plane about two hours after sunrise, and another e-mail from Bob saying that he saw the plane about three hours after sunrise. So Alice and Bob tell us when they saw the plane, at least from their point of view. If they are nice, they might give us some additional information, namely their location at the moment of the observation. But, unfortunately, embedded systems are usually not that nice.
Now imagine a distributed system built of networked embedded nodes. When a problem arises with the distributed application, the designer invokes a debugger to track down the faulty system behavior. In detail, the designer traces the execution of two nodes A and B simultaneously. The situation is similar to the plane-tracking scenario. Obviously, a systemwide notion of time would be helpful, which leads us to an important aspect of distributed debugging.
State-of-the-art embedded systems debugging
To alleviate the difficulty of debugging modern microcontrollers and complex systems-on-chip (SoCs), support for test and debugging is routinely built into silicon. Today, many debugging approaches rely on offline debugging based on trace buffers added to the CPU, which reduce the intrusiveness of the debug system. Leading processor-core vendors offer on-chip trace solutions.
However, existing debugging and test tools mainly target single nodes with one or more CPUs on board or on chip (SoC), accessed through auxiliary debug interfaces such as JTAG or a simple UART. The problem with these approaches is that they entirely neglect the distributed nature of many applications: connecting a monitoring computer directly to each node is impractical, especially if the nodes are already embedded in their place of installation (see Figure 1a).
Wouldn't it be nice to precisely coordinate debug, test, trace, and replay activities across the entire distributed system without any auxiliary interface or special cabling? Moreover, it would be helpful if only a single debugging master and monitoring computer were attached to the network and used to issue debugging, test, or monitoring actions. Such an approach, which greatly simplifies debugging and testing of distributed systems, is shown in Figure 1b.
A new solution for distributed debugging
In the following example, a distributed system is assumed that contains a number of nodes, where every node is a self-contained processing unit with peripherals or, in other words, an embedded system. The nodes are connected via a network (for example, Ethernet) and exchange data to jointly perform their application tasks. No restrictions apply to the underlying network technology, be it wireless or cable-bound, shared or switched. Today, such setups are deployed in automotive applications, industrial and building automation, and machine and plant control, to name a few.
As already mentioned, a global notion of time is a key element of distributed debugging. Therefore, the proposed solution implements a mechanism to synchronize the local clocks contained in the network nodes. The clock-synchronization mechanism can be implemented either in software or in hardware; the latter provides better accuracy. If, for example, the IEEE 1588 clock-synchronization standard is applied to a 100-Mbit/s Ethernet network, the mechanism can be implemented in hardware directly above the physical layer, which allows the local clocks to be synchronized with a precision of about 10 ns. This enables systemwide debugging of multiple nodes at the CPU instruction level, assuming CPU clock rates up to 100 MHz. If the network provides an implicit clock-synchronization mechanism, as time-triggered protocols such as TTP or FlexRay do, that clock can be used for debugging purposes.
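The arithmetic at the core of an IEEE 1588-style synchronization round can be sketched in a few lines. The structure and function names below are our own illustration, not part of the standard's API, and the sketch assumes a symmetric network path between master and slave:

```c
#include <stdint.h>

/* Timestamps of one IEEE 1588-style sync exchange, in nanoseconds:
 *   t1: master sends Sync         t2: slave receives Sync
 *   t3: slave sends Delay_Req     t4: master receives Delay_Req */
typedef struct {
    int64_t t1, t2, t3, t4;
} ptp_exchange_t;

/* Offset of the slave clock relative to the master, assuming a
 * symmetric path delay: offset = ((t2 - t1) - (t4 - t3)) / 2. */
int64_t ptp_offset_ns(const ptp_exchange_t *e)
{
    return ((e->t2 - e->t1) - (e->t4 - e->t3)) / 2;
}

/* One-way path delay under the same symmetry assumption. */
int64_t ptp_delay_ns(const ptp_exchange_t *e)
{
    return ((e->t2 - e->t1) + (e->t4 - e->t3)) / 2;
}
```

With hardware timestamping right above the physical layer, the t1 through t4 values are captured without software jitter, which is where the approximately 10-ns precision comes from.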
Besides the clock-synchronization mechanism, the second key element of the proposed approach is the use of the already existing network to transfer debugging data. During normal operation, the network carries no additional traffic. Even clock synchronization, which typically generates no more than one short message per second (in the case of IEEE 1588), can be turned off if not needed. The nodes jointly perform their application tasks (their normal operation) without any interaction with the debug system. To enable debugging and testing, a debugging master or monitoring computer running a debugging program has to be connected to the network. First, clock synchronization is activated to establish the common notion of time. During a debugging session, the debugging master sends messages containing debugging commands to one or more nodes. Each node responds to the received messages and performs the intended tasks.
In an integrated approach, dedicated hardware units are added to each node that facilitate test, replay, monitoring, fault-injection, or debug actions in the target embedded system (see Figure 2). All these local actions are controlled by an offload engine, without the node's application CPUs having to run a single line of additional debugging or test-related code.
Although the proposed solution is aimed at a hardware-level implementation, it's also possible to implement it in software. A software implementation is less expensive, since no special chip has to be designed, but it comes with restrictions and drawbacks. The major drawback is intrusiveness: a debugging software task can significantly change the system's behavior.
Benefits of the solution
Unprecedented coordination of the many nodes in the distributed system is achieved by deriving the triggers for trace, debugging, or any other activity from the synchronized clocks and by enabling cross-triggering between the hardware support units of the nodes. This enables complex test or debug scenarios in which the sequence of events is independent of network packet delays, even if the nodes are distributed in space.
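The independence from packet delays can be sketched as follows: the debugging master distributes an absolute trigger timestamp in advance, and each node arms a comparator against its synchronized clock. The types and names here are hypothetical, a minimal model of the idea rather than the actual hardware interface:

```c
#include <stdbool.h>
#include <stdint.h>

/* A trigger armed with an absolute point in synchronized time.
 * Because every node compares against the same timescale, the action
 * fires simultaneously on all nodes regardless of when the command
 * message arrived, as long as it arrived before the deadline. */
typedef struct {
    uint64_t fire_at_ns;   /* absolute synchronized time */
    bool     armed;
} sync_trigger_t;

/* Polled on every tick of the synchronized clock (in hardware, this
 * would be a comparator); returns true exactly once, at the deadline. */
bool sync_trigger_poll(sync_trigger_t *t, uint64_t now_ns)
{
    if (t->armed && now_ns >= t->fire_at_ns) {
        t->armed = false;
        return true;
    }
    return false;
}
```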
An informal list of scenarios that would greatly ease the task of testing and debugging a distributed system includes:
• Start and stop of code execution
• Activation of breakpoints
• Trace and display of register contents
• Trace of internal bus activity
• Synchronous replay (for example, of previously recorded data from sensor interfaces)
• Single stepping
• Start and stop of online as well as offline tests or maintenance activities
• Injection of faults
• Precise performance analysis
These actions can be executed on a single node, multiple nodes, or all nodes of the distributed system in a coordinated manner.
In the proposed solution, several units are added to an embedded network node, typically and most effectively at the chip level, to keep the task of debugging in the background and to minimize or eliminate the probe effect. This location is also best for the clock-synchronization mechanism. A node constructed according to the presented solution thus contains a clock unit, a test unit, a replay unit, a monitoring unit, a fault-injection unit, a debug unit, and an offload engine that controls the aforementioned units. The units are connected to the offload engine via a dedicated on-chip debug bus, which is separate from the system bus used by the application CPUs. Furthermore, a network interface is connected to the application CPUs (see Figure 2).
The network interface has a built-in filter that detects incoming messages containing debugging commands or clock-synchronization information and directs those messages to the offload engine. Other messages are forwarded to the application CPUs, which are unaware of the filtering mechanism. For transmission, the network interface can insert packets containing debug- and test-related information from the offload engine into the transmit buffers during idle times of the network.
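One way to realize such a filter is to key on the Ethernet frame's EtherType field. IEEE 1588 event messages over Ethernet really do use EtherType 0x88F7; the debug EtherType below is a made-up value for illustration, as are the function and type names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_PTP    0x88F7u   /* IEEE 1588 over Ethernet */
#define ETHERTYPE_DEBUG  0x9000u   /* hypothetical debug traffic */

typedef enum { DEST_APP_CPU, DEST_OFFLOAD_ENGINE } frame_dest_t;

/* Classify a received Ethernet frame: clock-sync and debug frames go
 * to the offload engine; everything else goes to the application
 * CPUs, which never see the filtered traffic. */
frame_dest_t classify_frame(const uint8_t *frame, size_t len)
{
    if (len < 14)   /* shorter than an Ethernet header: pass through */
        return DEST_APP_CPU;
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype == ETHERTYPE_PTP || ethertype == ETHERTYPE_DEBUG)
        return DEST_OFFLOAD_ENGINE;
    return DEST_APP_CPU;
}
```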
The offload engine is the subsystem responsible for processing messages for clock synchronization and debugging. Its task is to read incoming debugging and clock-synchronization packets and to transfer the information to the corresponding unit (clock, test, replay, monitoring, fault-injection, or debug unit). The performance of the offload engine can be significantly lower than that of the application CPUs, since the task of forwarding messages to the units on the debug bus is relatively simple and the clock-synchronization calculations are repeated only at a low frequency.
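The forwarding task really is simple, as a sketch shows. The command-header layout and all names below are our assumptions for illustration, not part of any standard:

```c
#include <stdint.h>

/* Hypothetical command header prepended to every debug packet. */
typedef enum {
    UNIT_CLOCK, UNIT_TEST, UNIT_REPLAY,
    UNIT_MONITOR, UNIT_FAULT_INJECT, UNIT_DEBUG, UNIT_COUNT
} debug_unit_t;

typedef struct {
    uint8_t  unit;      /* one of debug_unit_t */
    uint8_t  opcode;    /* unit-specific command */
    uint16_t length;    /* payload bytes that follow */
} debug_cmd_hdr_t;

typedef void (*unit_handler_t)(uint8_t opcode,
                               const uint8_t *payload, uint16_t len);

/* Stand-in handler for the debug unit, recording the last opcode. */
static int last_opcode = -1;
static void debug_unit_handler(uint8_t opcode,
                               const uint8_t *payload, uint16_t len)
{
    (void)payload; (void)len;
    last_opcode = opcode;
}

/* One handler per unit on the dedicated debug bus. */
static unit_handler_t handlers[UNIT_COUNT] = {
    [UNIT_DEBUG] = debug_unit_handler,
};

/* The offload engine's main job: forward a received command to the
 * addressed unit. Returns 0 on success, -1 on a malformed command. */
int offload_dispatch(const debug_cmd_hdr_t *hdr, const uint8_t *payload)
{
    if (hdr->unit >= UNIT_COUNT || handlers[hdr->unit] == 0)
        return -1;
    handlers[hdr->unit](hdr->opcode, payload, hdr->length);
    return 0;
}
```

Because dispatch is a table lookup and an indirect call, a small, slow core suffices, which is exactly why the application CPUs stay untouched.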
The test unit's task is to perform self-test operations in the node and to offer a network-enabled test access port to the system, for example by implementing JTAG ports to connect to off-chip components. That way, a device contained on a node could be configured through the TAP-master in the test unit, which is, for example, helpful for maintenance purposes.
To reproduce complex debugging setups, it might be useful to replay certain data streams from peripheral units. The task of the replay unit is to record data from external units such as GPIOs (general purpose input/outputs), UARTs (universal asynchronous receiver/transmitters), and DACs (digital-to-analog converters) and to allow playback at later times.
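A minimal model of such a replay unit is a timestamped buffer: recording stores each observed value with its synchronized-time stamp, and playback re-emits each value once the synchronized clock reaches the original timestamp shifted to "now". The structure and names are our own sketch, not the actual hardware interface:

```c
#include <stddef.h>
#include <stdint.h>

#define REPLAY_CAPACITY 256   /* arbitrary for this sketch */

/* One recorded peripheral event: when it occurred (synchronized
 * time) and what was observed, e.g. a GPIO level or UART byte. */
typedef struct {
    uint64_t timestamp_ns;
    uint32_t value;
} replay_sample_t;

typedef struct {
    replay_sample_t buf[REPLAY_CAPACITY];
    size_t count;      /* samples recorded so far */
    size_t play_pos;   /* next sample to replay */
} replay_unit_t;

/* Record one sample; silently drops it when the buffer is full. */
void replay_record(replay_unit_t *r, uint64_t ts, uint32_t value)
{
    if (r->count < REPLAY_CAPACITY) {
        r->buf[r->count].timestamp_ns = ts;
        r->buf[r->count].value = value;
        r->count++;
    }
}

/* Playback: emit the next sample once synchronized time reaches its
 * original timestamp shifted by offset_ns. Returns 1 and stores the
 * value if a sample was due, else 0. */
int replay_poll(replay_unit_t *r, uint64_t now_ns, uint64_t offset_ns,
                uint32_t *out)
{
    if (r->play_pos >= r->count)
        return 0;
    if (now_ns >= r->buf[r->play_pos].timestamp_ns + offset_ns) {
        *out = r->buf[r->play_pos++].value;
        return 1;
    }
    return 0;
}
```

Because the timestamps come from the synchronized clock, several nodes can replay their recorded streams with the same relative timing they had during the original run.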
The monitoring unit is connected to the system bus as a passive listener and tracer. It records all or a filtered subset of accesses to the main system bus to allow a detailed history of transactions to be transferred to the debugging master for off-line debugging.
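Filtering to a subset of accesses typically means matching on an address window and the access type; a sketch with hypothetical field names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical trace filter of the monitoring unit: record only bus
 * accesses inside a configured address window, optionally writes only. */
typedef struct {
    uint32_t addr_lo, addr_hi;   /* inclusive trace window */
    bool     writes_only;
} bus_filter_t;

typedef struct {
    uint32_t addr;
    bool     is_write;
} bus_access_t;

/* Decide whether an observed bus access should be recorded. */
bool bus_filter_match(const bus_filter_t *f, const bus_access_t *a)
{
    if (a->addr < f->addr_lo || a->addr > f->addr_hi)
        return false;
    if (f->writes_only && !a->is_write)
        return false;
    return true;
}
```

Since the unit only listens, the application CPUs see no timing difference whether tracing is on or off.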
Fault injection is a method to assess system capabilities such as fault detection, fault isolation, or recovery. In software testing, fault injection is used to exercise rarely reached parts of the code. If required, such functions can be supported and, as a consequence, coordinated across the entire distributed embedded system.
The debug unit is designed to connect to the debugging interfaces of the application CPUs. The unit's functions are for instance to halt the processors of one chip, to single step through instructions, to add and remove breakpoints, and to allow access to processor internal registers. The use of a highly synchronized time base from the clock unit allows the debugging operator computer to perform a coordinated halt or single step of the complete distributed system or a remote reset after a nonrecoverable crash of the node.
The application CPUs contained in the nodes are the processing units that perform the actual application task. The (single or multiple) processors are connected to peripheral units like sensors and actuators as well as to the communication controller. Due to the nonintrusiveness of the debugging functionality, the application CPUs are unaware of any extra load caused by commands sent from or received by the debugging operator computer.
The software running on the operator computer is a crucial part, since the hardware support alone is of little use without a user interface to operate and control debugging and testing of the distributed system. Currently, the operator software is based on the well-known tools GDB (the GNU Project Debugger) and Eclipse to offer full state-of-the-art source-level debugging support for all nodes of a distributed system built according to the solution we present in this article.
With respect to both CPU clock speed and data throughput, real-time video-imaging applications, in which several high-speed, high-resolution cameras operate as an ensemble interconnected via 10-Gbit/s Ethernet, are among the most challenging systems to debug. A precisely coordinated approach to debugging and testing such systems is as innovative as it is promising.
In telecom applications, network interconnects are increasingly moving from circuit-switched or time-division solutions to internet-protocol-based ones, often relying on Ethernet technology. At the same time, requirements for precisely retaining systemwide 2.048-MHz or 4.096-MHz sampling rates have to be fulfilled, so IEEE 1588 clock synchronization is already being built into such systems in many places. The same IEEE 1588 infrastructure would, if taken into account, enable correspondingly coordinated test and debugging support.
Distributed systems are also widespread in industrial automation. Here, real time is often defined with respect to the time constants of mechanical systems such as motors, which are typically in the range of a few microseconds. As Gigabit network bandwidth begins to enter this area as well, CPU clock speeds and application complexity are very likely to increase accordingly.
Automotive systems have just started to move to higher bandwidths, for example, FlexRay communication technology with a 10-Mbit/s data rate. The predicted shift from federated toward integrated architectures will very likely further increase the complexity of those systems, as multiprocessor brake-by-wire chips are already being designed for both reliability and performance reasons. A systematic approach to debugging and test, precisely coordinated among the collaborating embedded control units, promises to greatly ease system bring-up as well as maintenance.
Testing and debugging a distributed system is a notoriously complex task. The presented solution enables the developer to gain control over all the nodes of a distributed system in a coordinated, reproducible, and, if required, nonintrusive manner. The main difference from existing approaches is that the proposed one takes the distributed character of the system into account. The combination of precisely synchronized clocks with hardware support for test, replay, fault injection, monitoring, and debugging allows unprecedented insight into the execution flow of distributed systems.
Roland Höller and Peter Rössler are responsible for R&D projects at the University of Applied Sciences Technikum Wien, Austria. They have worked for many years in the areas of ESL design, FPGA and digital ASIC design, and PCB design, as well as clock synchronization and control networks. The work described herein is supported by the City of Vienna, Department MA27 (grant numbers MA27-Project 04-11 and MA27-Project 05-05) and the Austrian Research Agency FFG (grant number 818647).