
Agile development of real-time systems


In this Product How-To, Henk Muller of XMOS uses the company’s dual XCore architecture to illustrate his argument that, given a predictable underlying processor architecture, agile development is very well suited to real-time software.

Agile development is the practice of continuously cycling through the whole software development process in order to develop a working piece of software incrementally, quickly and visibly.

In this article we argue that, given a predictable architecture, agile development is very well suited to developing real-time software. Without a predictable architecture, sooner or later an iteration of the development process is likely to introduce subtle timing errors into components of the software that were not even touched.

In an environment where software components can be composed with known effects on timing, and where tools check timing constraints because the underlying architecture has known performance characteristics, teams of developers can iterate on the software without breaking working components.

Agile development methods
Agile development methods are established techniques to develop software projects. A key element of agile development is quick successive iterations through the software development process. Each iteration through requirements, implementation, unit testing and user testing leads to a working product that is refined in the next iteration.

Agile development methods are attractive because they are more flexible in situations where product specifications are not fixed, and where time to market is crucial. Typical examples where agile development methods work well are web applications, compilers and mobile phone applications.

In all cases, the ability to evaluate successive prototypes, and to use the information gleaned to steer subsequent prototypes, is invaluable: for example, to find out whether the user interface of a web application is right, or whether the right optimizations have been implemented in a compiler.

It is essential that the prototypes developed during an agile development process are functional prototypes. They may have limited functionality, but the functionality they offer is bug-free, so the product can be used for testing. This is what distinguishes agile development methods from “hacking” together a prototype.

Real-time systems
Real-time systems come in many shapes and forms. The distinguishing characteristic of a real-time system is that, in addition to functional requirements, timing requirements must be met for the system to operate correctly. The level at which the timing requirements must be met differs widely between systems, both in the granularity of the timing requirements and in how strict they are.

The most precise timing requirements are found in systems that interface directly with other hardware. For example, when reading data from a CMOS optical sensor, pixels may have to be read at a clock rate of, say, 25 MHz.

This means that a pixel has to be read every 40 ns. One level less precise, an audio system processes audio samples at a rate of 48 kHz, or one sample every 20.8 µs, almost three orders of magnitude slower. Another three orders of magnitude slower, a video player shows frames at a rate of 30 fps, or one frame every 33 ms.
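The period arithmetic above can be checked with a few lines of code. This is a minimal sketch: the rates are the ones quoted in the text, and the helper names are illustrative only.

```python
import math

# Event rates quoted in the text.
RATES_HZ = {
    "CMOS sensor pixel clock": 25e6,  # 25 MHz
    "audio word clock": 48e3,         # 48 kHz
    "video frame rate": 30.0,         # 30 fps
}

def period_s(rate_hz: float) -> float:
    """Time between successive events at rate_hz, in seconds."""
    return 1.0 / rate_hz

def orders_of_magnitude(slower_s: float, faster_s: float) -> float:
    """How many powers of ten separate two periods (log10 of their ratio)."""
    return math.log10(slower_s / faster_s)

if __name__ == "__main__":
    periods = [(name, period_s(rate)) for name, rate in RATES_HZ.items()]
    for name, p in periods:
        print(f"{name:<24} {p * 1e6:>10.3f} us")
    # Compare each level with the next, faster one.
    for (_, fast), (slow_name, slow) in zip(periods, periods[1:]):
        print(f"{slow_name}: {orders_of_magnitude(slow, fast):.1f} orders slower")
```

The two ratios come out at roughly 2.7 and 3.2 orders of magnitude. These match the "almost three" and "another three" in the text.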

The examples above are multimedia systems, and the consequences of missing a sample vary but are unlikely to be disastrous. In the case of a single missed pixel, one line in the image will be slightly distorted; a single missed audio sample will probably result in an audible click; and a single missed frame will probably cause a slight flicker on the screen.

That said, if the system in question is the sound system at a live concert, a missed sample results in a click heard by 100,000 listeners.

Other real-time systems that miss a real-time deadline may face more serious consequences. A motor control system that misses a deadline may cause the motor to vibrate, and an avionics system that misses deadlines may make an aircraft unresponsive. In these cases, the real-time requirements are strict.

Where agility meets (or clashes with) real-time
Real-time systems are often developed using a classical development method. The reasons for this are two-fold and centre around the fact that real-time systems have a strictly defined set of interfaces.

Because the interfaces and the real-time behaviour are defined so strictly, there is on the one hand no need for an agile approach (a real-time system has a fixed functionality that was defined long before the project started), and on the other hand little scope for one (it is not easy to implement a subset).

This last point is also a consequence of the fact that many real-time systems are intricate designs of software functions and hardware components, where small changes to either may violate the real-time properties and render the product useless.

We argue, however, that neither point holds: there is both scope for, and benefit from, an agile development method. Our argument relies on building systems on a predictable architecture, with support from tools that predict and check the timing properties of the system under development.

Predictable architectures
We define an architecture to be predictable when the programmer can reason about the timing of a program written for that architecture. Most modern architectures are highly unpredictable. Standard features such as program and data caches, pipelines with pipeline hazards, interrupts and shared memory make it difficult to predict how long a section of code will take to execute.

Each of those features can affect timing by an order of magnitude. Even though a worst-case execution time (WCET) prediction may be feasible, it is usually so far from typical timing behaviour that designing to the WCET results in an over-engineered and uneconomical system. An operating system running alongside the program may worsen predictability further by descheduling a process, or by granting another process exclusive access to a resource.
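A toy calculation makes the WCET point concrete. It rests on assumptions that are purely illustrative and do not come from any real processor: a 1-cycle cache hit, a 50-cycle miss and a 95% typical hit rate. An analysis that cannot prove any access hits the cache must assume that every access misses.

```python
HIT_CYCLES = 1    # assumed cost of a cache hit (hypothetical)
MISS_CYCLES = 50  # assumed cost of a cache miss (hypothetical)

def execution_cycles(n_loads: int, hit_rate: float) -> float:
    """Expected cycles for n_loads memory accesses at the given hit rate."""
    hits = n_loads * hit_rate
    misses = n_loads - hits
    return hits * HIT_CYCLES + misses * MISS_CYCLES

def wcet_bound(n_loads: int) -> float:
    """Safe bound when the analysis cannot prove that any access hits."""
    return execution_cycles(n_loads, hit_rate=0.0)

if __name__ == "__main__":
    typical = execution_cycles(1000, hit_rate=0.95)  # assumed typical hit rate
    bound = wcet_bound(1000)
    print(f"typical {typical:.0f} cycles, WCET bound {bound:.0f} cycles, "
          f"ratio {bound / typical:.1f}")
```

With these made-up numbers, the safe bound is about 14.5 times the typical case. A system sized to that bound would sit idle most of the time.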

Predictable architectures, such as the XMOS XCore, enable the programmer to reason about the timing of their code with tight bounds. On a predictable architecture, differences between best- and worst-case execution times are typically caused only by data-dependent behaviour of algorithms.

The programmer may need a tool to make the prediction, but there is a close relationship between the source code of the program and the time it takes to execute. Small, innocent changes to the source code will not lead to large changes in timing on a predictable architecture. This is in contrast with a pipelined or cached architecture, where any change could alter execution time by an order of magnitude.

Since shared memory, interrupts, and process switching cause unpredictable timing, predictable equivalents have to be offered for use by programs on a predictable architecture.

The predictable equivalent of process switching is true concurrency. This can take the form of multi-core processing, where processes that run on different cores are truly concurrent and are not scheduled. Alternatively, it can take the form of hardware threads that are switched on an instruction-by-instruction basis, which offer a predictable alternative to true concurrency.
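The sketch below models instruction-by-instruction thread switching under simplifying assumptions. Threads issue in strict round-robin order, one instruction per cycle, and every thread stays active throughout. This illustrates the principle only; it is not a description of the XCore pipeline. In this model, a thread's completion time depends only on its own instruction count and on the number of active threads, never on what the other threads execute.

```python
from typing import Dict, List

def issue_schedule(n_threads: int, n_cycles: int) -> List[int]:
    """Thread that issues an instruction in each cycle under strict round-robin."""
    return [cycle % n_threads for cycle in range(n_cycles)]

def completion_cycle(instr_counts: Dict[int, int]) -> Dict[int, int]:
    """1-based cycle in which each thread issues its last instruction.

    instr_counts maps thread IDs 0..n-1 to instruction counts. Every thread
    is assumed to stay active (and keep its issue slot) for the whole run.
    """
    n = len(instr_counts)
    # Thread t issues in cycles t, t + n, t + 2n, ... (0-based), so its k-th
    # instruction issues in cycle t + (k - 1) * n.
    return {t: t + (k - 1) * n + 1 for t, k in instr_counts.items()}

if __name__ == "__main__":
    # Thread 0 finishes in the same cycle whatever the other threads execute.
    print(completion_cycle({0: 100, 1: 10, 2: 10, 3: 10})[0])
    print(completion_cycle({0: 100, 1: 5000, 2: 7, 3: 123})[0])
```

In this model, changing the number of active threads changes every thread's issue rate, but by an amount that is known in advance.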


Figure 1: Interrupts in a single threaded machine versus events in a concurrent architecture

Interrupts are traditionally used to serve real-time tasks, but at the same time they make real-time reasoning hard for the rest of the code. On a predictable architecture, interrupts are replaced by events: a program responds to events at well-defined places in the program.

Event-driven programming is decades old, but supporting it at the architecture level makes it possible to deal with real-time requests without randomly interfering with other code segments.

This is shown in Figure 1 above. Traditionally, an interrupt forces a register save and restore, and causes a task to be interrupted (in this example C interrupts A, causing unpredictable behaviour in task A).

In a predictable machine with multiple threads and events, a thread waits for an event at a known point in the program. This makes it explicit that one of several tasks may happen at that point, avoids register saves and restores, and makes the timing behaviour explicit.
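The event-driven structure can be sketched in Python, with a `queue.Queue` standing in for hardware event sources. The event names, payload format and handler table are hypothetical. XMOS's XC language expresses the same idea with a `select` statement, and this sketch only mimics the structure. The thread waits at a single, explicit point, and each handler runs to completion without being preempted.

```python
import queue
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]  # (event source, payload); hypothetical format

def event_loop(events: "queue.Queue[Event]",
               handlers: Dict[str, Callable[[int], None]],
               max_events: int) -> None:
    """Wait for events at a single, explicit point and handle each in turn."""
    for _ in range(max_events):
        source, payload = events.get()  # the one place this thread waits
        handlers[source](payload)       # runs to completion; nothing preempts it

if __name__ == "__main__":
    log: List[str] = []
    handlers = {
        "timer": lambda tick: log.append(f"timer tick {tick}"),
        "uart": lambda byte: log.append(f"uart byte {byte:#04x}"),
    }
    q: "queue.Queue[Event]" = queue.Queue()
    for ev in [("uart", 0x41), ("timer", 1), ("uart", 0x42)]:
        q.put(ev)
    event_loop(q, handlers, max_events=3)
    print("\n".join(log))
```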

Shared memory is unpredictable because access to it must be regulated. If two processes exchange data through a piece of shared memory, one process should not read the shared data while the other is writing to it; if it does, it may read inconsistent data.

Instead of using shared memory, we assume that all threads use memory private to the thread only, and that threads communicate by means of messages passed over channels.

An obvious timing dependency is that a thread will be blocked if it waits for a message, but this is the only place where the programmer has to worry. Indeed, the programmer knows which thread will be waiting for which, and can build an argument (or even proof) that threads will not unduly wait for each other.
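The channel discipline can be sketched with Python threads. `Channel` is a hypothetical helper built on `queue.Queue` with a one-message buffer; it is not an XMOS API. Each thread keeps its state private, and data crosses between threads only inside messages. As a result, channel operations are the only places where either thread can block waiting for the other.

```python
import queue
import threading
from typing import Any

STOP = object()  # sentinel marking the end of a message stream

class Channel:
    """Point-to-point channel with a one-message buffer (hypothetical helper)."""

    def __init__(self) -> None:
        self._q: "queue.Queue[Any]" = queue.Queue(maxsize=1)

    def send(self, value: Any) -> None:
        self._q.put(value)    # blocks while the previous message is unread

    def receive(self) -> Any:
        return self._q.get()  # blocks until a message arrives

def producer(out: Channel, n: int) -> None:
    for i in range(n):
        out.send(i * i)       # the value travels in the message; nothing is shared
    out.send(STOP)

def consumer(inp: Channel, reply: Channel) -> None:
    total = 0                 # private to this thread
    while True:
        msg = inp.receive()
        if msg is STOP:
            break
        total += msg
    reply.send(total)         # the result is returned as a message too

def run(n: int) -> int:
    data, reply = Channel(), Channel()
    threads = [threading.Thread(target=producer, args=(data, n)),
               threading.Thread(target=consumer, args=(data, reply))]
    for t in threads:
        t.start()
    total = reply.receive()
    for t in threads:
        t.join()
    return total

if __name__ == "__main__":
    print(run(10))
```

Because the consumer's running total never leaves the thread except as a message, there is no window in which another thread could observe it half-updated.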

Agile development of real-time systems
Earlier we raised two issues: whether agile development is desirable and whether it is feasible. Before discussing how to achieve agile development of real-time systems, we first argue why it is a good idea.

Even though some real-time systems have well-defined requirements from the outset, many do not. As an example, consider a dock for music players. There are well-defined real-time requirements for generating the USB signals (at 12 MHz) and the USB frames (at 1 kHz), and for delivering real-time audio data in step with a word clock (of, for example, 48 kHz).

But the interaction between those components may not be well defined. For example, the relation between a host-defined 1 kHz USB frame clock and a device-defined 48 kHz word clock may be non-trivial, since host and device may each have their own crystal.
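To see why this relation is non-trivial, the sketch below counts how many whole 48 kHz samples fall into each 1 kHz USB frame when the two crystals differ slightly. The crystal deviations in parts per million are hypothetical inputs, assumed constant. The function names are illustrative, and the sketch does not describe how any USB audio protocol resolves the mismatch.

```python
import math
from fractions import Fraction
from typing import List

NOMINAL_FRAME_HZ = 1000     # USB frame clock generated by the host
NOMINAL_SAMPLE_HZ = 48000   # audio word clock generated by the device

def samples_per_frame(n_frames: int, host_ppm: int, device_ppm: int) -> List[int]:
    """Whole samples the device clock completes during each USB frame.

    host_ppm and device_ppm are each crystal's (assumed constant) deviation
    from its nominal frequency, in parts per million.
    """
    frame_hz = NOMINAL_FRAME_HZ * (1 + Fraction(host_ppm, 10**6))
    sample_hz = NOMINAL_SAMPLE_HZ * (1 + Fraction(device_ppm, 10**6))
    per_frame = sample_hz / frame_hz     # exact ratio, usually not an integer
    counts: List[int] = []
    completed = 0
    for k in range(1, n_frames + 1):
        total = math.floor(per_frame * k)  # samples completed by end of frame k
        counts.append(total - completed)
        completed = total
    return counts

if __name__ == "__main__":
    counts = samples_per_frame(1000, host_ppm=0, device_ppm=100)
    print("frames carrying 49 samples:", [i for i, c in enumerate(counts) if c == 49])
```

With a hypothetical 100 ppm mismatch, roughly one frame in every 208 has to carry a 49th sample. The firmware must handle this without dropping or repeating audio.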

In systems like this there are in general three reasons why an agile development method (Figure 2, below) can be exceedingly useful:

1. Standards are often open to interpretation and only settle after years. With a traditional development method, one would implement a complete system following one interpretation of the standard, only to discover during final testing that the implementation does not interoperate with other implementations.

2. End-user requirements as to how the device should behave will change. Putting a strict deadline on the user-requirements may be an easy way to manage the process, but will ultimately disappoint the end-user.

3. A special case is the development of systems as part of a standardization process. This by its nature requires an agile development method, and such a method helps the standard converge quickly on something that is tried and tested.


Figure 2: Agile real-time software development cycle.

In all those cases an agile development method is desirable. To use an agile development method effectively, we must enable a quick, iterative development cycle. For real-time systems, this means that the real-time characteristics have to be met on every iteration of the development cycle. This requires:

1. Software to be composable and still meet timing requirements: two software components can be composed without disturbing each other's timing requirements;

2. A timing analyzer that checks real-time properties of the program, in addition to the normal tool chain used in the development process.

A concurrent architecture with predictable timing provides composability by its very nature. Two threads with known timing properties can be scheduled concurrently and retain their timing properties.

Within a thread, a timing analyzer establishes the timing properties of that thread and checks them against the required timings specified by the system designer, much as a compiler checks types and function arguments.
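The per-thread check can be sketched under a hypothetical representation. The code between two timing points is given as an acyclic graph of basic blocks with known cycle counts, with loops assumed to be already bounded and unrolled. The block names, cycle counts and the 100 MHz per-thread rate are made up for illustration. The worst case is the longest path through the graph, which is then compared against the deadline. This illustrates the idea only and is not the input format of any real tool.

```python
from functools import lru_cache
from typing import Dict, List

def worst_case_cycles(blocks: Dict[str, int], edges: Dict[str, List[str]],
                      entry: str, exit_block: str) -> int:
    """Longest path, in cycles, from entry to exit_block in an acyclic CFG."""

    @lru_cache(maxsize=None)
    def longest_from(block: str) -> int:
        if block == exit_block:
            return blocks[block]
        successors = edges.get(block, [])
        if not successors:
            raise ValueError(f"block {block!r} never reaches {exit_block!r}")
        return blocks[block] + max(longest_from(s) for s in successors)

    return longest_from(entry)

def meets_deadline(cycles: int, thread_mhz: float, deadline_ns: float) -> bool:
    """True if `cycles` thread cycles at `thread_mhz` MHz fit in `deadline_ns`."""
    return cycles * 1000.0 / thread_mhz <= deadline_ns

if __name__ == "__main__":
    # Hypothetical per-sample audio handler; cycle counts are made up.
    blocks = {"read": 10, "fast_path": 40, "slow_path": 120, "write": 8}
    edges = {"read": ["fast_path", "slow_path"],
             "fast_path": ["write"], "slow_path": ["write"]}
    wcet = worst_case_cycles(blocks, edges, "read", "write")
    # Assumed 100 MHz per-thread rate; a 48 kHz sample period is ~20833 ns.
    print(wcet, meets_deadline(wcet, 100.0, 1e9 / 48000))
```

Because the check is a pure computation over the program's structure, it runs in the same time frame as compilation. That is what makes it usable on every iteration.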

This verification process is only feasible if the underlying architecture has predictable timing properties, but it shortens the testing process dramatically, since many of the program's timings are then known either to be met or to fail.

Going through compilation and timing analysis takes on the order of seconds or minutes, as opposed to the hours that unit testing and a full regression may take.

For agile development, this means that a team of programmers can spin through iteration cycles. If they break timings, they find out quickly and can correct the timing deficiencies until they are confident that all timings are met. This reduces the number of time-consuming regressions, and the remaining tests can focus on complex timing and functional properties.

To speed up regression tests, plug-ins can be written for the simulation environment that enable the most important regressions to run automatically in simulation, so that only a small number of tests need to be executed on real hardware.

Case Study: developing an IEEE AVB stack
As a case study, we present how we use an agile development process for the AVB protocol stack. AVB (Audio Video Bridging over Ethernet) is in the process of being standardised (IEEE 802.1 and IEEE 1722/1733). Figure 3 below shows the hardware platform used for development, comprising an Ethernet PHY, an audio CODEC and a dual XCore processor.

Figure 3: Hardware that runs the AVB software.

The current software architecture is shown in Figure 4 below. Note that each thread is scheduled independently and has its timing verified independently. During an iteration, one or more threads may be revisited, in the knowledge that no other threads are affected.

The software design has progressed over a three-year period and gone through a total of eight major iterations, an average of one iteration every 4-5 months. Each major iteration delivers a working AVB stack that is tested and then developed further.

The 4-5 months per iteration is an average figure; some iterations take place over a much shorter period. For example, during “plugfests” (global events where AVB developers meet and test the interoperability of their software), a complete iteration is made within a week.


Figure 4: Thread diagram after eight iterations.

During that week, new requirements are drawn up based on observed interoperability problems, the specification is modified, one or more threads are modified or implemented from scratch, regressions are run, and finally testing is performed.

This is only feasible because many tests take place during compilation, and because of the predictable nature of the underlying architecture. Developers who use a traditional development method cannot adapt their software to take into account hitherto unknown subtleties of an emerging standard.

We have argued that agile development has a role to play in the design and implementation of embedded systems. For agile development to work in an embedded environment, a predictable architecture is a requirement.

This is both because a predictable architecture makes composing tasks simpler, and because tool support on a predictable architecture can tell the programmer which timing constraints are or are not met.

We developed an AVB audio networking stack using agile design methods, which allows us to follow the emerging standard quickly. Without an agile method, we would not have been able to complete our AVB design within the same time and budget.

Henk Muller is currently the Principal Technologist at XMOS Ltd. In that role he has been involved in the design and implementation of hardware and software for real-time systems. Prior to that, Henk worked in academia for 20 years in computer architecture, compilers, and ubiquitous computing. He holds a doctorate from the University of Amsterdam.
