The successful design of safety-critical systems is difficult and demands significant attentionto detail. Fortunately, an operating system's emphasis on protection and resource guarantees can make the job of application developers less arduous.
Whether you are designing a telecom switch, a piece of medical equipment, or one of the many complex systems aboard an aircraft, certain critical parts of the application must be able to operate under all conditions. Indeed, given the steadily increasing speed of processors and the economically-driven desire to run multiple applications, at varying levels of criticality, on the same processor, the risks continue to grow. Consider a blood gas analyzer used in an intensive care unit. The analyzer may serve two distinct purposes. First, it monitors the level of oxygen and other gasses in the patient's bloodstream, in real time. If any monitored gas reaches a dangerously low or high level, the analyzer should produce an audible alarm or take some more direct, interventionary, action. But the device may have a second use, offering a historical display of gas levels for “offline” analysis. In such a system, data logging, data display, and user interface threads may compete with the critical monitoring and alarm threads for use of the processor and other resources.In order for threads of varying importance to safely coexist in the same system, the operating system that manages the processor and other resources must be able to properly partition the software to guarantee resource availability. The key word here is guarantee. Post-design, post-implementation testing cannot be counted on. Safety-critical systems must be safe at all times.
The following terms are used in this article:
- thread: a lightweight unit of program execution
- process: a heavyweight unit consisting primarily of a distinct address space, within which one or more threads execute
- kernel: the portion of an operating system that provides core system services such as scheduling, thread synchronization, and interprocess communication
Fault tolerance begins with memory protection. For many years, microprocessors have included on-chip memory management units (MMU) that enable individual threads of software to run in hardware-protected address spaces. But many commercial real-time operating systems never enable the MMU, even if such hardware is present in the system.
When all of an application's threads share the same memory space, any thread could-intentionally or unintentionally-corrupt the code, data, or stack of another thread. A misbehaved thread could even corrupt the kernel's own code or internal data structures. It's easy to see how a single errant pointer in one thread could easily bring down the entire system, or at least cause it to behave unexpectedly.
For safety and reliability, a process-based real-time operating system (RTOS) is preferable. To create processes with individual address spaces, the RTOS need only create some RAM-based data structures and enable the MMU to enforce the protections described therein. The basic idea is that a new set of logical addresses is “switched in” at each context switch. The MMU maps a logical address used during an instruction fetch or a data read or write to a physical address in memory through the current mapping. It also flags attempts to access illegal logical addresses, which have not been “mapped” to any physical address.
The cost of processes is the overhead inherent in memory access through a look-up table. But the payoff is huge. Careless or malicious corruption across process boundaries is rendered impossible. A bug in a user interface thread cannot corrupt the code or data of a more critical thread. It's truly a wonder that non-memory protected operating systems are still used in complex embedded systems where reliability, safety, or security are important.
Enabling the MMU has other benefits as well. One big advantage stems from the ability to selectively map and unmap pages into a logical address space. Physical memory pages are mapped into the logical space to hold the current process' code; others are mapped for data. Likewise, physical memory pages are mapped in to hold the stacks of threads that are part of the process. An RTOS can easily provide the ability to leave a page's worth of the logical addresses after each thread's stack unmapped. That way, if any thread overflows its assigned stack, a hardware memory protection fault will occur. The kernel will suspend the thread instead of allowing it to corrupt other important memory areas within the address space (like another thread's stack). This adds a level of protection between threads, even within the same address space.
Memory protection, including this kind of stack overflow detection, is often helpful during the development of an application. Programming errors will generate exceptions that are immediately detected and easily traceable to the source code. Without memory protection, bugs can cause subtle corruptions that are very difficult to track down. In fact, since RAM is often located at physical address zero in a flat memory model, even NULL pointer dereferences will go undetected! (Clearly, logical page zero is a good one to add to the “unmap list.”)System call
Another issue is that the kernel must protect itself against improper system calls. Many kernels return the actual pointer to a newly created kernel object, such as a semaphore, to the thread that created it, as a handle. When that pointer is passed back to the kernel in subsequent system calls, it may be dereferenced directly. But what if the thread uses that pointer to modify the kernel object directly, or simply overwrites its handle with a pointer to some other memory. The results may be disastrous.
A bad system call should never be able to take down the kernel. An RTOS should, therefore, employ opaque handles for kernel objects. It should also validate the parameters to all system calls.
Fault tolerance and high availability
Even the best software has latent bugs. As applications become more complex, performing more functions for a software-hungry world, the number of bugs in fielded systems will continue to rise. System designers must, therefore, plan for failures and employ fault recovery techniques. Of course, the effect of fault recovery is application-dependent: a user interface can restart itself in the face of a fault, a flight-control system probably cannot.One way to do fault recovery is to have a supervisor thread in an address space all its own. When a thread faults (for example, due to a stack overflow), the kernel should provide some mechanism whereby notification can be sent to the supervisor thread. If necessary, the supervisor can then make a system call to close down the faulted thread, or the entire process, and restart it. The supervisor might also be hooked into a software “watchdog” setup, whereby thread deadlocks and starvation can be detected as well.
In many critical systems, high availability is assured by employing multiple redundant nodes in the system. In such a system, the kernel running on a redundant node must have the ability to detect a failure in one of the operating nodes. One method is to provide a built-in heartbeat in the interprocessor message passing mechanism of the RTOS (see Figure 1). Upon system startup, a communications channel is opened between the redundant nodes and each of the operating nodes. During normal operation, the redundant nodes continually receive heartbeat messages from the operating nodes. If the heartbeat fails to arrive, the redundant node can take control automatically.
Figure 1: Redundancy, via system heartbeats
Mandatory vs. discretionary access control
An example of a discretionary access control is a Unix file: a process or thread can, at its sole discretion, modify the permissions on a file, thereby permitting access to the file by another process in the system. Discretionary access controls are useful for some objects in some systems.
An RTOS that is used in a safety- or security-critical system must be able to go one big step further and provide mandatory access control of critical system objects. For example, consider an aircraft sensor device, access to which is controlled by a flight control program. The system designer must be able to set up the system statically such that the flight control program and only the flight control program has access to this device. Another application in the system cannot dynamically request and obtain access to this device. And the flight control program cannot dynamically provide access to the device to any other application in the system. The access control is enforced by the kernel, is not circumventable by application code, and is thus mandatory. Mandatory access control provides guarantees. Discretionary access controls are only as effective as the applications using them, and these applications must be assumed to have bugs in them.
Guaranteed resource availability: space domain
In safety-critical systems, a critical application cannot, as a result of malicious or careless execution of another application, run out of memory resources. In most real-time operating systems, memory used to hold thread control blocks and other kernel objects comes from a central store.
Figure 2: a) Before memory quotas b) after
When a thread creates a new thread, semaphore, or other kernel object, the kernel carves off a chunk of memory from this central store to hold the data for this object. A bug in one thread could, therefore, result in a situation where this program creates too many kernel objects and the central store is exhausted (see Figure 2a). A more critical thread could fail as a result, perhaps with disastrous effects.
In order to guarantee that this scenario cannot occur, the RTOS can provide a memory quota system wherein the system designer statically defines how much physical memory each process has (see Figure 2b). For example, a user interface process might be provided a maximum of 128KB and a flight control program a maximum of 196KB. If a thread within the user interface process encounters the aforementioned failure scenario, the process may exhaust its own 128KB of memory. But the flight control program and its 196KB of memory are wholly unaffected. In a safety-critical system, memory should be treated as a hard currency: when a thread wants to create a kernel object, its parent process must provide a portion of its memory quota to satisfy the request. This kind of space domain protection should be part of the RTOS design. Central memory stores and discretionarily-assigned limits are insufficient when guarantees are required.
If an RTOS provides a memory quota system, dynamic loading of low criticality applications can be tolerated. High criticality applications already running are guaranteed to have the physical memory they will require to run. In addition, the memory used to hold any new processes should come from the memory quota of the creating process. If this memory comes from a central store, then process creation can fail if a malicious or carelessly written application attempts to create too many new processes. (Most programmers have either mistakenly executed or at least heard of a Unix “fork bomb,” which can easily take down an entire system.) In most safety-critical systems, dynamic process creation will simply not be tolerated at all, and the RTOS should be configurable such that this capability can be removed from the system.
Guaranteed resource availability: time domain
The vast majority of RTOSes employ priority-based, preemptive schedulers. Under this scheme, the highest priority ready thread in the system always gets to use the processor (execute). If multiple threads are at that same highest priority level, they generally share the processor equally, via timeslicing. The problem with this timeslicing (or even run-to-completion) within a given priority level, is that there's no provision for guaranteeing processor time for critical threads.
Consider the following scenario: the system includes two threads at the same priority level. Thread A is a non-critical, background thread. Thread B is a critical thread that needs at least 40% of the processor time to get its work done. Because Thread A and B are assigned the same priority level, the typical scheduler will time slice them so that both threads get 50% of the processor. At this point, Thread B is able to get its work done. Now suppose Thread A creates a new thread at the same priority level. Consequently, there are three highest priority threads sharing the processor. Suddenly, Thread B is only getting 33% of the processor and cannot get its critical work done. For that matter, if the code in Thread A has a bug or virus, it may create dozens or even hundreds of “confederate” threads, causing Thread B to get a tiny fraction of the runtime.
One solution to this problem is to enable the system designer to inform the scheduler of a thread's maximum “weight” within the priority level (see Figure 3). When a thread creates another equal priority thread, the creating thread must give up part of its own weight to the new thread. In our previous example, suppose the system designer had assigned weight to Thread A and Thread B such that Thread A has 60% of the runtime and Thread B has 40% of the runtime. When Thread A creates the third thread, it must provide part of its own weight, say 30%. Now Thread A and the new thread each get 30% of the processor time, but critical Thread B's 40% remains inviolate. Thread A can create many confederate threads without affecting the ability of Thread B to get its work done; Thread B's processor reservation is thus guaranteed. A scheduler that provides this kind of guaranteed resource availability in addition to the standard scheduling techniques is required in some critical embedded systems, particularly avionics.
Figure 3: Traditional scheduler vs. scheduler with weights
The problem inherent in all schedulers is that they are ignorant of the process in which threads reside. Continuing our previous example, suppose that Thread A executes in a user interface process while critical Thread B executes in a flight control process. The two applications are partitioned and protected in the space domain, but not in the time domain. Designers of safety-critical systems require the ability to guarantee that the run-time characteristics of the user interface cannot possibly affect the run-time characteristics of the flight control system. Thread schedulers simply cannot make this guarantee. Consider a situation in which Thread B normally gets all the runtime it needs by making it higher priority than Thread A or any of the other threads in the user interface. Due to a bug or poor design or improper testing, Thread B may lower its own priority (the ability to do so is available in practically all kernels), causing the thread in the user interface to gain control of the processor. Similarly, Thread A may raise its priority above the priority of Thread B with the same effect.
A convenient way to guarantee that the threads in processes of different criticality cannot affect each other is to provide a process-level scheduler. Designers of safety critical software have noted this requirement for a long time. The process, or partition, scheduling concept is a major part of ARINC Specification 653, an Avionics Application Software Standard Interface.
The ARINC 653 partition scheduler runs partitions, or processes, according to a timeline established by the system designer. Each process is provided one or more windows of execution within the repeating timeline. During each window, all the threads in the other processes are not runnable; only the threads within the currently active process are runnable (and typically are scheduled according to the standard thread scheduling rules). When the flight control application's window is active, its processing resource is guaranteed; a user interface application cannot run and take away processing time from the critical application during this window.
Although not specified in ARINC 653, a prudent addition to the implementation is to apply the concept of a background partition. When there are no runnable threads within the active partition, the partition scheduler should be able to run background threads, if any, in the background partition, instead of idling. An example background thread might be a low priority diagnostic agent that runs occasionally but does not have hard real-time deadlines. Attempts have been made to add partition scheduling on top of commercial off-the-shelf operating systems by selectively halting all the threads in the active partition and then running all the threads in the next partition. Thus, partition switching time is linear with the number of threads in the partitions, an unacceptably poor implementation. The RTOS must implement the partition scheduler within the kernel and ensure that partition switching takes constant time and is as fast as possible.
Meeting hard deadlines is one of the most fundamental requirements of a real-time operating system and is especially important in safety-critical systems. Depending on the system and the thread, missing a deadline can be a critical fault.
Rate monotonic analysis (RMA) is frequently used by system designers to analyze and predict the timing behavior of systems. In doing so, the system designer is relying on the underlying operating system to provide fast and temporally deterministic system services. Not only must the designer understand how long it takes to execute the thread's code, but also any overhead associated with the thread must be determined. Overhead typically includes context switch time, the time required to execute kernel system calls, and the overhead of interrupts and interrupt handlers firing and executing.
All real-time operating systems incur the overhead of context switching. Lower context switching time implies lower overhead, more efficient use of available processing resources, and increased likelihood of meeting deadlines. A real-time operating system's context switching code is usually hand optimized for optimal execution speed.
A typical embedded system has several types of interrupts resulting from the use of various kinds of devices. Some interrupts are higher priority and require a faster response time than others. For example, an interrupt that signals the kernel to read a sensor that is critical to an aircraft's flight control should be handled with the minimum possible latency. On the other hand, a typical timer tick interrupt frequency may be 60Hz or 100Hz. Ten milliseconds is an eternity in hardware terms, so interrupt latency for the timer tick interrupt is not as critical as for most other interrupts.
Most kernels disable all interrupts while manipulating internal data structures during system calls. Interrupts are disabled so that the timer tick interrupt cannot occur (a timer tick may cause a context switch) at a time when internal kernel data structures are being changed. The system's interrupt latency is directly related to the length of the longest critical section in the kernel.
In effect, most kernels increase the latency of all interrupts just to avoid a low priority timer interrupt. A better solution is to never disable interrupts in kernel system calls and instead to postpone the handling of an intervening timer tick until the system call completes. This strategy depends on all kernel system calls being short (or at least that calls that are not short are restartable), so that scheduling events can preempt the completion of the system call. Therefore, the time to get back to the scheduler may vary by a few instructions (insignificant for a 60Hz scheduler), but will always be short and bounded. It is much more difficult to engineer a kernel that has preemptible system calls in this manner, which is why most kernels do not do it this way.
Bounded execution times
In order to allow computation of the overhead of system calls that a thread will execute while doing its work, an RTOS should provide bounded execution times for all such calls. Two major problems involve the timing of message transfers and the timing of mutex take operations.
A thread may spend time performing a variety of activities. Of course, its primary activity is executing code. Other activities include sending and receiving messages. Message transfer times vary with the size of the data. How can the system designer account for this time? The RTOS can provide a capability for controlling whether transfer times are attributed to the sending thread or to the receiving thread, or shared between them. Indeed, the kernel's scheduler should treat all activities, not just the primary activity, as prioritized units of execution so that the system designer can properly control and account for them.
Priority inversion has long been the bane of system designers attempting to perform rate monotonic analysis, since RMA depends on higher priority threads running before lower priority threads. Priority inversion occurs when a high priority thread is unable to run because a mutex (or binary semaphore) it attempts to obtain is owned by a low priority thread, but the low priority thread is unable to execute and release the mutex because a medium priority thread is also runnable (see Figure 4). The most common RTOS solution to the priority inversion problem is to support the priority inheritance protocol.
Figure 4: Priority inversion
A mutex that supports priority inheritance works as follows: if a high priority thread attempts to take a mutex already owned by a low priority thread, the kernel automatically elevates the low priority thread to the priority of the high priority thread. Once the low priority thread releases the mutex, its priority will be returned to normal and the high priority thread will run. The dynamic priority elevation prevents a medium priority thread from running while the high priority thread is waiting; priority inversion is avoided (see Figure 5). In this example, the critical section execution time (the time the low priority thread holds the mutex) is added to the overhead of the high priority thread.
Figure 5: Priority inheritance
A weakness of the priority inheritance protocol is that it does not prevent chained blocking. Suppose a medium priority thread attempts to take a mutex owned by a low priority thread, but while the low priority thread's priority is elevated to medium by priority inheritance, a high priority thread becomes runnable and attempts to take another mutex already owned by the medium priority thread. The medium priority thread's priority is increased to high, but the high priority thread now must wait for both the low priority thread and the medium priority thread to complete before it can run again.
The chain of blocking critical sections can extend to include the critical sections of any threads that might access the same mutex. Not only does this make it much more difficult for the system designer to compute overhead, but since the system designer must compute the worst case overhead, the chained blocking phenomenon may result in a much less efficient system (see Figure 6). These blocking factors are added into the computation time for tasks in the RMA analysis, potentially rendering the system unschedulable. This may force the designer to resort to a faster CPU or to remove functionality from the system.
Figure 6: Chained blocking caused by priority inheritance
A solution called the priority ceiling protocol not only solves the priority inversion problem but also prevents chained blocking (see Figure 7). In one implementation scheme (called the highest locker), each semaphore has an associated priority, which is assigned by the system designer to be the priority of the highest priority thread that might ever try to acquire that object. When a thread takes such a semaphore, it is immediately elevated to the priority of the semaphore. When the semaphore is released, the thread reverts back to its original priority. Because of this priority elevation, no other threads that might contend for the same semaphore can run until the semaphore is released. It is easy to see how this prevents chained blocking.
Figure 7: Priority ceilings
Several RTOSes provide support for both priority inheritance and priority ceilings, leaving the decision up to the system designer. Ada's protected objects implement the priority ceiling protocol.
Many of the real-time operating systems in use today were originally designed for software systems that were smaller, simpler, and ran on processors without memory protection hardware. With the ever-increasing complexity of applications in today's embedded systems, fault tolerance and high availability features have become increasingly important. Especially stringent are the requirements for safety-critical systems.
Fault tolerance begins with processes and memory protection, but extends to much more, especially the need to guarantee resource availability in the time and space domains. Kernel support for features like the priority ceiling protocol give safety-critical system designers the capabilities needed to maximize efficiency and guarantee schedulability in their systems.
David Kleidermacher is director of engineering at Green Hills Software, where he has been designing compilers, software development environments, and real-time operating systems for the past ten years. David has a BSCS from Cornell University. You can e-mail him at .
Mark Griglock is engineering manager for safety-critical products at Green Hills Software. He developed avionics system architectures and run-time systems for over seven years as a member of an SEI CMM Level 5 organization. Mark has an MSCS from Rensselaer Polytechnic Institute and a BSCS from King's College. His e-mail address is .