CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS

Linux for Real-Time Systems: Strategies and Solutions



TechOnline

The Linux operating system, specifically the Linux kernel, was designed and tuned by its creator, Linus Torvalds (and others) to provide a combination of throughput and fairness in the allocation of CPU resources. But, as Linux continues to move into industrial and embedded computing environments, other requirements come to the fore. Specifically, there is a growing need for guaranteed response to external events. Also, the application designer may need to rigorously control the allocation of CPU resources.

Three critical areas of kernel design impact responsiveness:

  1. External interrupt management
  2. Process (thread) preemptability
  3. Process (or thread) scheduling.

Here, the term "process" refers to the primary schedulable entity in the Linux kernel. Many operating systems directly schedule threads as entities separately represented in kernel data structures, and, in those cases, the term "thread" would be appropriate. However, because Linux uses most of the internal data structure of a process to represent a single thread, "process" is more appropriate.


External Interrupt Management
When an external interrupt occurs, an external interface and/or device needs attention or service. A system considered real-time must respond within a critical time window to many types of external interrupts. Hard real-time systems are those where failure of the system to respond by a particular deadline results in catastrophic failure of the control system. In other words, the deadline (whatever it may be) cannot be missed.

Soft real-time systems are those where the system under control degrades with increasing delays in responding to interrupts, but doesn't suffer a catastrophic failure. So missed deadlines, while not desirable, are allowable (because the event can be retried and/or recovered).

An important issue in external interrupt management is the use of an interrupt-off processor mode. Such a mode can be used to ensure the kernel is not reentered—when the hardware acknowledges an external interrupt and then executes a software interrupt handler. Reentrancy is a problem relative to critical sections of code execution, which must be executed from beginning to end without intervening operations to ensure correctness. A simple example of a critical section occurs when we increment a global integer variable—implemented in most machine architectures as a load from memory, an increment in a register, then a store back to memory. Intermixing these three operations across multiple execution streams will cause errors in the value of the stored integer.

Kernel architectures are frequently designed so that particular data structures and operations are off limits to interrupt handlers—so that the data structures and operations do not need protection from interrupt handlers. For the most part, this is the case in the Linux kernel. As a result, interrupt-off periods are not used extensively, but they are used and they do impact worst-case delays between the arrival of an external interrupt and initiation of the interrupt handler.

Another critical design issue for interrupt management is the ability of the system to nest interrupts—to accept a higher priority interrupt while still servicing an interrupt. By providing such a feature, the system designer can specify the relative priority of external event servicing, and ensure that critical external operations get priority of service by interrupt handlers. Linux does allow nested interrupts, and most hardware allows the priority of different interrupt types to be specified in some way.


Process Preemptability
Many systems have relatively simple software for real-time control; it can be entirely structured into the interrupt service routine associated with a particular I/O interface or device. But other control systems require more complex data-processing and/or control logic running with real-time constraints. Such systems employ user-level processes (single-threaded or multi-threaded), which are scheduled (directly or via signals) from external interrupt handlers. This type of architecture may cause system delays before executing the real-time process that must be run in response to an external event.

A simple approach to this problem would be to suppress or eliminate other system activity. But that would be unrealistic in most cases. A system complex enough to require real-time processes at the user level will also include non-real-time (or lower priority) processes for other operations. Examples of these operations include data processing, event logging, user interfacing, or other less time-critical applications. These processes may be executing when a high-priority external event generates an interrupt, causing the interrupt handler to schedule a user-level process that has a priority higher than the executing process.

If the interrupted process runs in user mode, the kernel can (and usually will) switch context immediately to the newly executable highest-priority process after exiting the interrupt service routine. (However, as we shall see later, Linux may not do that.) Of course, if the interrupted process executes in kernel mode, then kernel design with respect to preemptability becomes a major concern.

A truly preemptable kernel can switch from one process running in kernel mode to a different process, which may then enter or continue in kernel mode. A kernel that is not preemptable can safely execute critical sections of code (e.g. an increment operation on a global integer variable) without risk of intermixing such operations and creating errors. Once again, we assume the exclusion of external interrupt handlers operating on data manipulated in the critical section.

A preemptable kernel must protect all critical sections from reentrancy. This protection can be achieved in various ways. One approach turns off all interrupts during execution of the critical section, and enables the interrupts again upon exit. A second approach uses semaphores—locking on entry to the critical section, and unlocking on exit. An attempt to enter a locked critical section results in a blocked running process. Rescheduling and re-dispatching follows after the locking process continues execution and exits the critical section.

A third hybrid kernel design (or a retro-design) allows kernel-mode execution to be preempted only at fixed, known safe points, typically called preemption points. A simplistic approach to improving kernel responsiveness in a non-preemptable kernel places preemption points at roughly-equal spacing—thus reducing delays in initiating context switches to higher-priority processes.

Finally, a fourth hybrid design uses a context-switch lock around critical sections. Such a design sets and clears a global flag upon entry and exit for critical sections. A set flag tells the process scheduler that a context switch is not allowed—requiring a straight return from interrupt, even if a higher-priority process was scheduled by an interrupt service routine. When the process holding the context-switch lock releases it, the presence of a higher-priority executable process must then be noted—allowing an immediate context switch to occur.

Each of the kernel designs has a specific impact on process-level response to external interrupts.

A non-preemptable kernel raises two concerns:

  1. What is the worst-case path length (in terms of execution time) through kernel code? This time will constitute the majority of the worst-case delay in running a newly executable real-time process (scheduled by an external interrupt handler).
  2. The second concern involves general responsiveness. In a soft real-time system, a non-preemptable kernel will have generally longer and more frequent context-switch delays—because every kernel entry by a process initiates a potential delay.

In a kernel with preemption points, the issues are very similar to those for the non-preemptable case. The maximum path length between preemption points corresponds to the maximum kernel path length in a non-preemptable kernel. Similarly, typical path lengths between preemption points are equivalent to typical kernel path lengths in a non-preemptable kernel.

In both a preemption-point approach and a non-preemptable kernel approach, it becomes difficult to maintain any given specification of responsiveness when code additions or modifications are executed. It requires frequent and careful repeats of measurements, and/or tight control over modifications.

In the preemptable kernel design, the issues are quite different. With interrupt management (off/on operations around critical sections), we must consider the maximum interrupt-off period, the number of critical sections, and the typical length of such sections. Here again, re-measurement and/or very careful engineering must be applied whenever code inside critical sections is modified, or when new critical sections are added. Note that any system modification may cause timing changes—sometimes huge ones—due to the impact on data, code, or address-translation cache content.

If a semaphore technique is used for critical-section management, there will be no delay in initiating a context switch to a newly executable higher priority process. However, such a process may run into a locked critical region necessitating two extra context switches before continuation: a switch back to the process holding the lock, followed by release of the lock, followed by a switch back to the higher-priority process. The frequency with which common locks (to lock common data structures or critical sections) are used, and the time periods for which they are held, are the critical factors in determining effective process response. Latency problems can be avoided if the real-time process does not need to use kernel services in response to external interrupts—or if it is guaranteed to use only kernel services that don't use locks, or if it uses locks unique to itself.

The lock method causes extra delay in context-switch initiation during execution of the critical section. However, it eliminates potential delays later if the new running process attempts to enter the same critical section.


Process Scheduling
The third critical area in kernel design that impacts responsiveness (in this case, process responsiveness) involves the behavioral characteristics of the scheduler. In a real-time control system, the system designer must be able to specify the relative importance of operations—which are then executed by a combination of interrupt-service and process-level software. Process priorities are specified via an API, and in the case of Linux and most current versions of Unix, by way of the POSIX 1003.1b real-time API's (specifically, sched_setscheduler(2)).

The existence of the required API meets just one of the requirements. We also require that the scheduler must actually honor the real-time priorities—by always selecting and dispatching the highest priority process first, regardless of other factors. In a third requirement, the scheduler must select and dispatch the highest priority process in a fixed (deterministic) time period. We refer to such a scheduler as a real-time scheduler.

Note that several other approaches exist for schedulers that include timing attributes. These include rate monotonic scheduling, fair-share scheduling, and others—and such schedulers may be referenced as having real-time attributes. Here, however, we define real-time scheduling as the simple property of selecting and dispatching processes in their specified priority order—and doing so quickly and deterministically.


Real-Time Control with Linux
For a Linux kernel, the design focuses on application throughput—along with overall design simplicity and elegance. As a corollary to throughput, the Linux scheduler also attempts to provide a fair share policy to ensure that all process groups get balanced CPU resources.

These objectives have driven some basic Linux design choices for real-time control applications. We have the following alternatives for the Linux kernel:

  • Use interrupt-off management for critical sections
  • Make it non-preemptable
  • Use a merged-process selection algorithm focused on ensuring aggregate throughput and overall forward progress of all processes
  • Refuse to allow nested interrupts and interrupt prioritization (dependent on the hardware).

For Linux, an interrupt-off technique manages sections of code that are critical relative to interrupt-service-routine operations. Research by MontaVista has identified a problem area in Linux involving termination of a process that has a large number of active child processes. When a loop is executing with interrupts off (with the number of child processes as the iteration parameter) this creates a theoretically unbounded interrupt-off period in Linux. Careful application design can avoid the problem (also MontaVista offers a software patch).

A uniprocessor system usually employs a non-preemptable Linux kernel. For a throughput-oriented system, this is a reasonable design choice. Allowing kernel preemption would mean that many operations become critical sections that need protection using one or more of the methods we have discussed. Such protection then creates overhead—negatively impacting throughput. That is why operating system engineers argue that the design of a kernel for real-time operation directly contradicts the design of a system for throughput.

Note, however, that this reasoning may be flawed—the significance of I/O operations has not been factored into the rationale. Generally speaking, overall throughput is enhanced when the entire computer system is fully used at all times. Simply put, the operating system must keep the I/O channels busy. Processes in an execution mode of short period of execution, initiate I/O, and wait should be given priority over those in a mode of long period of execution.


Suitability of Linux for Real-Time Control
The non-preemptability of Linux raises questions about the suitability of the OS for real-time control applications. Initial measurements by MontaVista of non-preemptable paths in the Linux kernel on a 266-MHz Pentium II processor show path lengths of over 130-ms.

We will not attempt here to explain in detail the specific algorithms used by the Linux scheduler to combine throughput and fairness. Suffice to say that when processes are given a real-time priority, that priority does not provide the primary algorithmic determinant of process selection. Rather, the scheduler fully reviews every executable process. If no real-time processes are encountered, then a combination of factors are used in final selection—with an emphasis on keeping I/O channels busy, assuring that a single process group doesn't overly monopolize the CPU resource, and that process starvation doesn't occur. If executable real-time processes are found, then the highest priority process is dispatched.

The problem of linear overhead in process selection and dispatch (as a function of the number of executable processes) has been a subject of extensive debate. Some argue that real-world systems never have more than a very small number of executable processes. Or, if they do, they are poorly designed systems or improperly overloaded). Others disagree. They point out that real-world engineering teams don't always implement excellent designs. Instead, they create systems that may have large numbers of executable processes, and then expect the scheduler to properly manage the load.

At MontaVista, we believe both views may be correct. As a commercial company, we must meet customer needs, and those needs include having a process scheduler that will select and dispatch a high priority real-time process in a fixed period of time, independent of process loading on the CPU. We cannot just argue away the perceived problem by blaming a customer's flawed design.


Current Linux Solutions
The first approach to solving these issues for use of Linux in a real-time control application is "don't". In other words, design the application solution around the problems. The following approaches are essential:

  • If possible, address all real-time response needs directly with interrupt service routines.
  • Avoid known excessive interrupt-off periods in Linux. I/O drivers can and do turn off interrupts. So the specific drivers should be evaluated for excessive interrupt-off operations, and if problems are found, they must be corrected.
  • If a process component is required in the real-time control path, then consider aggregate system loading. (Might other processes be running when the real-time process becomes executable? If so, what operations could those processes be performing? Is there any risk of a long kernel call in execution? If so, can the application deal with the delay? If not, are there ways to restructure the application to avoid the problem?)

To avoid problems that can occur when a process component is required in the real-time control path, system designers should seriously consider using the MontaVista real-time scheduler. The default Linux scheduler will not guarantee an important process will get immediately and rapidly selected and dispatched.

Another approach to dealing with these issues is to use a hybrid kernel. Specifically, RTLinux or RTAI can be used as an underlying kernel technology, providing hard real-time behavior. Such kernels effectively run Linux as the idle task. The real-time control components of the application are threads executing in the underlying kernel environment. Meanwhile, all non-time critical components can be standard Linux processes. Interrupt off/on operations by Linux are emulated by the lower-level kernel. So Linux semantics are not impacted, while actual interrupts (particularly those handled by the lower-level kernel tasks) are not blocked.

A possible problem with hybrid-kernel approaches is their complexity—from both design and implementation perspectives. Debugging a real-time control system can be difficult enough in a single environment. Also, the question, "Can the real-time component be addressed in a standard Linux interrupt service routine?" really should be considered and addressed before jumping to a hybrid-kernel solution. Again, the key issue with the interrupt-service-routine solution is the impact of interrupt-off operations in Linux. Also we must consider potential needs for more software complexity and interaction in the real-time component than is reasonable to embed in an interrupt service routine.

A third approach addresses Linux issues relative to real-time control applications directly in Linux. There has been substantial work in this area. Important examples include using preemption points in the Linux kernel, a rate-monotonic scheduler developed at UC Irvine, the KURT program at the University of Kansas, and the Linux SRT project.


Improving Linux Responsiveness
MontaVista's approach to providing improved responsiveness in Linux proper is to execute the following plan:

  • Measure the interrupt-off periods, provide the information for the usage of application designers, and if possible, reduce the length of any outliers (see the MontaVista Web page for current measurement information)
  • Provide a priority driven real-time process scheduler with fixed and low overhead for process selection and dispatch
  • Measure process preemption delays, provide the information for the usage of application designers, and if possible, reduce the length of any outliers
  • Develop, assess, and provide a fully preemptable Linux kernel.

The priority-driven scheduler provided by MontaVista is structured as a kernel configuration option. The company recommends this scheduler for any application needing firm control of the execution priority of selected processes. The scheduler selects and dispatches processes with a real-time priority (specified via the sched_setscheduler(2) system call), in priority order. This scheduler extends the real-time priority range from the default Linux range of 0-100 to 0-127.

The scheduler uses an optimized process-selection algorithm, based on a bit-map and a set of 128 executable process queues. Typically the priority of the most important process is known, and the process can be immediately de-queued and executed. On occasion, the scheduler must perform a bit-map search to determine the highest priority non-empty queue. The maximum overhead in such a situation is four tests for non-zero words, followed by a set-bit search through a 32-bit word. Hence, while there is some variability in process selection time, the overhead is fixed—and extremely fast, even in slow mode.

Whenever there is no real-time process to execute, the scheduler drops through to the standard Linux scheduler. The scheduler knows that there is no real-time executable task present, so the overhead incurred is almost zero. There is a very small impact in code size and in the data array of 128 process-list pointers.

During the development of the real-time scheduler, significant defects in Linux were discovered—perhaps not for the first time. The yield (2) system call was found to not always yield to another process (induce a context switch to another runnable process). This can be observed when yield is used as a basis for performing context-switch measurements. The assumption of such benchmarks is that yield always induces a context switch to another executable process (of equal priority). Hence, such measurements can actually measure substantially fewer context switches than expected.

The MontaVista scheduler package includes a correction to the defect in the standard Linux scheduler. A related problem found, and fixed, is that the exhaustion of time slice in a real-time process scheduled with a round-robin scheduling policy does not make it yield to another executable process at the same priority. The company's process preemption delay study is still in progress—results will be shared when the study is complete.

There has been much debate about restructuring the Linux kernel to be fully preemptable. MontaVista intends to explore this as a plausible alternative to dramatically improve Linux process-level response times. The potential of such an approach to improve throughput (by increasing the total parallelism of all operations in the computing hardware complex) will be studied and measured. Note that such an improvement will be highly application-dependent—clearly, a non-I/O oriented load will not benefit. By leveraging the Linux (version 2.4) multiprocessor locks as the critical-section management technique, we believe, though it remains to be proven, that Linux source code needs very little modification to implement such a system.


About the Author

Kevin Morgan has 20 years of experience developing embedded and real-time computer systems for Hewlett-Packard. While at Hewlett-Packard, he worked as an engineer, project manager and section manager spanning the development of five operating systems and was a member of the HP 1000 computer software design team. Most recently serving as HP-UX Operating System Laboratory Manager, Kevin was responsible for overall HP-UX release planning, execution and delivery. He is now working with Linux as the vice president of engineering at MontaVista Software.

1

Rate this article: Low High
Current rating
  • .
Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS



VIRTUALAB
WEBINAR
WEBINAR
WEBINAR




 :