CMP EMBEDDED.COM


The lowdown on Embedded Linux and its use in real-time applications: Part 2



Embedded.com

Support for hard real time is not in the mainline kernel.org source tree. To enable hard real time, a patch must be applied. The real-time kernel patch is the cumulative result of several initiatives to reduce Linux kernel latency.

The patch had many contributors, and it is currently maintained by Ingo Molnar; you can find it on the Red Hat site. The soft real-time performance of the 2.6 Linux kernel has improved significantly since the early 2.6 kernel releases. When 2.6 was first released, the 2.4 Linux kernel was substantially better in soft real-time performance.

Since about Linux 2.6.12, soft real-time performance in the single-digit milliseconds on a reasonably fast x86 processor is readily achieved. To get repeatable performance beyond this requires the real-time patch.

The real-time patch adds several important features to the Linux kernel. Figure 17-4 below shows the configuration options for Preemption mode when the real-time patch has been applied.

Figure 17-4. Preemption modes with real time patch

The real-time patch adds a fourth preemption mode called PREEMPT_RT, or Preempt Real Time. The four preemption modes are as follows:

       PREEMPT_NONE: No forced preemption. Overall latency is, on average, good, but there can be some occasional long delays. Best suited for applications for which overall throughput is the top design criterion.

       PREEMPT_VOLUNTARY: First stage of latency reduction. Additional explicit preemption points are placed at strategic locations in the kernel to reduce latency. Some loss of overall throughput is traded for lower latency.

       PREEMPT_DESKTOP: This mode enables preemption everywhere in the kernel except when processing within critical sections. This mode is useful for soft real-time applications such as audio and multimedia. Overall throughput is traded for further reductions in latency.

       PREEMPT_RT: Features from the real-time patch are added, including replacing spinlocks with preemptable mutexes. This enables involuntary preemption everywhere within the kernel except for those areas protected by preempt_disable().

This mode significantly smoothes out the variation in latency (jitter) and allows a low and predictable latency for time-critical real-time applications.

If kernel preemption is enabled in your kernel configuration, it can be disabled at boot time by adding the following kernel parameter to the kernel command line:
preempt=0

Real-Time Features
Several new Linux kernel features are enabled with CONFIG_PREEMPT_RT. From Figure 17-4 above, we see several new configuration settings. These and other features of the real-time Linux kernel patch are described here.

Spinlock Converted to Mutex. The real-time patch converts most spinlocks in the system to mutexes. This reduces overall latency at the cost of slightly reduced throughput. The benefit of converting spinlocks to mutexes is that a task holding one can be preempted. If Process A is holding a lock, and Process B at a higher priority needs the same lock, Process B can preempt Process A when the lock is a mutex.

ISRs as Kernel Tasks. With CONFIG_PREEMPT_HARDIRQ selected, interrupt service routines (ISRs) are forced to run in process context. This gives the developer control over the priority of ISRs because they become schedulable entities. As such, they also become preemptable to allow higher-priority hardware interrupts to be handled first.

This is a powerful feature. Some hardware architectures do not enforce interrupt priorities. Those that do might not enforce the priorities consistent with your specified real-time design goals. Using CONFIG_PREEMPT_HARDIRQ, you are free to define the priorities at which each IRQ will run.
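As a rough sketch of how that control might be exercised from user space: once ISRs run as threads, an ordinary scheduling call can change a handler's priority. The helper below is hypothetical (the name set_fifo_priority is ours, not part of the patch). It assumes you have already looked up the PID of the IRQ thread, for example with ps, and that the caller has root privileges.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: run the task identified by pid under SCHED_FIFO
 * at the given priority. Returns 0 on success, or -1 with errno set.
 */
int set_fifo_priority(pid_t pid, int priority)
{
    struct sched_param param;
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    /* Reject values outside the range the scheduler accepts */
    if (lo == -1 || hi == -1 || priority < lo || priority > hi) {
        errno = EINVAL;
        return -1;
    }

    param.sched_priority = priority;
    /* Needs root (or CAP_SYS_NICE); fails with EPERM otherwise */
    return sched_setscheduler(pid, SCHED_FIFO, &param);
}
```

For example, set_fifo_priority(irq_thread_pid, 50) would place that handler thread above all tasks at lower real-time priorities; the value 50 is arbitrary and chosen only for illustration.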

Conversion of ISRs to threads can be disabled at runtime through the /proc file system or at boot time by entering a parameter on the kernel command line. When enabled in the configuration, unless you specify otherwise, ISR threading is enabled by default.

To disable ISR threading at runtime, issue the following command as root:

# echo '0' >/proc/sys/kernel/hardirq_preemption

To verify the setting, display it as follows:
# cat /proc/sys/kernel/hardirq_preemption
1


To disable ISR threading at boot time, add the following parameter to the kernel command line:
hardirq-preempt=0

Preemptable Softirqs. CONFIG_PREEMPT_SOFTIRQS reduces latency by running softirqs within the context of the kernel's softirq daemon (ksoftirqd). ksoftirqd is a proper Linux task (process). As such, it can be prioritized and scheduled along with other tasks. If your kernel is configured for real time, and CONFIG_PREEMPT_SOFTIRQS is enabled, the ksoftirqd kernel task is elevated to real-time priority to handle the softirq processing. Listing 17-3 below shows the code responsible for this from a recent Linux kernel, found in .../kernel/softirq.c.

Listing 17-3
Promoting ksoftirqd to Real-Time Status

static int ksoftirqd(void * __bind_cpu)
{
      struct sched_param param = { .sched_priority = 24 };

      printk("ksoftirqd started up.\n");

#ifdef CONFIG_PREEMPT_SOFTIRQS
      printk("softirq RT prio: %d.\n", param.sched_priority);
      sys_sched_setscheduler(current->pid, SCHED_FIFO, &param);
#else
      set_user_nice(current, -10);
#endif
...


Here we see that if CONFIG_PREEMPT_SOFTIRQS is enabled in the kernel configuration, the ksoftirqd kernel task is promoted to a real-time task (SCHED_FIFO) at a real-time priority of 24 using the sys_sched_setscheduler() kernel function.
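One way to confirm the promotion from user space is to ask the scheduler for a task's policy and priority. The sketch below is illustrative only: get_sched_info is a hypothetical helper name, and the PID of ksoftirqd must be looked up separately (for example with ps). On a kernel built as described above, we would expect SCHED_FIFO and priority 24 for that task.

```c
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: fetch the scheduling policy and real-time
 * priority of a task. A pid of 0 means the calling process.
 * Returns 0 on success, or -1 with errno set.
 */
int get_sched_info(pid_t pid, int *policy, int *priority)
{
    struct sched_param param;
    int pol = sched_getscheduler(pid);

    if (pol == -1)
        return -1;
    if (sched_getparam(pid, &param) == -1)
        return -1;

    *policy = pol;
    *priority = param.sched_priority;
    return 0;
}
```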

SoftIRQ threading can be disabled at runtime through the /proc file system, as well as through the kernel command line at boot time. When enabled in the configuration, unless you specify otherwise, SoftIRQ threading is enabled by default. To disable SoftIRQ threading at runtime, issue the following command as root:

# echo '0' >/proc/sys/kernel/softirq_preemption
To verify the setting, display it as follows:
# cat /proc/sys/kernel/softirq_preemption
1


To disable SoftIRQ threading at boot time, add the following parameter to the kernel command line:

softirq-preempt=0

Preempt RCU. RCU (Read-Copy-Update) is a special form of synchronization primitive in the Linux kernel designed for data that is read frequently but updated infrequently. You can think of RCU as an optimized reader lock. The real-time patch adds CONFIG_PREEMPT_RCU, which improves latency by making certain RCU sections preemptable.

O(1) Scheduler. The O(1) scheduler has been around since the days of Linux 2.5. It is mentioned here because it is a critical component of a real-time solution. The O(1) scheduler is a significant improvement over the previous Linux scheduler. It scales better for systems with many processes and helps produce lower overall latency.

In case you are wondering, O(1) is big-O notation for constant time. In this context, it means that the time it takes to make a scheduling decision is not dependent on the number of processes on a given runqueue. The old Linux scheduler did not have this characteristic, and its performance degraded with the number of processes.
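To illustrate why such a decision can take constant time, consider a bitmap with one bit per priority level: finding the most urgent runnable level means scanning a fixed number of words, however many tasks are queued. The code below is a simplified user-space sketch of that idea, not the kernel's implementation. The names, the 140-level size, and the convention that a lower number means a more urgent level are assumptions chosen for illustration.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() */

#define NUM_PRIOS    140                      /* illustrative level count */
#define BITMAP_WORDS ((NUM_PRIOS + 31) / 32)

struct prio_bitmap {
    uint32_t word[BITMAP_WORDS];   /* bit set = level has runnable tasks */
};

/* Record that at least one task is runnable at this level */
void prio_mark(struct prio_bitmap *b, int prio)
{
    b->word[prio / 32] |= (uint32_t)1 << (prio % 32);
}

/* Record that no task is runnable at this level */
void prio_unmark(struct prio_bitmap *b, int prio)
{
    b->word[prio / 32] &= ~((uint32_t)1 << (prio % 32));
}

/*
 * Return the lowest-numbered marked level, or -1 if none is marked.
 * The loop bound is a compile-time constant, so the cost does not
 * grow with the number of runnable tasks.
 */
int prio_first(const struct prio_bitmap *b)
{
    for (int i = 0; i < BITMAP_WORDS; i++) {
        if (b->word[i] != 0)
            return i * 32 + ffs((int)b->word[i]) - 1;
    }
    return -1;
}
```

A scheduler built this way would pair each bit with a queue of tasks at that level, so picking the next task is one bitmap scan followed by taking the head of a list.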

Creating a Real-Time Process
You can designate a process as real time by setting a process attribute that the scheduler uses as part of its scheduling algorithm. Listing 17-4 below shows the general method.

Listing 17-4
Creating a Real-Time Process

#include <sched.h>

/* Highest priority the scheduler accepts for SCHED_RR */
#define MY_RT_PRIORITY sched_get_priority_max(SCHED_RR)

int main(int argc, char **argv)
{
      ...
      int rc, old_scheduler_policy;
      struct sched_param my_params;
      ...

     
      /* Passing zero specifies caller's (our) policy */
      old_scheduler_policy = sched_getscheduler(0);

      my_params.sched_priority = MY_RT_PRIORITY;
      /* Passing zero specifies caller's (our) pid */
      rc = sched_setscheduler(0, SCHED_RR, &my_params);
      if ( rc == -1 )
            handle_error();
      ...
}


This code snippet does two things in the call to sched_setscheduler(). It changes the scheduling policy to SCHED_RR and raises its priority to the maximum possible on the system. Linux supports three scheduling policies:

       SCHED_OTHER: Normal Linux process, fairness scheduling
       SCHED_RR: Real-time process with a time slice—that is, if it does not block, it is allowed to run for a given period of time determined by the scheduler
       SCHED_FIFO: Real-time process that runs until it either blocks or explicitly yields the processor, or until a higher-priority real-time process becomes runnable

The man page for sched_setscheduler provides more detail on the three different scheduling policies.
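The valid priority range for each policy, and the time slice used for SCHED_RR, can be queried at run time rather than hard-coded. The following is a small sketch using the standard POSIX calls; the helper names are ours and purely illustrative.

```c
#include <sched.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: look up the priority range the scheduler
 * accepts for a policy. Returns 0 on success, -1 on error.
 */
int policy_priority_range(int policy, int *min, int *max)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    *min = lo;
    *max = hi;
    return 0;
}

/*
 * Hypothetical helper: report the round-robin time slice for a task
 * in nanoseconds (pid 0 = caller). Returns -1 on error.
 */
long long rr_timeslice_ns(pid_t pid)
{
    struct timespec ts;

    if (sched_rr_get_interval(pid, &ts) == -1)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

On Linux, the two real-time policies typically report a range of 1 to 99, while SCHED_OTHER reports 0 for both bounds because it does not use real-time priorities.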

Critical Section Management
When writing kernel code, such as a custom device driver, you will encounter data structures that you must protect from concurrent access. The easiest way to protect critical data is to disable preemption around the critical section. Keep the critical path as short as possible to maintain a low maximum latency for your system. Listing 17-5 below shows an example.

Listing 17-5
Protecting Critical Section in Kernel Code

...
/*
 * Declare and initialize a global lock for your
 * critical data
 */
DEFINE_SPINLOCK(my_lock);
...

int operate_on_critical_data()
{
    ...
    spin_lock(&my_lock);
    ...
    /* Update critical/shared data */
    ...
    spin_unlock(&my_lock);
    ...
}


When a task successfully acquires a spinlock, preemption is disabled and the task that acquired the spinlock is allowed into the critical section. No task switches can occur until a spin_unlock operation takes place. The spin_lock() function is actually a macro that has several forms, depending on the kernel configuration. They are defined at the top level (architecture-independent definitions) in .../include/linux/spinlock.h.

When the kernel is patched with the real-time patch, these spinlocks are promoted to mutexes so that a higher-priority process can preempt a task that is holding one.

Because the real-time patch is largely transparent to the device driver and kernel developer, the familiar constructs can be used to protect critical sections, as described in Listing 17-5 above. This is a major advantage of the real-time patch for real-time applications; it preserves the well-known semantics for locking and interrupt service routines.

Using the macro DEFINE_SPINLOCK as in Listing 17-5 above preserves future compatibility. These macros are defined in .../include/linux/spinlock_types.h.

Debugging the Real-Time Kernel
Several configuration options facilitate debugging and performance analysis of the real-time patched kernel. They are detailed in the following subsections.

Soft Lockup Detection. To enable soft lockup detection, enable CONFIG_DETECT_SOFTLOCKUP in the kernel configuration. This feature enables the detection of long periods of running in kernel mode without a context switch. This feature exists in non-real-time kernels but is useful for detecting very high latency paths or soft deadlock conditions. To use it, simply enable the feature and watch for any reports on the console or system log. Reports will be emitted similar to this:

BUG: soft lockup detected on CPU0

When this message is emitted by the kernel, it is usually accompanied by a backtrace and other information such as the process name and PID. It will look similar to a kernel oops message complete with processor registers. See .../kernel/softlockup.c for details. This information can be used to help track down the source of the lockup condition.

Preemption Debugging. To enable preemption debugging, enable CONFIG_DEBUG_PREEMPT in the kernel configuration. This debug feature enables the detection of unsafe use of preemption semantics such as preemption count underflows and attempts to sleep while in an invalid context. To use it, simply enable the feature and watch for any reports on the console or system log. Here is just a small sample of reports possible when preemption debugging is enabled:

BUG: <me> <mypid>, possible wake_up race on <proc> <pid>
BUG: lock recursion deadlock detected!  <more info>
BUG: nonzero lock count <n> at exit time?


Many more messages are possible—these are just a few examples of the kinds of problems that can be detected. These messages will help you avoid deadlocks and other erroneous or dangerous programming semantics when using real-time kernel features. For more details on the messages and conditions under which they are emitted, browse the Linux kernel source file .../kernel/rt-debug.c.

Debug Wakeup Timing. To enable wakeup timing, enable CONFIG_WAKEUP_TIMING in the kernel configuration. This debug option enables measurement of the time taken from waking up a high-priority process to when it is scheduled on a CPU. Using it is simple. When configured, measurement is disabled. To enable the measurement, do the following as root:

# echo '0' >/proc/sys/kernel/preempt_max_latency
When this /proc file is set to zero, each successive maximum wakeup timing result is written to this file. To read the current maximum, simply display the value:
# cat /proc/sys/kernel/preempt_max_latency
84


As long as any of the latency-measurement modes are enabled in the kernel configuration, preempt_max_latency will always be updated with the maximum latency value. It cannot be disabled. Writing 0 to this /proc variable simply resets the maximum to zero to restart the cumulative measurement.
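The same read and reset steps can be scripted from a C test harness. The sketch below assumes a kernel built with the latency options described above, so that /proc/sys/kernel/preempt_max_latency exists. The helper names are hypothetical, and the path is passed in as a parameter so the functions can be pointed at any file holding a single integer.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a file such as
 * /proc/sys/kernel/preempt_max_latency. Returns 0 on success.
 */
int read_latency_value(const char *path, long *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: write 0 to the file, which per the text above
 * resets the recorded maximum. Returns 0 on success.
 */
int reset_latency_value(const char *path)
{
    FILE *fp = fopen(path, "w");
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fputs("0\n", fp) >= 0);
    /* fclose() flushes; a failed flush means the write did not land */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A test run might call reset_latency_value() before applying load and read_latency_value() afterward to capture the worst case seen during that run.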

Wakeup Latency History. To enable wakeup latency history, enable CONFIG_WAKEUP_LATENCY_HIST while CONFIG_WAKEUP_TIMING is also enabled. This option dumps all the wakeup timing measurements enabled by CONFIG_WAKEUP_TIMING into a file for later analysis. An example of this file and its contents is presented shortly when we examine interrupt off history.

Two related options measure time spent with preemption disabled:

       CRITICAL_PREEMPT_TIMING: Measures the time spent in critical sections with preempt disabled.
       PREEMPT_OFF_HIST: Similar to WAKEUP_LATENCY_HIST. Gathers preempt off timing measurements into a bin for later analysis.

Interrupt Off Timing. To enable measurement of maximum interrupt off timing, configure your kernel with CRITICAL_IRQSOFF_TIMING enabled. This option measures time spent in critical sections with irqs disabled. This feature works in the same way as wakeup latency timing. To enable the measurement, do the following as root:

# echo '0' >/proc/sys/kernel/preempt_max_latency

When this /proc file is set to zero, each successive maximum interrupt off timing result is written to this file. To read the current maximum, simply display the value:

# cat /proc/sys/kernel/preempt_max_latency
97


You will notice that the latency measurements for both wakeup latency and interrupt off latency are enabled and displayed using the same /proc file. This means, of course, that only one measurement can be configured at a time, or the results might not be valid. Because these measurements add significant runtime overhead, it isn't wise to enable them all at once anyway.

Interrupt Off History.
Enabling INTERRUPT_OFF_HIST provides functionality similar to that with WAKEUP_LATENCY_HIST. This option gathers interrupt off timing measurements into a file for later analysis. This data is formatted as a histogram, with bins ranging from 0 microseconds to just over 10,000 microseconds.

In the example just given, we saw that the maximum latency was 97 microseconds from that particular sample. Therefore, we can conclude that the latency data in histogram form will not contain any useful information beyond the 97-microsecond bin.

History data is obtained by reading a special /proc file. This output is redirected to a regular file for analysis or plotting as follows:

# cat /proc/latency_hist/interrupt_off_latency/CPU0 > hist_data.txt

Listing 17-6 below shows the first 10 lines of the history data.

Listing 17-6
Interrupt Off Latency History (Head)

$ cat /proc/latency_hist/interrupt_off_latency/CPU0 | head
#Minimum latency: 0 microseconds.
#Average latency: 1 microseconds.
#Maximum latency: 97 microseconds.
#Total samples: 60097595
#There are 0 samples greater or equal than 10240 microseconds
#usecs           samples
    0           13475417
    1           38914907
    2            2714349
    3             442308
...


From Listing 17-6 above we can see the minimum and maximum values, the average of all the values, and the total number of samples. In this case, we accumulated slightly more than 60 million samples. The histogram data follows the summary and contains up to around 10,000 bins. We can easily plot this data using gnuplot as shown in Figure 17-5 below.

Figure 17-5. Interrupt off latency data
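Besides plotting, the saved history file is simple enough to summarize in a few lines of C. The following sketch assumes the two-column layout shown in Listing 17-6, with comment lines beginning with #. The structure and function names are hypothetical.

```c
#include <stdio.h>

struct hist_summary {
    unsigned long long total;   /* sum of all sample counts */
    long max_bin;               /* largest usecs bin with samples, -1 if none */
};

/*
 * Parse histogram text in the two-column "usecs samples" layout of
 * Listing 17-6. Lines starting with '#' are treated as comments.
 * Returns the number of data rows parsed.
 */
int summarize_hist(FILE *fp, struct hist_summary *out)
{
    char line[256];
    int rows = 0;

    out->total = 0;
    out->max_bin = -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        long usecs;
        unsigned long long samples;

        if (line[0] == '#')
            continue;
        if (sscanf(line, "%ld %llu", &usecs, &samples) != 2)
            continue;           /* skip blank or malformed lines */
        rows++;
        out->total += samples;
        if (samples > 0 && usecs > out->max_bin)
            out->max_bin = usecs;
    }
    return rows;
}
```

Run against hist_data.txt, max_bin should agree with the maximum latency reported in the file's header, which gives a quick consistency check on the capture.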

Latency Tracing. The LATENCY_TRACE configuration option enables generation of kernel trace data associated with the last maximum latency measurement. It is also made available through the /proc file system.

A latency trace can help you isolate the longest-latency code path. For each new maximum latency measurement, an associated trace is generated that facilitates tracing the code path of the associated maximum latency.

Listing 17-7 below reproduces an example trace for a 78-microsecond maximum. As with the other measurement tools, enable the measurement by writing a 0 to /proc/sys/kernel/preempt_max_latency.

Listing 17-7
Interrupt Off Maximum Latency Trace

$ cat /proc/latency_trace
preemption latency trace v1.1.5 on 2.6.14-rt-intoff-tim_trace
-------------------------------------------------------------
 latency: 78 us, #50/50, CPU#0 | (M:rt VP:0, KP:0, SP:1 HP:1)
    -----------------
    | task: softirq-timer/0-3 (uid:0 nice:0 policy:1 rt_prio:1)
    -----------------
                                                                
                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
     cat-6637  0D...    1us : common_interrupt ((0))
     cat-6637  0D.h.    2us : do_IRQ (c013d91c 0 0)
     cat-6637  0D.h1    3us+: mask_and_ack_8259A (__do_IRQ)
     cat-6637  0D.h1   10us : redirect_hardirq (__do_IRQ)
     cat-6637  0D.h.   12us : handle_IRQ_event (__do_IRQ)
     cat-6637  0D.h.   13us : timer_interrupt (handle_IRQ_event)
     cat-6637  0D.h.   15us : handle_tick_update (timer_interrupt)
     cat-6637  0D.h1   16us : do_timer (handle_tick_update)
     ...   <we're in the timer interrupt function>
     cat-6637  0D.h.   22us : run_local_timers (update_process_times)
     cat-6637  0D.h.   22us : raise_softirq (run_local_timers)
     cat-6637  0D.h.   23us : wakeup_softirqd (raise_softirq)
     ...   <softirq work pending; need to preempt is signaled>
     cat-6637  0Dnh.   34us : wake_up_process (wakeup_softirqd)
     cat-6637  0Dnh.   35us+: rcu_pending (update_process_times)
     cat-6637  0Dnh.   39us : scheduler_tick (update_process_times)
     cat-6637  0Dnh.   39us : sched_clock (scheduler_tick)
     cat-6637  0Dnh1   41us : task_timeslice (scheduler_tick)
     cat-6637  0Dnh.   42us+: preempt_schedule (scheduler_tick)
     cat-6637  0Dnh1   45us : note_interrupt (__do_IRQ)
     cat-6637  0Dnh1   45us : enable_8259A_irq (__do_IRQ)
     cat-6637  0Dnh1   47us : preempt_schedule (enable_8259A_irq)
     cat-6637  0Dnh.   48us : preempt_schedule (__do_IRQ)
     cat-6637  0Dnh.   48us : irq_exit (do_IRQ)
     cat-6637  0Dn..   49us : preempt_schedule_irq (need_resched)
     cat-6637  0Dn..   50us : __schedule (preempt_schedule_irq)
     ...   <here is the context switch to softirqd-timer thread>
   <...>-3     0D..2   74us+: __switch_to (__schedule)
   <...>-3     0D..2   76us : __schedule <cat-6637> (74 62)
   <...>-3     0D..2   77us : __schedule (schedule)
   <...>-3     0D..2   78us : trace_irqs_on (__schedule)
     ...   <output truncated here for brevity>


We have trimmed this listing significantly for clarity, but the key elements of this trace are obvious. This trace resulted from a timer interrupt. In the hardirq thread, little is done beyond queuing up some work for later in a softirq context.

This is seen by the wakeup_softirqd() function at 23 microseconds and is typical for interrupt processing. This triggers the need_resched flag, as shown in the trace by the n in the third column of the second field.

At 49 microseconds, after some processing in the timer softirq, the scheduler is invoked for preemption. At 74 microseconds, control is passed to the actual softirq-timer/0 thread running in this particular kernel as PID 3. (The process name was truncated to fit the field width and is shown as <...>.)

Most of the fields of Listing 17-7 above have obvious meanings. The irqs-off field contains a D for sections of code where interrupts are off. Because this latency trace is an interrupts off trace, we see this indicated throughout the trace.

The need_resched field mirrors the state of the kernel's need_resched flag. An n indicates that the scheduler should be run at the soonest opportunity, and a period (.) means that this flag is not active. The hardirq/softirq field indicates a thread of execution in hardirq context with h, and softirq context with s.

The preempt-depth field indicates the value of the kernel's preempt_count variable, an indicator of nesting level of locks within the kernel. Preemption can occur only when this variable is at zero.
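When post-processing many trace lines, it can help to decode the flags field mechanically. The sketch below follows the column layout in the header of Listing 17-7 and assumes a five-character field with a single-digit CPU number and a single-digit preempt depth. The structure and function names are hypothetical.

```c
#include <string.h>

struct trace_flags {
    int cpu;            /* CPU number (single digit assumed) */
    int irqs_off;       /* 1 if interrupts were disabled ('D') */
    int need_resched;   /* 1 if need_resched was set ('n') */
    char context;       /* 'h' hardirq, 's' softirq, '.' neither */
    int preempt_depth;  /* preempt_count value, 0 when shown as '.' */
};

/*
 * Decode a five-character flags field such as "0Dnh1".
 * Returns 0 on success, -1 if the field is malformed.
 */
int decode_trace_flags(const char *field, struct trace_flags *out)
{
    if (field == NULL || strlen(field) != 5)
        return -1;
    if (field[0] < '0' || field[0] > '9')
        return -1;

    out->cpu = field[0] - '0';
    out->irqs_off = (field[1] == 'D');
    out->need_resched = (field[2] == 'n');
    out->context = field[3];
    out->preempt_depth =
        (field[4] >= '0' && field[4] <= '9') ? field[4] - '0' : 0;
    return 0;
}
```

Decoding "0Dnh1" this way yields CPU 0, interrupts off, need_resched set, hardirq context, and a preempt depth of 1, matching the reading given in the text.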

Debugging Deadlock Conditions. The DEBUG_DEADLOCKS kernel configuration option enables detection and reporting of deadlock conditions associated with the semaphores and spinlocks in the kernel. When enabled, potential deadlock conditions are reported in a fashion similar to this:

==========================================
   [ BUG: lock recursion deadlock detected! |
   ------------------------------------------
   ...


Much information is displayed after the banner line announcing the deadlock detection, including the lock descriptor, lock name (if available), lock file and name (if available), lock owner, who is currently holding the lock, and so on. Using this debug tool, it is possible to immediately determine the offending processes. Of course, fixing it might not be so easy!

Runtime Control of Locking Mode. The DEBUG_RT_LOCKING_MODE option enables a runtime control to switch the real-time mutex back into a nonpreemptable mode, effectively changing the behavior of the real-time (spinlocks as mutexes) kernel back to a spinlock-based kernel. As with the other configuration options we have covered here, this tool should be considered a development aid to be used only in a development environment.

It does not make sense to enable all of these debug modes at once. As you might imagine, most of these debug modes add size and significant processing overhead to the kernel. They are meant to be used as development aids and should be disabled for production code.

Chapter Summary
       Linux is increasingly being used in systems where real-time performance is required. Examples include multimedia applications and robot, industrial, and automotive controllers.
       Real-time systems are characterized by deadlines. When a missed deadline results in inconvenience or a diminished customer experience, we refer to this as soft real time. In contrast, hard real-time systems are considered failed when a deadline is missed.
       Kernel preemption was the first significant feature in the Linux kernel that addressed system-wide latency.
       Recent Linux kernels support several preemption modes, ranging from no preemption to full real-time preemption.
       The real-time patch adds several key features to the Linux kernel, resulting in reliable low latencies.
       The real-time patch includes several important measurement tools to aid in debugging and characterizing a real-time Linux implementation.

To read Part 1, go to "What is real-time Linux?"

Christopher Hallinan, field applications engineer at MontaVista Software, has worked for more than 20 years in assignments ranging from engineering and engineering management to marketing and business development. He spent four years as an independent development consultant in the embedded Linux marketplace.

This chapter is excerpted from the book titled "Embedded Linux Primer: A Practical Real-World Approach", by Christopher Hallinan. Published by Prentice Hall Professional as part of the Prentice Hall Open Source Software Development Series, ISBN 0131679848; Copyright 2007 Pearson Education, Inc.

References:
1)  Linux Kernel Development, 2nd Edition, by Robert Love, Novell Press, 2005
2) See www.rdrop.com/users/paulmck/RCU/ for an in-depth discussion of RCU.
