The lowdown on Embedded Linux and its use in real-time applications: Part 2

Support for hard real time is not in the mainline source tree. To enable hard real time, a patch must be applied. The real-time kernel patch is the cumulative result of several initiatives to reduce Linux kernel latency.

The patch had many contributors, and it is currently maintained by Ingo Molnar; you can find it on the Red Hat site. The soft real-time performance of the 2.6 Linux kernel has improved significantly since the early 2.6 kernel releases. When 2.6 was first released, the 2.4 Linux kernel was substantially better in soft real-time performance.

Since about Linux 2.6.12, soft real-time performance in the single-digit milliseconds on a reasonably fast x86 processor is readily achieved. Getting repeatable performance beyond this requires the real-time patch.

The real-time patch adds several important features to the Linux kernel. Figure 17-4 below shows the configuration options for Preemption mode when the real-time patch has been applied.

Figure 17-4. Preemption modes with real-time patch

The real-time patch adds a fourth preemption mode called PREEMPT_RT, or Preempt Real Time. The four preemption modes are as follows:

      PREEMPT_NONE: No forced preemption. Overall latency is, on average, good, but there can be occasional long delays. Best suited for applications for which overall throughput is the top design criterion.

      PREEMPT_VOLUNTARY: First stage of latency reduction. Additional explicit preemption points are placed at strategic locations in the kernel to reduce latency. Some loss of overall throughput is traded for lower latency.

      PREEMPT_DESKTOP: This mode enables preemption everywhere in the kernel except when processing within critical sections. This mode is useful for soft real-time applications such as audio and multimedia. Overall throughput is traded for further reductions in latency.

      PREEMPT_RT: Features from the real-time patch are added, including replacing spinlocks with preemptable mutexes. This enables involuntary preemption everywhere within the kernel except for those areas protected by preempt_disable().

This mode significantly smooths out the variation in latency (jitter) and allows a low and predictable latency for time-critical real-time applications.

If kernel preemption is enabled in your kernel configuration, it can be disabled at boot time by adding the following kernel parameter to the kernel command line:

Real-Time Features
Several new Linux kernel features are enabled with CONFIG_PREEMPT_RT. From Figure 17-4 above, we see several new configuration settings. These and other features of the real-time Linux kernel patch are described here.

Spinlock Converted to Mutex. The real-time patch converts most spinlocks in the system to mutexes. This reduces overall latency at the cost of slightly reduced throughput. The benefit of converting spinlocks to mutexes is that a task holding one can be preempted. If Process A is holding a lock, and Process B at a higher priority needs the same lock, Process B can preempt Process A in the case where Process A is holding a mutex.

ISRs as Kernel Tasks. With CONFIG_PREEMPT_HARDIRQ selected, interrupt service routines (ISRs) are forced to run in process context. This gives the developer control over the priority of ISRs because they become schedulable entities. As such, they also become preemptable to allow higher-priority hardware interrupts to be handled first.

This is a powerful feature. Some hardware architectures do not enforce interrupt priorities. Those that do might not enforce the priorities consistent with your specified real-time design goals. Using CONFIG_PREEMPT_HARDIRQ, you are free to define the priorities at which each IRQ will run.

Conversion of ISRs to threads can be disabled at runtime through the /proc file system or at boot time by entering a parameter on the kernel command line. When enabled in the configuration, ISR threading is on by default unless you specify otherwise.

To disable ISR threading at runtime, issue the following command as root:

# echo '0' > /proc/sys/kernel/hardirq_preemption

To verify the setting, display it as follows:
# cat /proc/sys/kernel/hardirq_preemption

To disable ISR threading at boot time, add the following parameter to the kernel command line:

Preemptable Softirqs. CONFIG_PREEMPT_SOFTIRQS reduces latency by running softirqs within the context of the kernel's softirq daemon (ksoftirqd). ksoftirqd is a proper Linux task (process). As such, it can be prioritized and scheduled along with other tasks. If your kernel is configured for real time, and CONFIG_PREEMPT_SOFTIRQS is enabled, the ksoftirqd kernel task is elevated to real-time priority to handle the softirq processing. Listing 17-3 below shows the code responsible for this from a recent Linux kernel, found in …/kernel/softirq.c.

Listing 17-3
Promoting ksoftirqd to Real-Time Status

static int ksoftirqd(void * __bind_cpu)
{
      struct sched_param param = { .sched_priority = 24 };

      printk("ksoftirqd started up.\n");

#ifdef CONFIG_PREEMPT_SOFTIRQS
      printk("softirq RT prio: %d.\n", param.sched_priority);
      sys_sched_setscheduler(current->pid, SCHED_FIFO, &param);
#else
      set_user_nice(current, -10);
#endif
      ...
}

Here we see that if CONFIG_PREEMPT_SOFTIRQS is enabled in the kernel configuration, the ksoftirqd kernel task is promoted to a real-time task (SCHED_FIFO) at a real-time priority of 24 using the sys_sched_setscheduler() kernel function.

SoftIRQ threading can be disabled at runtime through the /proc file system, as well as through the kernel command line at boot time. When enabled in the configuration, SoftIRQ threading is on by default unless you specify otherwise. To disable SoftIRQ threading at runtime, issue the following command as root:

# echo '0' >/proc/sys/kernel/softirq_preemption
To verify the setting, display it as follows:
# cat /proc/sys/kernel/softirq_preemption

To disable SoftIRQ threading at boot time, add the following parameter to the kernel command line:


Preempt RCU. RCU (Read-Copy-Update) is a special form of synchronization primitive in the Linux kernel designed for data that is read frequently but updated infrequently. You can think of RCU as an optimized reader lock. The real-time patch adds CONFIG_PREEMPT_RCU, which improves latency by making certain RCU sections preemptable.

O(1) Scheduler. The O(1) scheduler has been around since the days of Linux 2.5. It is mentioned here because it is a critical component of a real-time solution. The O(1) scheduler is a significant improvement over the previous Linux scheduler. It scales better for systems with many processes and helps produce lower overall latency.

In case you are wondering, O(1) is the mathematical (big-O) notation for an operation that completes in constant time. In this context, it means that the time it takes to make a scheduling decision does not depend on the number of processes on a given runqueue. The old Linux scheduler did not have this characteristic, and its performance degraded as the number of processes grew.

Creating a Real-Time Process
You can designate a process as real time by setting a process attribute that the scheduler uses as part of its scheduling algorithm. Listing 17-4 below shows the general method.

Listing 17-4
Creating a Real-Time Process


#include <sched.h>
#include <stdio.h>

/* Highest possible priority for the SCHED_RR policy */
#define MY_RT_PRIORITY sched_get_priority_max(SCHED_RR)

int main(int argc, char **argv)
{
      int rc, old_scheduler_policy;
      struct sched_param my_params;

      /* Passing zero specifies caller's (our) policy */
      old_scheduler_policy = sched_getscheduler(0);

      my_params.sched_priority = MY_RT_PRIORITY;
      /* Passing zero specifies caller's (our) pid */
      rc = sched_setscheduler(0, SCHED_RR, &my_params);
      if ( rc == -1 ) {
            perror("sched_setscheduler");
            return 1;
      }
      return 0;
}

This code snippet does two things in the call to sched_setscheduler(). It changes the scheduling policy to SCHED_RR and raises its priority to the maximum possible on the system. Linux supports three scheduling policies:

       SCHED_OTHER: Normal Linux process, fairness scheduling
       SCHED_RR: Real-time process with a time slice; that is, if it does not block, it is allowed to run for a given period of time determined by the scheduler
       SCHED_FIFO: Real-time process that runs until it either blocks or explicitly yields the processor, or until another higher-priority SCHED_FIFO process becomes runnable

The man page for sched_setscheduler provides more detail on the three different scheduling policies.

Critical Section Management
When writing kernel code, such as a custom device driver, you will encounter data structures that you must protect from concurrent access. The easiest way to protect critical data is to disable preemption around the critical section. Keep the critical path as short as possible to maintain a low maximum latency for your system. Listing 17-5 below shows an example.

Listing 17-5
Protecting Critical Section in Kernel Code

/*
 * Declare and initialize a global lock for your
 * critical data
 */
DEFINE_SPINLOCK(my_lock);

int operate_on_critical_data()
{
    spin_lock(&my_lock);
    /* Update critical/shared data */
    spin_unlock(&my_lock);
    return 0;
}

When a task successfully acquires a spinlock, preemption is disabled and the task that acquired the spinlock is allowed into the critical section. No task switches can occur until a spin_unlock operation takes place. The spin_lock() function is actually a macro that has several forms, depending on the kernel configuration. They are defined at the top level (architecture-independent definitions) in …/include/linux/spinlock.h.

When the kernel is patched with the real-time patch, these spinlocks are promoted to mutexes to allow preemption by higher-priority processes when a spinlock is held.

Because the real-time patch is largely transparent to the device driver and kernel developer, the familiar constructs can be used to protect critical sections, as described in Listing 17-5 above. This is a major advantage of the real-time patch for real-time applications; it preserves the well-known semantics for locking and interrupt service routines.

Using the macro DEFINE_SPINLOCK as in Listing 17-5 above preserves future compatibility. These macros are defined in …/include/linux/spinlock_types.h.

Debugging the Real-Time Kernel
Several configuration options facilitate debugging and performance analysis of the real-time patched kernel. They are detailed in the following subsections.

Soft Lockup Detection. To enable soft lockup detection, enable CONFIG_DETECT_SOFTLOCKUP in the kernel configuration. This feature enables the detection of long periods of running in kernel mode without a context switch. This feature exists in non-real-time kernels but is useful for detecting very high latency paths or soft deadlock conditions. To use it, simply enable the feature and watch for any reports on the console or system log. Reports will be emitted similar to this:

BUG: soft lockup detected on CPU0

When this message is emitted by the kernel, it is usually accompanied by a backtrace and other information such as the process name and PID. It will look similar to a kernel oops message complete with processor registers. See …/kernel/softlockup.c for details. This information can be used to help track down the source of the lockup condition.

Preemption Debugging. To enable preemption debugging, enable CONFIG_DEBUG_PREEMPT in the kernel configuration. This debug feature enables the detection of unsafe use of preemption semantics such as preemption count underflows and attempts to sleep while in an invalid context. To use it, simply enable the feature and watch for any reports on the console or system log. Here is just a small sample of reports possible when preemption debugging is enabled:

BUG: , possible wake_up race on
BUG: lock recursion deadlock detected!
BUG: nonzero lock count at exit time?

Many more messages are possible; these are just a few examples of the kinds of problems that can be detected. These messages will help you avoid deadlocks and other erroneous or dangerous programming semantics when using real-time kernel features. For more details on the messages and conditions under which they are emitted, browse the Linux kernel source file …/kernel/rt-debug.c.

Debug Wakeup Timing. To enable wakeup timing, enable CONFIG_WAKEUP_TIMING in the kernel configuration. This debug option enables measurement of the time taken from waking up a high-priority process to when it is scheduled on a CPU. Using it is simple. When configured, measurement is disabled. To enable the measurement, do the following as root:

# echo '0' > /proc/sys/kernel/preempt_max_latency

When this /proc file is set to zero, each successive maximum wakeup timing result is written to this file. To read the current maximum, simply display the value:

# cat /proc/sys/kernel/preempt_max_latency

As long as any of the latency-measurement modes are enabled in the kernel configuration, preempt_max_latency will always be updated with the maximum latency value. It cannot be disabled. Writing 0 to this /proc variable simply resets the maximum to zero to restart the cumulative measurement.

Wakeup Latency History. To enable wakeup latency history, enable CONFIG_WAKEUP_LATENCY_HIST while CONFIG_WAKEUP_TIMING is also enabled. This option dumps all the wakeup timing measurements enabled by CONFIG_WAKEUP_TIMING into a file for later analysis. An example of this file and its contents is presented shortly when we examine interrupt off history.
       CRITICAL_PREEMPT_TIMING: Measures the time spent in critical sections with preempt disabled.
       PREEMPT_OFF_HIST: Similar to WAKEUP_LATENCY_HIST. Gathers preempt off timing measurements into a bin for later analysis.

Interrupt Off Timing. To enable measurement of maximum interrupt off timing, configure your kernel with CRITICAL_IRQSOFF_TIMING enabled. This option measures time spent in critical sections with irqs disabled. This feature works in the same way as wakeup latency timing. To enable the measurement, do the following as root:

# echo '0' > /proc/sys/kernel/preempt_max_latency

When this /proc file is set to zero, each successive maximum interrupt off timing result is written to this file. To read the current maximum, simply display the value:

# cat /proc/sys/kernel/preempt_max_latency

You will notice that the latency measurements for both wakeup latency and interrupt off latency are enabled and displayed using the same /proc file. This means, of course, that only one measurement can be configured at a time, or the results might not be valid. Because these measurements add significant runtime overhead, it isn't wise to enable them all at once anyway.

Interrupt Off History. Enabling INTERRUPT_OFF_HIST provides functionality similar to that with WAKEUP_LATENCY_HIST. This option gathers interrupt off timing measurements into a file for later analysis. This data is formatted as a histogram, with bins ranging from 0 microseconds to just over 10,000 microseconds.

In the example just given, we saw that the maximum latency was 97 microseconds from that particular sample. Therefore, we can conclude that the latency data in histogram form will not contain any useful information beyond the 97-microsecond bin.

History data is obtained by reading a special /proc file. This output is redirected to a regular file for analysis or plotting as follows:

# cat /proc/latency_hist/interrupt_off_latency/CPU0 > hist_data.txt

Listing 17-6 below shows the first 10 lines of the history data.

Listing 17-6
Interrupt Off Latency History (Head)

$ cat /proc/latency_hist/interrupt_off_latency/CPU0 | head
#Minimum latency: 0 microseconds.
#Average latency: 1 microseconds.
#Maximum latency: 97 microseconds.
#Total samples: 60097595
#There are 0 samples greater or equal than 10240 microseconds
#usecs          samples
   0           13475417
   1           38914907
   2           2714349
   3            442308

From Listing 17-6 above we can see the minimum and maximum values, the average of all the values, and the total number of samples. In this case, we accumulated slightly more than 60 million samples. The histogram data follows the summary and contains up to around 10,000 bins. We can easily plot this data using gnuplot as shown in Figure 17-5 below.

Figure 17-5. Interrupt off latency data

Latency Tracing. The LATENCY_TRACE configuration option enables generation of kernel trace data associated with the last maximum latency measurement. It is also made available through the /proc file system.

A latency trace can help you isolate the longest-latency code path. For each new maximum latency measurement, an associated trace is generated that facilitates tracing the code path of the associated maximum latency.

Listing 17-7 below reproduces an example trace for a 78-microsecond maximum. As with the other measurement tools, enable the measurement by writing a 0 to /proc/sys/kernel/preempt_max_latency.

Listing 17-7
Interrupt Off Maximum Latency Trace

$ cat /proc/latency_trace
preemption latency trace v1.1.5 on 2.6.14-rt-intoff-tim_trace
 latency: 78 us, #50/50, CPU#0 | (M:rt VP:0, KP:0, SP:1 HP:1)
    | task: softirq-timer/0-3 (uid:0 nice:0 policy:1 rt_prio:1)
                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
     cat-6637  0D...    1us :common_interrupt ((0))
     cat-6637  0D.h.    2us :do_IRQ (c013d91c 0 0)
     cat-6637  0D.h1    3us+:mask_and_ack_8259A (__do_IRQ)
     cat-6637  0D.h1   10us :redirect_hardirq (__do_IRQ)
     cat-6637  0D.h.   12us :handle_IRQ_event (__do_IRQ)
     cat-6637  0D.h.   13us :timer_interrupt (handle_IRQ_event)
     cat-6637  0D.h.   15us :handle_tick_update (timer_interrupt)
     cat-6637  0D.h1   16us :do_timer (handle_tick_update)
     cat-6637  0D.h.   22us :run_local_timers (update_process_times)
     cat-6637  0D.h.   22us :raise_softirq (run_local_timers)
     cat-6637  0D.h.   23us :wakeup_softirqd (raise_softirq)
     cat-6637  0Dnh.   34us :wake_up_process (wakeup_softirqd)
     cat-6637  0Dnh.   35us+:rcu_pending (update_process_times)
     cat-6637  0Dnh.   39us :scheduler_tick (update_process_times)
     cat-6637  0Dnh.   39us :sched_clock (scheduler_tick)
     cat-6637  0Dnh1   41us :task_timeslice (scheduler_tick)
     cat-6637  0Dnh.   42us+:preempt_schedule (scheduler_tick)
     cat-6637  0Dnh1   45us :note_interrupt (__do_IRQ)
     cat-6637  0Dnh1   45us :enable_8259A_irq (__do_IRQ)
     cat-6637  0Dnh1   47us :preempt_schedule (enable_8259A_irq)
     cat-6637  0Dnh.   48us :preempt_schedule (__do_IRQ)
     cat-6637  0Dnh.   48us :irq_exit (do_IRQ)
     cat-6637  0Dn..   49us :preempt_schedule_irq (need_resched)
     cat-6637  0Dn..   50us :__schedule (preempt_schedule_irq)
   <...>-3     0D..2  74us+: __switch_to (__schedule)
   <...>-3     0D..2  76us : __schedule (74 62)
   <...>-3     0D..2  77us : __schedule (schedule)
   <...>-3     0D..2  78us : trace_irqs_on (__schedule)

We have trimmed this listing significantly for clarity, but the key elements of this trace are obvious. This trace resulted from a timer interrupt. In the hardirq thread, little is done beyond queuing up some work for later in a softirq context.

This is seen by the wakeup_softirqd() function at 23 microseconds and is typical for interrupt processing. This triggers the need_resched flag, as shown in the trace by the n in the third column of the second field.

At 49 microseconds, after some processing in the timer softirq, the scheduler is invoked for preemption. At 74 microseconds, control is passed to the actual softirq-timer/0 thread running in this particular kernel as PID 3. (The process name was truncated to fit the field width and is shown as <...>.)

Most of the fields of Listing 17-7 above have obvious meanings. The irqs-off field contains a D for sections of code where interrupts are off. Because this latency trace is an interrupts off trace, we see this indicated throughout the trace.

The need_resched field mirrors the state of the kernel's need_resched flag. An n indicates that the scheduler should be run at the soonest opportunity, and a period (.) means that this flag is not active. The hardirq/softirq field indicates a thread of execution in hardirq context with h, and softirq context with s.

The preempt-depth field indicates the value of the kernel's preempt_count variable, an indicator of nesting level of locks within the kernel. Preemption can occur only when this variable is at zero.

Debugging Deadlock Conditions. The DEBUG_DEADLOCKS kernel configuration option enables detection and reporting of deadlock conditions associated with the semaphores and spinlocks in the kernel. When enabled, potential deadlock conditions are reported in a fashion similar to this:

   [ BUG: lock recursion deadlock detected! |

Much information is displayed after the banner line announcing the deadlock detection, including the lock descriptor, lock name (if available), lock file and name (if available), lock owner, who is currently holding the lock, and so on. Using this debug tool, it is possible to immediately determine the offending processes. Of course, fixing it might not be so easy!

Runtime Control of Locking Mode. The DEBUG_RT_LOCKING_MODE option enables a runtime control to switch the real-time mutex back into a nonpreemptable mode, effectively changing the behavior of the real-time (spinlocks as mutexes) kernel back to a spinloc
