A
common misconception is that "hard real-time" means "very fast." This might explain why Microsoft claims that their upcoming v. 3.0 release of Windows CE will be a hard real-time OS. (CE v. 3.0 will reportedly allow nested interrupts and, hence, achieve sub-millisecond interrupt latencies.) However, hard real-time is better described as "when it absolutely,
positively has to be done on-time." Or, put another way, "a late result is a wrong result."
Another misconception is that "hard real-time" means "safety critical" and that because a system cannot kill someone there's no need to worry much about meeting deadlines. However, a hard real-time system is one where the consequences of missing a deadline are serious. To be sure, if injury or death can result from a missed deadline then this is serious and we have a hard real-time system. But lots of systems
exist where the consequences of missing a deadline even occasionally are economically unacceptable.
In high-volume industries it's important that systems operate correctly because the costs of even trivial failures are high. For example, in an automobile, the instrument panel computer might apply a timeout to a regular RPM message from the engine management system. If the message doesn't turn up on time, the instrument panel assumes a failure and lights a "check engine" lamp. It is important to do
this because if the engine management system isn't talking properly on the network then there might be some serious faults, and the driver should take the automobile to a workshop to check for a wiring fault. But if the vehicle systems haven't been designed to meet hard deadlines, and occasionally the RPM message turns up a little late, the timeout will trigger and the garage mechanics will find no actual fault. If this kind of overrun occurs even once in a million hours then, with the millions of hours of
operation of a particular vehicle type every day, there will be thousands of "no fault found" warranty claims each year. This could cost millions of dollars, not to mention the damage to a corporation's reputation for reliability. Clearly, the economic consequences of missing deadlines can be severe.
It's clear that if we want to build a hard real-time system then we have to be able to determine, before the system is deployed, whether deadlines could be missed or not. One common way to try to find
out the worst-case timing behavior of a system is via testing. But this is a poor technique, as
Figure 1
shows.
The graph shows the probability distribution of response times. The probability of a very short response time is zero for times below a certain point. This point corresponds to the best-case response time. Similarly, the probability again becomes zero above a certain response time: the worst-case response time. In testing, the longest observed response
time generally is less than the worst-case response time. This means that a deadline might be deemed "always met" but, in fact, could be missed after the system is deployed. A high-profile case of this is when the NASA JPL Pathfinder probe first reached Mars: although the control software passed its testing on Earth, a deadline was missed on Mars and the control system crashed. We'll look at this case in more detail later.
Static
scheduling techniques
We can see how important it is to meet hard real-time deadlines. Historically this has been done by static scheduling techniques.
Figure 2
shows a system scheduled off-line.
In
Figure 2
there are five things to do, each at its own rate. In the example system there is a requirement that each activity (or "task") is completed before it is due to run again. The system is scheduled by building a 4ms
schedule composed of four 1ms time slots. As
Figure 2
shows, the highest rate task (with a period of 1ms) appears in each slot. Lower rate tasks appear in just one or two of the slots. The schedule allows for time due to handling interrupts by calculating the largest number of interrupts from each source that could occur in any 1ms interval, and then allowing for this time in all the slots.
This kind of static cyclic scheduling provides superb real-time
predictability, and doesn't need any kind of real-time operating system (RTOS) to be implemented. The following code implements the schedule shown in
Figure 2
:
main()
{
do_init();
for(;;) {
busy_wait();
t1();
t2();
busy_wait();
t1();
t3();
t5();
busy_wait();
t1();
t2();
busy_wait();
t1();
t4();
}
}
This
pre-scheduled approach has some major problems. One problem is inefficiency: the system does not maximize the use of the processor. One reason is that urgent but infrequent events are handled very poorly. An event needs to be polled because it's never sure in which sub-schedule an event will occur, and so the schedule needs to poll frequently for each urgent event to be sure of responding on time. This means that a lot of processing time must be reserved for handling an event, even though in reality that time is
hardly ever used for hard real-time processing. The situation with interrupts is similar: in every slot we have to reserve time for handling interrupts even though those interrupts rarely occur.
The static approach is also inefficient because of the way task periods have to fit neatly together to keep the schedule cycle time short. In the schedule shown in Figure 2, we have periods of 1ms, 2ms, and 4ms. If we wanted to add a new task with a period of 5ms we would either have to increase the overall
schedule cycle time from 4ms to 20ms or else we would have to change the task period to 4ms. Since the schedule cycle time is the least common multiple of the periods of the tasks, the schedule soon grows out of control.
Task periods are usually shortened to neat values that fit a short schedule. Shortening a task's period means that the processor load due to the task
i
s increased. For example, if a task has an execution time of 500ýs and its period is shortened from 5ms to 4ms, then the
processor load due to the task
i
s raised from 10% to 12.5%, a waste of 2.5% of the processing resources. The overall wastage can be high, but this is often not noticed because the implementation technique is assumed right back when the system is first specified and the task periods are set as neat periods.
A major problem with the static approach is the difficulty of writing and maintaining application code.
Figure 2
shows how the spare time is allocated between
the time slots. The problem is that, although there might be quite a lot of spare time across the schedule, there's not enough spare time in any one place; a task has to fit within the spare time in a single time slot. When a task executes for longer than the spare time in any slot, there must be a restructuring of the task source code. The code must be broken down into two or more sub-tasks. These must be called from different time slots to use the available spare time. Unfortunately, it's usually not
trivial to split a piece of code so that the execution time of each piece is neatly balanced. In any case, additional code is usually needed to save the internal state at the end of one sub-task
i
nto global variables and re-load the state from these global variables in the subsequent sub-task. This wastes valuable RAM, ROM, and processing time.
Later in the development or maintenance phase a sub-task might itself need to be broken down into sub-tasks to re-allocate processing time between
time slots. As the development progresses, the logical structure of the application tends to become corrupted, and it becomes harder and harder to maintain the code. As well as increasing software life-cycle costs, this poor management of complexity can have a severe effect on the ability to develop quality software within time-to-market targets.
Priority-based scheduling
There is a better way of constructing a hard
real-time system: the use of multitasking under an OS scheduling tasks by priorities.
This kind of system has several tasks, each an independent thread of control and each assigned a priority. The OS ensures that at any given time the highest priority task that's ready is actually running. If a task becomes ready to run and this task
i
s of higher priority than the current task, the OS suspends the current task and starts running the new task. If a task
i
s running and suspends itself
(waiting for an event or for an interval of time, for example) then the task
i
s no longer ready to run and the OS resumes executing the next highest priority task.
Priority-based scheduling is potentially more efficient than static scheduling techniques for two reasons. First, selecting task periods based on harmonic values is unnecessary. So if the application requires 5ms, 12ms, and 7ms periods then the OS can run these tasks at those rates and avoid running tasks at higher rates. Of
course, this benefit is often overlooked because in many systems, the specification assumes that the implementation will use static scheduling and harmonic periods are specified. A second benefit is that each urgent event is handled by a high priority task that is made ready only when the event occurs. This gives a short response time to the event and uses the processor only when necessary.
Fortunately, priority-based scheduling is a popular implementation technique. In a survey by
Real-Time
Magazine
(issue 97-3), 41 of the more than 50 RTOSes listed supported fixed priority scheduling.
Analysis of priority-based systems
Although the system can be constructed and maintained more easily than a static approach, and although the processor can be used more efficiently, there is a drawback: the worst-case response times of each task and interrupt handler are not immediately obvious. If we don't know what the
worst-case behavior is, we cannot sensibly use priority-based scheduling for hard real-time systems. This is where so-called schedulability analysis comes in. Scheduling mathematics called deadline monotonic analysis (DMA) can analyze how tasks scheduled by priorities interact and determine the worst-case response time of each task and interrupt handler.
Before we can go through the basic analysis we need to make some assumptions about how the system behaves and define some basic notation.
Following are several assumptions that the analysis makes about how the application and OS behave:
- There is a fixed set of tasks with a fixed and known priority ordering
- Any task can become ready to run at any point in time, but the task cannot become ready again until some minimum time has elapsed
- Each time a task becomes ready to run it will run for only a bounded amount of processor time
- A task may not voluntarily suspend itself during its execution (so waiting for
an arbitrary event half-
- way through executing is not
- permitted)
- No task deadline can be longer than the task period. This means that the task must finish before it can be due to run again
- Each task must have a unique
- priority
- A given task
i
s never delayed by a lower priority task. As soon as a task becomes ready to run, the OS switches to running that task. During execution the task
i
s never delayed waiting for a lower priority task to
execute. This rules out disabling interrupts and sharing data via semaphores
- he OS performs scheduling and task switching in zero time
The last two assumptions are clearly unrealistic but we'll come back to these shortly and extend the analysis to remove them. The basic analysis can be extended to remove or modify most of the remaining assumptions, and some of these extensions are discussed later.
Figure 3
is an annotated time diagram defining
the notation used in the analysis.
The term T
i
is the so-called "period" of a task
i
. The period is the minimum time between a task being made ready to run and being made ready again. This model supports both sporadic and strictly periodic tasks.
Each time the task
i
is made ready it may execute for up to C
i
processor time. This is known as the worst-case execution time of task
i
. Note that this time does not include the
time for which other tasks and interrupt handlers use the processor; it is the processor time required only by task
i
.
Before we go on it is worth mentioning a useful technique for finding task execution times. This is to measure execution time for tests that exercise all feasible paths, with the OS providing a logical stopwatch for each task. When the task
i
s made ready, the stopwatch is zeroed. Whenever the task runs, the stopwatch counts. Whenever another task or interrupt
handler runs, the stopwatch is stopped. The "high-water mark" of the stopwatch after the tests gives the worst-case execution time.
The worst-case response time of a task
i
is measured from the time the task
i
s made ready to the time the task completes its worst-case execution time Ci. This worst-case response time is denoted R
i
. The deadline of a task
i
is denoted Di and a task will always meet its deadline if R
i
ý D
i
.
The basic idea of DMA is to find an equation that will calculate R
i
. R
i
is made up of two times: the time a task takes to execute its own code, and the time it takes for higher priority tasks to execute and finish with the processor. The following equation represents this relationship:
The term I
i
is the pre-emption time from higher
priority tasks and interrupt handlers. It's called the "interference." The problem now becomes finding the interference time.
Figure 4
shows a task being pre-empted by a higher priority task. It turns out that the maximum interference from a higher priority task
k
occurs when the lower priority task and task k are made ready at the same time.
Figure 4
shows how task
k
initially pre-empts the task then pre-empts again when it
is made ready for a second time. When task
k
comes back for a third try the lower priority task has already finished, and this time there is no interference.
The number of times that a given task k can pre-empt a task
i
while task
i
is ready is given by:
The symbol is the ceiling function, and is a round-up function. So, for example:
The total time taken by a higher priority task
k
when it pre-empts and executes is simply given by:
So for the total interference term, we simply add this up for all the higher priority tasks in the system and get the following:
The term hp
i
is the set of all tasks of higher priority than task
i
. The effects of interrupt handling can be included by ensuring that each interrupt source has a C and T value and including them in the set hp
i
.
Now that we have a term for the interference we can present the basic DMA response time equation:
(1)
Because R
i
appears on both the left- and right-hand sides of the equation it would appear that we can only find the worst-case response time if it is already known. Fortunately, this equation can be solved by forming a recurrence relation:
An initial value of zero for R
i
is sufficient and the sequence will converge to the
smallest value of Ri that satisfies Equation 1.
Before we eliminate some of the assumptions we made earlier, it's worth discussing how priorities are chosen in the first place. The term "deadline monotonic" refers to the priority allocation algorithm: assigning priorities monotonically with deadline. Thus the task with the shortest deadline (the smallest value of D) is assigned the highest priority. Ties are broken arbitrarily. There are proofs that this is the optimal priority assignment
algorithm under the assumption that T
<
D for all tasks.
|
Table 1 DMA example
|
|
Task
|
T
|
C
|
D
|
|
1
|
250ms
|
5ms
|
10ms
|
|
2
|
10ms
|
2ms
|
10ms
|
|
3
|
330ms
|
25ms
|
50ms
|
|
4
|
1000ms
|
29ms
|
1000ms
|
|
Table 2 Calculation of worst-case Task 3 response time
|
|
Step
|
R
n
|
I
|
R
n+1
|
|
1
|
0
|
0
|
25
|
|
2
|
25
|
5+3x2=11
|
36
|
|
3
|
36
|
5+4x2=13
|
38
|
|
4
|
38
|
5+4x2=13
|
38
|
Let's try an example. Table 1 describes a set of tasks. The table is in deadline monotonic priority order, with task
i
being the highest priority and task 4 the lowest priority. Notice how we don't care if the tasks are periodic or sporadic: we just need to know the minimum inter-arrival time.
Let's calculate
the worst-case response time of task 3. We start with an initial R estimate of 0. Table 2 shows the steps in the calculation. The equation converges at 38ms. So the worst-case response time of task 3 is 38ms. This is less than the deadline of 50ms and proves that task 3 will always meet its deadline in all situations.
Priority inversion, priority inheritance, and the priority ceiling protocol
One assumption we have
made so far is that a task will never be delayed by a lower priority one. This means that no lower priority task
i
s allowed to disable interrupts or even share a semaphore with a higher priority task. Clearly this assumption is unrealistic and we need to do something about it. The way we deal with this is to allow for so-called "blocking time." Denoted B
i
, blocking time is equal to the time for which the execution of lower priority tasks can delay a given task
i
. The equation
for the worst-case response time can be updated to take blocking time into account:
(2)
We need to work out this blocking time. Easier said than done, of course.
To see why this can be difficult it's worth taking the time to discuss a common problem known as priority inversion. Priority inversion occurs when tasks use regular semaphores to guard access to shared data, and cause
the response time of a task to become much longer than normal. This often leads to the system missing deadlines intermittently.
Figure 5
shows how the problem can occur.
The diagram is a timeline showing the execution trace of three tasks scheduled by priorities, with the highest priority task at the top and the lowest priority task at the bottom. Two tasks, "L" with a low priority and "H" with a high priority, share a data buffer and need to ensure exclusive access
to the data. This is done via a semaphore ("S1"). In
Figure 5
, L is running and wants to get access to the data. It locks S1 and starts using the data. While accessing the data, a medium priority task ("M") becomes ready. The scheduler suspends L and starts running M. Shortly afterwards, H is made ready, and the scheduler suspends M and starts running H. Then H wants to access the data and tries to lock S1. Unfortunately, S1 is already locked, so H is blocked awaiting the
semaphore. The next highest priority task
i
s M, so the scheduler resumes executing M, which runs for some time and eventually finishes, and the scheduler resumes executing L. Then L finishes with the shared data and unlocks S1. Task H is waiting for S1 and so is made ready to run. The scheduler resumes executing H, which then accesses the data, unlocks S1, and finishes.
You can see from
Figure 5
how the response time of H is quite longýlonger than M in any
caseýwhich isn't the right behavior because H should have a shorter response time. H is assigned a high priority because it's urgent and shouldn't end up being delayed for a longer time than M. The problem is called "priority inversion" because task M is delaying task H and, in effect, executing at a higher priority.
One of the problems with priority inversion is that it can be "hidden"-there can be implicit resource contention that's resolved using regular semaphore locks. For example, it is common
for an RTOS to provide a threaded C library where calls are protected against concurrent access. For example, the
malloc()
and
free()
calls typically lock an internal semaphore to guard access to their heap management data. When two tasks both make
malloc()
and
free()
calls, they are implicitly sharing data and making semaphore locks. A low priority task might be making a
malloc()
call when a high priority task also makes a
malloc()
call. The high priority task
i
s blocked and medium priority tasks get to run. Priority inversion again! Worse yet, the programmer might not even make
malloc()
and
free()
calls: they might be invoked automatically by a C++ compiler's code for constructors and destructors.
Priority inversion also causes a lot of trouble because it is intermittent: in testing, the scenario shown in
Figure 5
might not occur-response time of H will be short. Yet after deployment the tasks might
become phased as shown in
Figure 5
, increasing the response time of H, and causing the system to fail. This is exactly what happened to the "Pathfinder" probe to Mars in July 1997.
The spacecraft was controlled by a single processor on a VME bus that also contained interface cards for the radio, a camera, and an interface to an on-board MIL-STD-1553 bus. The bus connected two parts of the spacecraft together, with the VME system containing interfaces to, among other
things, the ASI/MET meteorological science instrument.
In the first few hours of operation on Mars, the spacecraft began experiencing resets. A reset re-initialized all hardware and software and terminated the ground command activities and the remaining activities were postponed until the next day.
The resets were due to priority inversion. The JPL engineers used a conventional RTOS to implement the bus control functions in the main processor. A task with a 125ms period, but a short
deadline-and so a high priority-was driving the on-board bus and handing data from the instruments to a low priority task. The two tasks communicated via a "pipe" construct. The internal OS implementation of the pipe used a semaphore to guard internal OS data.
On Earth extensive testing was done but the problem only manifested itself when the ASI/MET data was being collected at the same time as a high load due to medium priority tasks. Although one reset was seen in testing, the problem was not
repeatable and could not be isolated. Because the problem was rare the system was deemed sufficiently reliable (the space business is inherently risky and there's no point engineering a limited-life space vehicle to a much higher reliability than the launch technology).
On Mars the task execution patterns were different and priority inversion occurred frequently. Medium priority tasks stopped the low priority instrument task from running, which stopped the high priority bus task from running. When the bus
task missed its deadline the system assumed a fault and resets itself. However, since the fault was a design fault-using conventional semaphores in a real-time system-restarting the system just triggered the fault again.
Space systems are usually carefully designed to let Earth-based control gain access and upload new software, so the JPL engineers were able to patch over the problem by uploading new software that instructed the OS to use a different algorithm to lock the internal semaphores used
in the pipe functions. The new algorithm was priority inheritance.
Priority inheritance is a simple concept: a lower priority task L delaying a higher priority task H inherits the priority of H while it's causing a delay. Once L unlocks the semaphore, the priority of L falls back.
Figure 6
shows this.
Task L locks S1 as before and is pre-empted by M and then H. When H tries to lock the semaphore, the priority of L is raised to high and so L resumes
execution. When L releases the semaphore, its priority is restored to low. Now that the semaphore is free, task H resumes execution and finishes. Then M is resumed, and finally L completes. The high priority task has a shorter response time and in the case of the Pathfinder probe, this was sufficient to obtain correct operation. The rest of the Pathfinder mission was a success (the mission returned 2.6 billion bits of information, including more than 16,000 images).
Unfortunately, priority inheritance
doesn't solve the problem of unpredictable response times. We wanted to be able to work out Bi, but priority inheritance doesn't help. For example, it's possible for a system to suffer from "circular blocking," where task A is blocked by task B, which is blocked by task A. This is more familiarly known as "deadlock" and effectively gives infinite blocking times.
Figure 7
shows how two tasks can deadlock with priority inheritance.
In Figure 7, tasks L and H share two
data buffers, guarded by semaphores S1 and S2. Task L locks semaphore S1, but before it manages to lock S2, a switch to a higher priority task occurs. Task H locks S2, then tries to lock S1. Through the rules of priority inheritance, task L inherits a high priority and resumes execution. L tries to lock S2, but unfortunately S2 is already locked. Neither H nor L can run, so task M resumes execution. H and L never resume execution; they are deadlocked.
For the Mars Pathfinder, priority inheritance
was sufficient in its specific use of semaphores in the OS. But as a general algorithm, priority inheritance is not effective. In general it's not possible to show a system is free of deadlock and it's impossible to determine B
1
for each task
i
in the system.
Fortunately, there is a solution to this problem: the Priority Ceiling Protocol. This protocol provides a key property, in which a high priority task
i
s blocked once at most by all lower priority tasks. This
allows the blocking times to be calculated, and hence for DMA to be used to verify timing performance. Although the original algorithm as specified in 1987 is more complex than priority inheritance, there is a low-cost variant called Instant Inheritance that has identical worst-case behavior (and so the same blocking times apply) but can be implemented very efficiently.
The Priority Ceiling Protocol introduces the notion of "ceilings." Each semaphore has a ceiling priority: this is the priority of
the highest priority task that can lock the semaphore. A task can hold several semaphores at once, but only if they are locked in a nested pattern (for example, lock S1 .. lock S2 .. unlock S2 .. unlock S1). The Instant Inheritance algorithm is very simple: when a semaphore is locked, the locking task raises its priority to the ceiling priority of the semaphore. When the semaphore is unlocked the task's priority is restored.
Figure 8
shows this.
The scenario in
Figure 8
is the same as in Figure 3. Task L locks semaphore S1. Because both L and H share S1, the ceiling priority of S1 is high. When the lock call takes place the priority of L is raised to high. When M is made ready to run, no switch occurs because task L is running with a higher priority. Similarly, no switch occurs when H is made ready to run. Then when L unlocks S1 its priority is restored to low, and a switch to H occurs. After H has finished, task M executes. Finally,
task L is resumed and completes. You can see that the response time of H is short, which is just what's wanted.
Figure 8
shows why the Immediate Inheritance variant is so easy to implement. The "lock" call always succeeds (see how when L and H lock S1, neither is blocked). The reason the call always succeeds is that no other task can have the semaphore locked, because such a task would have the ceiling priority of the semaphore and would be running already, and the
current task would not be running. So the lock call always succeeds. This means that implementing semaphores merely requires a "change the current priority" operation. There is no need to queue the task
i
n a per-semaphore queue or even test to see if the semaphore is locked.
Most importantly, the blocking times of tasks can be worked out. A given task
i
is blocked at most once by a lower priority task. In fact, the blocking time of a task
i
is the longest time any lower priority
task k holds a semaphore with ceiling priority greater than or equal to the priority of task
i
. Because the blocking times are bounded there is no chance of deadlock.
Figure 9
shows this. The scenario is the same as in
Figure 7
but this time deadlock is avoided.
In
Figure 9
the ceiling priority of semaphores S1 and S2 is high because both L and H lock these semaphores. As soon as L locks S1, the
priority of L becomes high. This prevents both H and M from running. Locking S2 has no effect because L is already running at high priority. When L unlocks S1, its priority falls back to low, and task H gets to run, lock both semaphores, and finish.
We can more formally define the blocking time for a given task
i
by defining the following terms:
- Let lp
i
be the set of tasks with priorities lower than task
i
- Let locks
k
,
i
be
the set of semaphores locked by a task
k
, where the semaphores have a ceiling priority higher than the priority of task
i
- Let t
k,s
be the time for which a task
k
holds a semaphore s
- The blocking time B
i
is defined as follows:
(3)
Figure 10
shows how this works for
task M in the simple system shown earlier in
Figure 5
.
The observant reader might notice that the Immediate Inheritance variation of the Priority Ceiling Protocol is much like the approach of raising the processor interrupt priority level to hold out interrupt handlers of a certain priority so that data shared with those handlers isn't corrupted. In effect the protocol has re-invented, albeit in a more systematic way, what was first done in the 1960s. Because there
is a contiguous range of priorities, from software tasks to hardware interrupt priorities, the disabling of interrupts using the mechanism described can be modeled by DMA. Disabling interrupts to a particular level is just like locking a semaphore with a ceiling equal to that level.
OS overheads: scheduling and switching
The second major assumption we made earlier in DMA was that the OS performed scheduling and
task switching in zero time. Clearly this is not feasible, and we must do something about it.
We first look at bounding the costs of the scheduler. The scheduler is the part of the OS that decides when to apply delays and timeouts, so that tasks are ready when due. A common implementation is a so-called "tick scheduler." A periodic timer interrupt calls the OS scheduler that processes a time-ordered delay queue. The scheduler takes due tasks from the time queue and inserts them into a
priority-ordered "ready" queue. At the end of the tick the OS calls the dispatcher if a task switch is necessary. The scheduling approach requires that tasks handled by the tick scheduler have periods that are integer multiples of the tick period and at least as large. So if the tick period is 10ms then a task could have a period of 10ms, 20ms, 30ms and so on, but not 25ms. Similarly, none of these tasks can have a period shorter than 10ms.
The overheads of the scheduler would normally be modeled by including
the timer interrupt as a task (as we would for all other interrupt sources). The worst-case execution time of the interrupt handler is the interrupt overheads plus the longest execution time of the scheduler. This latter time may not be not bounded easily because typically, the queue processing depends on how many tasks are in the system: the more tasks, the longer the scheduler takes to run.
The following equation is a bound on the scheduler overheads in an interval of duration t:
where T
tick
is the tick period and C
tick
is the worst-case execution time of the tick scheduler.
Although this is sufficient, the overheads bound turns out to be quite pessimistic. This is because it assumes that for each tick all tasks are in the delay queue and that the whole queue is processed in the same tick. Although this can happen, it's rare. We can get more realistic
bounds by realizing that the tick scheduler processes any given task
i
at most once every T
i
. In effect, there is a pseudo-task for each task
i
, with a worst-case execution time equal to the per-task cost of the tick scheduler and a period equal to T
i
.
The following equation is a more accurate bound on the scheduler overheads in an interval of duration t:
where C
base
is the execution time of the tick scheduler when the tick processes zero tasks, C
task
i
s the additional per-task cost of the tick, and ticktasks is the set of all tasks that are scheduled by the tick scheduler. Note that C
task
i
s often not a simple constant but rather a function of how many tasks are in the system.
From this we can derive an equation that bounds the time a given task
i
is delayed by the
execution of the scheduler:
(4)
We can now update Equation 2 to include the scheduling costs:
(5)
Now we can look at the task switching costs. The OS dispatcher performs task switching and is called whenever a task switch is necessary. For example, the scheduler may call
the dispatcher if it makes a new task ready that is of higher priority than the current task. OS calls themselves may also call the dispatcher. For example, a semaphore "unlock" call may necessitate a task switch, triggering a call to the dispatcher. Whenever the dispatcher is called, it does a simple job: it saves the context of the current task (that is, processor registers), restores the context for the new task, and resumes the execution of this new task. Later, when the new task finishes, the
dispatcher is again called and saves the context, places the task in a non-ready state, chooses which task to run next, and then restores the context of the next task.
The "switch-in" cost to start a new task usually has a simple bound, but the "switch-out" cost is often more complex, because placing the task in a non-ready state can entail searching a time-ordered queue of timeouts to insert the task in the right place. So the "switch-out" cost is usually dependent on the number of tasks in the system.
The switching costs of the OS can be modeled by adding the "switch-in" and "switch-out" costs to the worst-case execution time of each task. Figure 11 shows this. Note that Cout is often not a simple constant and is typically a function of how many tasks are in the system.
The total switching costs for a single switch are, of course:
(6)
Using this definition, we can
update Equation 5:
(7)
Where:
- C
i
is the worst-case execution time of a given task
i
- T
k
is the minimum time between a given task
k
being made ready and being made ready again
- C
sw
is costs of switching to and back from a pre-empting task and is defined by Equation 6
- B
i
is the blocking time of a given task
i
and defined by Equation 3
- S
i
is the scheduler overheads for a given task i and is defined by Equation 4
- hp
i
is the set of tasks of higher priority than a given task
i
Discussion
With Equation 7 and the right OS and application behavior, we calculate the worst-case response time for every
task in a system. This is useful because it means we can verify the timing behavior of complex real-time systems before deployment. But the analysis can be applied to more than just processor scheduling.
Controller area network (CAN) is an embedded networking bus that arbitrates between messages on the bus by using priorities. Each message is tagged with a unique priority which serves to identify the message. Because the bus is scheduled by priorities the basic DMA is a natural fit and has been
developed to calculate worst-case latencies for messages sent on a CAN bus. The Volvo Car Corp. used this analysis as the foundation of the networking system for the Volvo S80 sedan. The analysis was used as part of an optimizing toolchain to configure the network so that the lowest bus speeds could be used (minimizing wiring costs), yet with all timing requirements met. This vehicle has two CAN buses, and the complete timing behavior of the network was verified using the analysis before the vehicle went into
series production.
DMA has also been applied to the problem of scheduling a hard disk drive to fetch and store data for concurrent multimedia streams. Each data stream requires a bounded number of disk blocks every so often. This is analogous to each task requiring a bounded computation time every period. The time taken to move the disk head between the disk areas for different data streams is analogous to task switch times. DMA for disk drive scheduling allows a playback video server to apply the
analysis each time an "open" request is made and to accept the request only when the analysis can guarantee that the data will be fetched in time.
There have been a number of extensions to the basic analysis that remove some of the assumptions listed earlier or else improve the flexibility of the analysis. Examples of these extensions are:
- D > T. Allowing the deadline (and hence response time) to be longer than the period turns out to be particularly useful for analyzing wide-area
networks carrying multimedia data. For example, a source for a 10Hz video stream may send frame data every 100ms, yet the worst-case latency through the network for a single video frame might be two seconds. The analysis has been extended to take into account how previous invocations of a task can impact later ones
- Shared priorities. It's quite common for an implementation to limit the total number of priority levels for efficiency reasons. The analysis has been extended to take account of the behavior of
the system when two tasks can have the same priority
- Task phasing. The basic analysis assumes that there is no fixed phase relationship between tasks. But if one task is always made ready some fixed time after another then the basic analysis becomes pessimistic. The analysis has been extended to take account of phase relationships between given sets of tasks
Of course, underlying all this analysis is the assumption that the system is predictable. As we have seen already, the
OS must provide the right kind of semaphore locking to the user (the Priority Ceiling Protocol) and also provide a characterization of the scheduling and switching overheads of the OS. The OS must also be carefully designed to avoid internal priority inversion.
Ken Tindell is a professor of embedded systems at Jonkoping University in Sweden and the chief technology officer of Realogy, a company that provides an RTOS and analysis tools based on extended DMA. He received his PhD in real-time
systems from the University of York in England. Since 1995, he has worked in industry, applying real-time techniques to communication and computing. He can be reached at
ktindell@realogy.com
.
Further reading
You can find out more about hard real-time systems and tools that support DMA at
www.realogy.com
.