Real-time systems design and rate monotonic analysis (RMA) go together like peanut butter and jelly. So why is it that wherever I go in the embedded community, engineers are developing real-time systems without applying RMA? This is a dangerous situation, but one that is easily remedied by ensuring every programmer knows three things about RMA.
In case you are entirely unfamiliar with RMA, here’s a link to a handy primer on the technique. I’ve tried to write this column in a way that you can read those details before or after, at your option. I’ve also included a Glossary of Real Time Terminology at the end of this article.
#1: RMA is Not Just for Academics
You have probably heard of RMA. Maybe you can even expand the acronym (consult the glossary at the end of this column if you can’t). Maybe you also know that the theoretical underpinnings of RMA were developed largely at Carnegie Mellon University’s Software Engineering Institute and/or that the technique has been known for about three decades.
If, however, you are like the vast majority of the thousands of firmware engineers I have communicated with on the subject during my years as a writer/editor, consultant, and trainer, you probably think RMA is just for academics. I also thought that way years ago—but here’s the straight dope:
1) Schedulers that tweak task priorities dynamically, as desktop flavors of Windows and Linux do, may miss deadlines indiscriminately during transient overload periods. They should thus not be used in the design of safety-critical real-time systems.
2) RMA is the optimal method of assigning fixed priorities to RTOS tasks. That is to say that if a set of tasks cannot be scheduled using RMA, it can’t be scheduled using any fixed-priority algorithm.
3) RMA provides convenient rules of thumb regarding the percentage of CPU you can safely use while still meeting all deadlines. If you don’t use RMA to assign priorities to your tasks, there is no rule of thumb that will ensure all of their deadlines will be met.
For example, it is widely rumored that a system less than 50% loaded will always meet its deadlines. Unfortunately, no such rule of thumb is correct. By contrast, when you do use RMA there is a simple rule of thumb: a schedulable bound ranging from a high of 82.8% for 2 tasks down toward a limit of about 69.3% (ln 2) as the number of tasks, N, grows.
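That rule of thumb is the Liu and Layland schedulable bound, U(n) = n(2^(1/n) − 1). A quick sketch of the arithmetic (nothing here beyond the formula itself):

```python
import math

# Liu & Layland schedulable bound for n tasks prioritized by the rate
# monotonic algorithm: U(n) = n * (2**(1/n) - 1). If the total CPU
# utilization of the n tasks is at or below U(n), every deadline is met.
def rma_bound(n: int) -> float:
    return n * (2 ** (1.0 / n) - 1)

print(f"{rma_bound(2):.1%}")    # 2 tasks: 82.8%
print(f"{rma_bound(10):.1%}")   # 10 tasks: 71.8%
print(f"{math.log(2):.1%}")     # limit as n grows: 69.3%
```

Note that the bound is sufficient, not necessary: a task set above the bound may still be schedulable, but then the easy proof no longer applies.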
A key feature of RMA is the ability to prove a priori that a given set of tasks will always meet its deadlines—even during periods of transient overload. Dynamic-priority operating systems cannot make this guarantee. Nor can fixed-priority RTOSes running tasks prioritized in other ways.
Too many of today's real-time systems built with an RTOS are working by luck. Excess processing power may be masking design and analysis sins or the worst-case just hasn’t happened—yet.
Bottom line: You’re playing with fire if you don’t use RMA to assign priorities to critical tasks; it might be just a matter of time before your product’s users get burned.
(Perhaps your failure to use RMA to prioritize tasks and prove they’ll meet deadlines explains one or more of those “glitches” your customers have been complaining about?)
#2: RMA Need Not Be Applied to Every Task
As any programmer who’s already put RMA into practice will tell you, the hardest part of the analysis phase is establishing an upper bound on the worst-case execution time of each task. The CPU utilization of each task is computed as the ratio of its worst-case execution time to its worst-case period. (Establishing the worst-case period of a task is both easier and more stable.)
There are three ways to place an upper bound on execution time: (1) by measuring the actual execution time during the tested worst-case scenario; (2) by performing a top-down analysis of the code in combination with a cycle-counter; or (3) by making an educated guess based on big-O notation.
I call these alternatives measuring, analyzing, and budgeting, respectively, and note that the decision of which to use involves tradeoffs of precision vs. level of effort.
Measurement can be extremely precise, but requires the ability to instrument and test the actual working code, which must be remeasured after every code change. Budgeting is easiest and can be performed even at the beginning of the project, but it is necessarily imprecise (in the conservative direction of requiring more CPU bandwidth than is actually needed). Analysis falls between those two extremes in both precision and effort.
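Whichever method you use to bound execution times, the results feed the same arithmetic. A sketch with made-up worst-case numbers (the task names, periods, and execution times here are purely illustrative):

```python
# Per-task CPU utilization is worst-case execution time / worst-case period.
# All times are in milliseconds; these figures are invented for illustration.
tasks = {
    "motor_ctrl":  {"wcet": 1.0, "period": 10.0},
    "sensor_poll": {"wcet": 2.5, "period": 25.0},
    "comms":       {"wcet": 5.0, "period": 100.0},
}

for name, t in tasks.items():
    print(f"{name}: {t['wcet'] / t['period']:.1%}")

total = sum(t["wcet"] / t["period"] for t in tasks.values())
print(f"total: {total:.1%}")  # 10% + 10% + 5% = 25.0%
```

A budgeted figure simply substitutes a conservative guess for the measured worst-case execution time; the formula doesn’t change.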
But there is at least some good news about the analysis. RMA need not be performed across the entire set of tasks in the system. It is possible to define a smaller (often much smaller, in my experience) critical set of tasks on which RMA needs to be performed, with the remaining non-critical tasks simply assigned lower priorities.
This critical set should contain every task with a deadline that must never be missed. In addition, it should contain any other task with which those tasks share a mutex, or from which they require timely semaphore or message queue posts. Every other task is considered non-critical.
RMA can be meaningfully applied to the critical set tasks only, so long as we ensure that all of the non-critical tasks have priorities below the entire critical set. We then need only determine worst-case periods and worst-case execution times for the critical set. Furthermore, we need only follow the rate monotonic algorithm for assignment of priorities within the critical set.
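The procedure above can be sketched end to end: assign rate monotonic priorities within the critical set, push every non-critical task below it, and test the critical set’s total utilization against the schedulable bound. (All task names and timing numbers here are invented for illustration.)

```python
# Hypothetical critical set: (name, worst-case period ms, worst-case exec ms).
critical = [
    ("ctrl_loop", 5.0, 1.0),
    ("adc_read", 2.0, 0.4),
    ("display", 50.0, 5.0),
]
# Non-critical tasks need no timing analysis; they just rank lower.
noncritical = ["logger", "ui_anim"]

# Rate monotonic: shorter worst-case period => higher priority.
# Here a smaller priority number means more important.
critical.sort(key=lambda task: task[1])
priorities = {name: p for p, (name, _, _) in enumerate(critical, start=1)}
for p, name in enumerate(noncritical, start=len(critical) + 1):
    priorities[name] = p  # any order below the critical set is fine

utilization = sum(exec_ms / period for _, period, exec_ms in critical)
bound = len(critical) * (2 ** (1 / len(critical)) - 1)
print(priorities)
print(f"utilization {utilization:.1%} vs bound {bound:.1%}")  # 50.0% vs 78.0%
```

Because only the three critical tasks enter the utilization sum, the logger and UI animation never need worst-case timing numbers at all.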
Bottom line: Anything goes at lower priorities where there are no deadlines.
#3: RMA Applies to Interrupt Service Routines Too
With few exceptions, books, articles, and papers that mention RMA describe it as a technique for prioritizing the tasks on a preemptive fixed-priority operating system. But the technique is also essential for correctly prioritizing interrupt handlers.
Indeed, even if you have designed a real-time system that consists only of interrupt service routines (plus a do-nothing background loop in main), you should use the rate monotonic algorithm to prioritize them with respect to their worst-case frequency of occurrence. Then you can use rate monotonic analysis to prove that they will all meet their real-time deadlines even during transient overload.
Figure 1. Where RMA Fits in the Real-Time Hierarchy
Furthermore, if you have a set of critical tasks in addition to interrupt service routines, as shown in Figure 1 above, the prioritization and analysis associated with RMA need to be performed across the entire set of those entities.
(Note this is necessary even if one or more of the interrupts doesn’t have a real-time deadline of its own. That’s because the interrupts may occur during the transient overload and thus prevent one or more critical-set tasks from meeting their real-time deadlines.)
This can be complicated, as there is an arbitrary “priority boundary” imposed by the CPU hardware: even the lowest priority ISR is deemed more important than the highest priority task.
For example, consider the conflict in the set of ISRs and tasks in Table 1 below. RMA dictates that the priority of Task A should be higher than the priority of the ISR, because Task A can occur more frequently.
But the hardware demands otherwise, by limiting our ability to move ISRs down in priority. If we leave things as they are, we cannot simply sum the CPU utilization of this set of entities to see if they are below the schedulable bound for four entities.
Table 1. A Misprioritized Interrupt Handler
So what should we do in a conflicted scenario like this? There are two options. Either we change the program’s structure, moving the ISR code into a polling loop that operates as a 10 ms task at priority 2, in which case total utilization is 51%. Or we treat the ISR, for purposes of proof via rate monotonic analysis anyway, as though it actually has a worst-case period of 3 ms. In the latter option, the ISR has an appropriately top priority by RMA, but the CPU bandwidth attributed to the ISR increases from 5% to 16.7%, bringing the new total up to 62.7%.
Either way, the full set is provably schedulable. (However, switching the code to use polling actually consumes cycles that are merely reserved for the worst case in the interrupt-driven solution. That could mean failing to find CPU time for low-priority non-critical tasks in the average case.)
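To make the arithmetic behind those two totals concrete, here is one set of numbers consistent with the percentages quoted above. (Table 1’s raw figures aren’t reproduced in the text, so the 0.5 ms execution time and 46% remainder are reverse-engineered assumptions.)

```python
# Assumed: the ISR runs for 0.5 ms per invocation, and the rest of the
# critical set uses a combined 46% of the CPU.
isr_exec_ms = 0.5
others_util = 0.46

# Option 1: restructure as a polling task with a 10 ms period.
polling_total = others_util + isr_exec_ms / 10.0
print(f"polling: {polling_total:.1%}")             # 51.0%

# Option 2: keep the ISR, but analyze it at a 3 ms worst-case period.
interrupt_total = others_util + isr_exec_ms / 3.0
print(f"interrupt-driven: {interrupt_total:.1%}")  # 62.7%
```

The interrupt-driven option charges the analysis for bandwidth the ISR only uses in the worst case; the polling option actually spends those cycles every period.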
Bottom line: Interrupt handlers must be considered part of the critical set, with RMA used to prioritize them in relation to the tasks they might steal the CPU away from.
Every programmer should know three key things about RMA. First, RMA is a technique that should be used to analyze any preemptive system with deadlines; it is not just for academics after all. Second, the amount of effort involved in RMA analysis can be reduced by ignoring tasks outside the critical set; non-critical tasks can be assigned an arbitrary pattern of lower priorities and need not be analyzed. Third, if interrupts can preempt critical-set tasks or even just each other, RMA should be used to analyze those too. Figure 1 emphasizes all three points, by putting the role of RMA in perspective.
Glossary: Real-Time Terminology
deadline – n. The time by which a particular job must give its result.
job – n. The set of software, memory, and other operations comprising a particular decision, computation, data transfer, or combination thereof.
mutex – n. A multitasking-aware binary flag provided by an RTOS to safely protect shared resources. Short for MUTual EXclusion. The term mutex should only be used when the operating system includes a defense against unbounded priority inversion in the implementation.
period – n. The minimum length of time between the consecutive runs of a task or invocations of an ISR. May be thought of as the minimum interarrival time of a job.
preemptive scheduling – n. A style of multitasking that supports interruption of the running task so that another task that is ready to run can use the processor immediately. At all times, the highest priority task that is ready to run is guaranteed to be running.
priority – n. The relative rank of one task or interrupt compared to another. In the case of tasks, the priority is an integer assigned by the programmer, which a preemptive scheduler compares to the priorities of all tasks that are ready to run to select one to use the processor. Note that every interrupt has a higher priority than even the highest-priority task—a design “feature” of processors.
priority inversion – n. A failure of a real-time system in which a high priority task is prevented from using the CPU by a medium priority task. The key first step is for the high priority task to block waiting for a low priority task, with which it shares some global resource via a mutex. RMA only provides proofs when priority inversion is bounded, typically by a priority inheritance or priority ceiling protocol built into the mutex system calls.
real-time system – n. A computer system with deadlines. If a missed deadline is as bad as a wrong result, the system is a real-time system. If the consequence for a missed deadline is death or mission failure, the system is further classified as “hard real-time.” Everything in between that and a system with no deadlines is “soft real-time.”
result – n. The outcome of a job.
RMA – abbr. Short for either Rate Monotonic Algorithm or Rate Monotonic Analysis, each of which is explained in the body of this article.
RTOS – n. An operating system with priority-based preemptive scheduling. Short for Real-Time Operating System.
schedulable bound – n. The upper limit on total CPU utilization used to establish the easiest proof that a set of tasks prioritized by RMA will always meet its deadlines. The schedulable bound is based on the number of tasks in your analysis and ranges from 82.8% for 2 tasks down toward a limit of about 69.3% (ln 2) as the number of tasks grows. RTOS overhead such as context switch time and system calls is accounted for in the worst-case utilization analysis of the individual tasks.
shared resource – n. A singleton data structure or peripheral that is accessed by two or more tasks or a task and an ISR. Each shared resource should be protected by a mutex.
task – n. A function containing an infinite loop that begins by waiting for a signal from the RTOS, another task, or an ISR, then runs a job before waiting again.
transient overload – n. A period of time during which everything goes wrong at once—causing every task and ISR to need to run simultaneously. During this time the CPU may become insufficient to meet all of the demands placed upon it.
(Michael Barr is the author of three books and over sixty articles about embedded systems design, as well as a former editor-in-chief of this magazine. Michael is also a popular speaker at the Embedded Systems Conference, a former adjunct professor at the University of Maryland, and the president of Netrino. He has assisted in the design and implementation of products ranging from safety-critical medical devices to satellite TV receivers. You can reach him via e-mail at firstname.lastname@example.org or read more of what he has to say at his blog ).