By William E. Lamie
As discussed in
Part 1 and
Part 2 in the series, the major
implication of the extra context switches introduced by the use of
unique priorities is the additional RTOS overhead those context
switches cause. In order to measure the extra overhead introduced by
the additional context switches, we measure the clock count differences
between the cycle-boundary relinquish events (
Figure 10, below).
 |
| Figure
10 - Cycle Start and Cycle End event time stamps show elapsed time for
a complete cycle. Case-1 (Equal Priorities) is shown on the left, while
Case-2 (Unique Priorities) is shown on the right. |
(
"Click
here to view expanded image.)
Each event has a unique time-stamp taken from the system clock.
Subtracting one "RO" time stamp from the next yields the elapsed time
for the cycle. As a result of the additional context switches and
preemptions caused by the use of unique priorities for each thread in
Case-2, the application suffers increased overhead.
In this timed experiment, we use an RTOS with a 0.35?s context
switch, but in practice, the additional context switch operations can
add from 50 to 500 CPU cycles per context switch, depending on the RTOS
used. Figure 11 below shows
the results that were observed:
 |
| Figure
11 - Timing measurements show a significant increase in overhead with
unique priorities |
This experiment shows that fourteen additional context switches were
performed in Case-2, and total processing time increased by more than
80%, while the exact same number of messages were sent and received.
If context switch operations were not as fast as the 0.35?s in this
example, the impact on total processing time would be even greater. The
results here show more than an 80% increase in RTOS overhead as a
result of using unique priorities for each application thread.
The use of unique priorities might also make system performance
unpredictable. Loss of predictability occurs because the context switch
overhead varies as a result of the sequence of thread activation,
rather than in a prescribed round-robin fashion, as in Case-1.
The developer can eliminate this aspect of unpredictability by
assigning multiple threads the same priority since there will always be
a consistent number of context switches to perform a given amount of
work. If distinct priorities are assigned to each thread, then the
application developer needs to be acutely aware of the potential for
variance in system performance and responsiveness.
When to Use Unique Priorities
While use of unique priorities might result in more context switches
than running multiple threads at the same priority, in some instances
it is the appropriate thing to do. For example, if latency is more
important than throughput, in the previous example, we would want
Thread A to run as soon as a message arrives in its queue, rather than
waiting for its round-robin turn.
To make sure that happened, we'd make Thread A higher in priority
than Thread D. Likewise with Threads B and C. We would achieve lower
latency, but at the expense of cutting our throughput from 8.7 messages
per 1000 tics, to 4.8 messages per 1000 tics, as can be seen from the
following table in Figure 12 below:
 |
| Figure
12 - Table showing tradeoff between Message Processing Latency and
Message Throughput. Unique priority assignment reduces latency but also
reduces throughput. |
Priority Inheritance and Time-Slicing
Developers can best deal with the somewhat uncertain context switch
overhead caused by thread priority selection by keeping as many threads
as possible at the same priority level. In other words, only use
different priority levels when latency outweighs throughput and
preemption is absolutely required " never in any other case.
Furthermore, running multiple threads at the same priority makes it
possible to properly address other system requirements such as priority
inheritance, round-robin scheduling, and time-slicing. Each of these
mechanisms is important in a real-time system, and each can be used to
keep system overhead low and—perhaps more importantly—to keep system
behavior understandable.
Priority Inheritance is a mechanism employed by an RTOS
to prevent deadlock in a case of priority inversion. Priority inversion
occurs when a low priority thread owns a resource needed by a higher
priority thread, but the lower priority thread gets preempted by a
thread with a priority between the two.
Thus, the low priority thread cannot run, since it has been
preempted by a higher priority thread, and the resource it holds cannot
be released. The high priority thread ends up waiting for the low
priority thread to release the resource, effectively deadlocking the
high priority thread.
Priority inheritance allows the low-priority thread to temporarily
assume the priority of the high-priority thread waiting for the
resource. This allows the low-priority thread to run to the point where
it can release the resource, and then the high-priority thread can get
it and run. If all threads had to have a unique priority, the
low-priority thread could not assume the priority of the waiting
thread, since that priority is already in use.
Likewise, with a limit on the number of threads at a given priority,
it will be impossible for the low priority thread to be raised to the
level of the high priority thread if there are already the maximum
number of threads at that priority. If the limit of threads at a
priority is 1, then priority inheritance is never possible, and some
other solution to priority inversion must be found.
Round-Robin Scheduling is a method used by an RTOS to run
threads in a circular fashion, letting each thread run until it becomes
"blocked," or relinquishes its turn. In other words, none of the
threads is preempted by a higher-priority thread.
Round-robin scheduling allows multiple equally important activities
to run at the same priority, while still retaining individual
encapsulation. This approach is used in our Case-1 above, where all
four threads have a priority of 4, and run in sequence without
preemption from higher-priority threads.
Time Slicing is an RTOS scheduling method that distributes
CPU cycles to multiple threads in a weighted manner. Most commonly, it
is used to give threads at the same priority level a certain number of
CPU cycles, rotating from thread to thread and then back to the
beginning of the group continuously as long as those threads remain
active. It also can be used to allocate percentages of CPU time to each
thread.
For example, giving Thread A 25%, Thread B 10%, Thread C 10%, and
Thread D 55% of the CPU cycles while that priority is active. This is
generally achieved by dividing an arbitrary number of CPU cycles (a
"block") into proportional parts.
In this example, a block of 1000 cycles might be used (e.g., 5?s on
a 200MHz system) where Thread A executes for 250 cycles, Thread B for
100 cycles, Thread C for 100 cycles, and Thread D for 550 cycles, then
returning to Thread A for another 250 cycles and so on. This allocation
enables system designers to provide more time for threads they
determine to require more operations, but not to preempt the other
threads at the same priority.
Assigning multiple threads the same priority can have many
beneficial effects and can help system designers avoid traps that
threaten the proper operation of their real-time system. In particular,
it can reduce overhead, increase throughput, and enable priority
inheritance and time-slicing scheduling methodologies. The developer is
encouraged to use as few distinct priorities as possible and to reserve
unique priorities for those instances where true preemption is
required.
To read Part 1, go to: "The basics
of context switching."
To read Part 2, go to: "Operational
Flow " Two Examples"
William E. Lamie is co-founder and
CEO of Express
Logic, Inc.,
and is the author of the ThreadX RTOS. Prior to founding Express Logic,
Mr. Lamie was the author of the Nucleus RTOS and co-founder of
Accelerated Technology, Inc. Mr. Lamie has over 20 years experience in
embedded systems development, over 15 of which are in the development
of real-time operating systems for embedded applications.