Dr Andrew Coombes describes the challenge of using the Realogy Real-Time Architect real-time kernel and development tools to provide
a fully OSEK compatible environment.
OSEK/VDX is a set of standards for operating systems (OS) and
related services. The drive to improve system interoperability and
reduce development times has led to increasing use of software
products based on these standards in automotive electronics
applications.
The Infineon C16x family of 16-bit microcontrollers is a popular
choice in the automotive industry. This makes it an obvious candidate
for OSEK development support. However, in high-volume applications
such as automotive, any increase in processor loading or memory usage
due to additional OS overhead will impact on target system cost, and
is likely to prove unwelcome.
The challenge for LiveDevices' engineers therefore, in the design
of their Realogy Real-Time Architect real-time kernel and development
tools, was to provide a fully OSEK compatible environment, without a
costly increase in the resources required by the target system. In
particular, it was important to provide a solution that did not
require the use of off-chip memory.
The OSEK/VDX standard
The development of the OSEK/VDX standard was primarily driven by
the high recurring costs in the automotive industry of developing and
re-developing ECU (Electronic Control Unit) software, particularly
the costs of non application-specific parts of the software. A set of
commercial off-the-shelf products to do much of this non
application-specific work would allow ECU manufacturers to out-source
non-core work and so obtain overall cost reductions.
Another motivation was the incompatibility of inter-ECU
communications: different manufacturers of ECUs use different
protocols for passing information around an in-vehicle bus. The
vehicle manufacturer needs the ability to integrate a number of ECUs
from various suppliers into a single network. It is also important
that the manufacturer has as wide a choice of supplier as
possible.
The goals of OSEK/VDX are therefore to support portability and
re-usability of software components across a number of projects. This
will allow vendors to specialise in 'automotive IP', where a vendor
can develop a purely-software solution and run this software in any
OSEK/VDX-compliant ECU. Reaching this goal, however, requires detailed
specifications of the interfaces to each non application-specific
component, and so OSEK/VDX standards include an Application
Programming Interface (API) that abstracts away from the specific
hardware of the underlying target platform and the configuration of
the in-vehicle networks.
Building an OS for the C16x
In this article, we will concentrate mainly on the major factors
that we considered in the 'minimum-footprint' implementation of a
Real-time Kernel compatible with the OSEK OS specification.
The specification defines four 'conformance classes' to allow the
OS to be scaled to the application's demands. The basic conformance
classes (BCC1 and BCC2) meet the demands of deeply-embedded
automotive ECUs, whereas the two extended conformance classes (ECC1
and ECC2) are designed to support high-end demands of systems such as
satellite navigation where resource use (RAM, ROM and CPU time) is
not so critical.
The nature of the conformance classes is such that the higher
conformance classes are supersets of the lower ones; this
means that an RTOS that implements the requirements of ECC2 could
also satisfy the requirements of ECC1, BCC2 and BCC1. However, the
specification of the conformance classes was done in such a way that
an RTOS implemented specifically for a lower conformance class could
be implemented much more efficiently than one aiming to 'cover all
the bases'.
Thus the Realogy Real-Time Architect OSEK-based solution for the
C16x was initially designed to comply with the BCC1 conformance class
of OSEK OS 2.1. Tailoring the Realogy Real-Time Architect
implementation to this specific conformance class has enabled
LiveDevices to produce a solution that is ideally matched to the type
of applications targeted by the C16x family. Subsequent developments
have extended Realogy Real-Time Architect to support other
conformance classes in a scaleable way, thus ensuring that pure BCC1
applications only require a minimum level of resources.
The initial focus on the BCC1 conformance class, however, is far
from the whole story in the design of a memory and processor
efficient OSEK solution for the C16x. Running a commercial RTOS in
the resource-constrained environment typical of the single-chip
automotive applications for which the C16x is used may be
considered impractical in some cases, because of the memory and
processor overheads of a conventional RTOS.
Designing for efficiency
It may therefore come as a surprise to learn that an RTOS using
preemptive multi-tasking can provide a way to improve an
application's effective use of the CPU without incurring significant
memory overheads. In fact, with the right tool support, it is
possible for an RTOS to reduce total run-time cost while also making
application development and maintenance easier. There are three areas
where using a preemptive RTOS offers benefits over a cyclic
executive: avoiding the inefficiencies of a cyclic executive, the
ability to deal with sporadic events with a short latency and still
meet deadlines, and the benefit of separating timing and
architectural issues from the code.
Using a preemptive RTOS
The inefficiencies mentioned above occur because a cyclic
executive works by executing all of its tasks at the same frequency
or at harmonics of some given frequency. In a real-time system, each
task will have a frequency at which it must be executed in order to
meet the objectives of the system, and the frequency at which the
cyclic executive runs will be the fastest of these frequencies (also
called the minor cycle). This typically results in a number of tasks
being executed more frequently than is strictly necessary, resulting
in wasted CPU time. This effect becomes even more pronounced when
sporadic events with a short deadline need to be considered. If the
response to an event is required within one minor cycle of the event
occurring, then even though the event might occur, on average, only once
every hundred cycles, every minor cycle needs to allow for this event to
occur. Compounded with this are the problems that are introduced when
a single task needs to be split across several cycles (for example, a
task that occurs once every 10 cycles, but requires 3 cycles worth of
execution to complete).
This may result in a sequence of minor cycles, which differ in
small respects. If this is the case, these minor cycles are
aggregated into a major cycle. An example of all of the above is
shown in figure 1.
Figure 1: Cyclic executive with four tasks and an interrupt.
Each major cycle consists of four minor cycles. Task t1 executes
every minor cycle, whereas t2 occurs every other minor cycle. The
execution of t3 needs to be split over three minor cycles. Although
the interrupt (i1) can only occur once every hundred major cycles, it
is necessary to assume that it could occur within any minor cycle
(so time must be left for it in every minor cycle).
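The cost of these cyclic constraints can be put into rough numbers. The sketch below is illustrative only: the 10ms minor cycle, the task with a natural period of 25ms and the execution costs are hypothetical values, not figures from the benchmark. It compares the CPU share a cyclic executive has to budget with the share the same work needs when each activity runs at its natural rate.

```python
from fractions import Fraction


def cyclic_period(natural_period_ms, minor_cycle_ms):
    """Longest whole multiple of the minor cycle that does not exceed
    the task's natural period (the task must not run too slowly)."""
    multiples = natural_period_ms // minor_cycle_ms
    if multiples < 1:
        raise ValueError("minor cycle is longer than the natural period")
    return multiples * minor_cycle_ms


def utilisation(cost_ms, period_ms):
    """Fraction of CPU time consumed by cost_ms of work every period_ms."""
    return Fraction(cost_ms, period_ms)


MINOR_CYCLE_MS = 10  # hypothetical minor cycle

# A periodic task that needs 2ms every 25ms is forced down to a 20ms slot.
cyclic_load = utilisation(2, cyclic_period(25, MINOR_CYCLE_MS))
natural_load = utilisation(2, 25)

# A 1ms interrupt response that can occur once every hundred major cycles
# (4 minor cycles each, so once every 4000ms) still needs a 1ms reservation
# in every minor cycle of the cyclic executive.
reserved = utilisation(1, MINOR_CYCLE_MS)
actual_worst = utilisation(1, 100 * 4 * MINOR_CYCLE_MS)

print(f"periodic task: {float(cyclic_load):.1%} cyclic, {float(natural_load):.1%} natural")
print(f"sporadic event: {float(reserved):.1%} reserved, {float(actual_worst):.3%} needed")
```

Under these assumed numbers the reservation for the sporadic event is 400 times larger than the load the event can actually generate.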
The preemptive design of Realogy Real-Time Architect's
OSEK-based kernel makes it possible to address all of the above
concerns. Tasks can be executed at their natural frequency, rather
than executing at some multiple of the major cycle. It isn't
necessary to waste processing power allowing time for sporadic tasks
that won't happen most of the time. Nor is it necessary to introduce
additional structuring constraints in the system (for example,
partitioning a task into three parts, so that the task fits into a
cycle). In fact, by exchanging a cyclic scheduler for Realogy
Real-Time Architect, the user is able to save a large proportion of
CPU time that would have been wasted executing cyclic components
ineffectively. It also allows the removal of the code and data space
overheads of cyclic control structures.
So now we have addressed the problems of processor and scheduling
efficiency. But what of memory usage? How can we minimise the impact
of Realogy Real-Time Architect on precious memory space?
Single shot/single stack
The OSEK-based kernel component of Realogy Real-Time Architect has
been designed and implemented in such a way that it requires minimal
memory resources in order to implement the priority-based
pre-emptive multi-tasking environment described above.
The initial focus on a basic conformance class 1 (BCC1)
implementation of OSEK OS permits the use of a single-shot execution
model, which in turn enables the use of a single stack. This differs
from the traditional RTOS model, in which tasks need never
terminate and each task consequently needs its own permanent stack
space. The total stack required by such a system can be calculated
simply by adding up the largest amount of stack space required by
each task and interrupt handler.
In the single shot model, stack space is only required while the
task is running and once the task terminates its stack space can be
reclaimed. As a task can only ever be preempted by a task of higher
priority that must terminate before the original task can continue to
run, all tasks can share a single stack. In the single shot model
running on a single stack, the stack requirements are proportional to
the number of unique priority levels in the system rather than the
number of tasks.
By reducing the number of different priority levels in the system,
the overall stack requirements can be reduced significantly. By
optimising priority levels and exploiting the fact that the single shot
model enables a single stack, it is possible to bring RAM
requirements much closer to those of cyclic systems. This enables the
benefits of preemption to be enjoyed in systems where it was not
previously possible without exceeding the on-chip XRAM provided by
the C16x family.
To illustrate the possible savings, consider a system with four
tasks: t1, t2, t3 and t4, where the priorities of the tasks range
from 1 (the lowest) for t1 to 4 for t4. If all tasks are fully
preemptive, the stack usage of the system is the sum of the stack
usage for each of the tasks t1 to t4. However, if tasks t1 and t2 are
both defined as non-preemptive, t1 can never appear on the stack at
the same time as any other task, and likewise t2 can never appear on the
stack with any other task. The stack usage of the system then
becomes the largest of the stack usage of t1, the stack usage of t2, and
the sum of t3 and t4.
Fig 2
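The four-task example can be checked with a short calculation. The following is a minimal sketch, not the algorithm used by the Realogy Real-Time Architect tools: the stack sizes are hypothetical, OS and interrupt overheads are ignored, and a non-preemptive task is modelled as running at the highest task priority in the system. The worst-case stack is then the deepest chain of tasks in which each one can preempt the one below it.

```python
def worst_case_stack(tasks):
    """Worst-case shared-stack depth for single-shot tasks.

    tasks maps a task name to (priority, stack_bytes, preemptable).
    A higher number means a higher priority.
    """
    top = max(priority for priority, _, _ in tasks.values())
    memo = {}

    def depth(name):
        # Deepest stack that can build up with `name` at the bottom.
        if name not in memo:
            priority, stack, preemptable = tasks[name]
            # Once running, a non-preemptive task cannot be preempted by
            # any other task, as if it ran at the top priority.
            limit = priority if preemptable else top
            above = [depth(other)
                     for other, (p, _, _) in tasks.items() if p > limit]
            memo[name] = stack + max(above, default=0)
        return memo[name]

    return max(depth(name) for name in tasks)


# Hypothetical stack sizes in bytes for t1 (lowest priority) to t4.
fully_preemptive = {
    "t1": (1, 40, True),
    "t2": (2, 60, True),
    "t3": (3, 30, True),
    "t4": (4, 50, True),
}
t1_t2_non_preemptive = dict(fully_preemptive,
                            t1=(1, 40, False),
                            t2=(2, 60, False))

print(worst_case_stack(fully_preemptive))      # sum of all four tasks
print(worst_case_stack(t1_t2_non_preemptive))  # max(t1, t2, t3 + t4)
```

With these assumed sizes the requirement falls from 180 bytes to 80 bytes, the sum of t3 and t4.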
Improving memory usage with internal resources
In addition to the standard OSEK OS resource locking mechanism,
Realogy Real-Time Architect offers support for the internal resources
feature from the forthcoming OSEK OS v2.2 standard. This allows
automatic locking and unlocking of resources without any explicit API
calls.
If a task is declared as using an 'internal resource' Realogy
Real-Time Architect will automatically lock the specified resource(s)
immediately before starting the task, and unlock the resource(s)
directly after the task has exited. This means that during task
execution all specified internal resources remain locked until the
task exits, thus preventing any other tasks that share the same
internal resources from pre-empting the currently running task.
Internal resources offer the developer a mechanism to control task
preemption that is more detailed than simply declaring tasks as
non-preemptive.
When a task is declared non-pre-emptable, no other task is ever
allowed to pre-empt it. This can introduce long blocking times,
particularly if the task was assigned a very low priority. Using
internal resources instead allows the possibility of specifying
exactly the tasks that must never pre-empt each other, and thus avoid
any unnecessary blocking. Any higher priority tasks that do not share
the same internal resource are still allowed to pre-empt. Therefore,
using internal resources defines logical groups of tasks that are not
able to pre-empt each other.
Internal resources can be used in an application to reduce the
number of preemption levels, thus reducing the amount of stack memory
required. If it is known that a system remains schedulable when
certain tasks do not pre-empt other tasks, you can use internal
resources to achieve non-preemptive task execution. This is
illustrated in the diagram below, which shows that even a large
number of tasks can be fitted onto a small amount of stack space.
Fig 3
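The difference between an internal resource and a plain non-preemptive declaration can be modelled in a few lines. This sketch assumes the usual OSEK priority ceiling behaviour, in which a task holding a resource runs at the highest priority of any task sharing that resource. The task names, priorities and helper functions are hypothetical and are not part of any OSEK or Realogy Real-Time Architect API.

```python
def running_priorities(priorities, groups):
    """Priority each task runs at once started.

    priorities maps task name to its configured priority.
    groups is a list of sets of task names, each set sharing one
    internal resource. A task in a group runs at the group's ceiling.
    """
    run = dict(priorities)
    for members in groups:
        ceiling = max(priorities[m] for m in members)
        for member in members:
            run[member] = max(run[member], ceiling)
    return run


def can_preempt(candidate, running, priorities, run):
    """True if `candidate` may preempt `running` while it executes."""
    return priorities[candidate] > run[running]


priorities = {"t1": 1, "t2": 2, "t3": 3, "t4": 4, "t5": 5}
groups = [{"t1", "t2", "t3"}]  # these three share one internal resource
run = running_priorities(priorities, groups)

# Members of the group never preempt each other...
print(can_preempt("t2", "t1", priorities, run))
# ...but a higher-priority task outside the group still can.
print(can_preempt("t4", "t1", priorities, run))
```

Had t1 simply been declared non-preemptive, t4 and t5 would also have been blocked for the whole of its execution.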
Improving memory usage with 16x interrupt grouping
Because there is flexibility in the interrupt priority level for
each vector, 16x applications can be tuned relatively late in their
life-cycle to optimise the trade-off between response times and stack
usage. The 16x prioritised interrupt handling provides, for interrupt
handlers, the equivalent functionality of Realogy Real-Time Architect
internal resources for tasks. By grouping several interrupt handlers
at the same priority, it is possible to make sure that their overall
stack requirement is only the maximum of their stack usages and not
the sum total. The programmable arbitration order ('group level')
among those interrupts provides fine control when several are pending
at one time. Within the same application, any interrupt handler that
must pre-empt in order to meet its response time constraints can be
awarded a distinct, high priority.
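The effect of interrupt grouping on stack usage reduces to a simple rule: handlers at the same priority level cannot nest, so each level contributes only its largest handler. The sketch below applies that rule to a hypothetical set of handlers. The names, levels and stack sizes are invented for illustration, and the context-save overhead of the hardware is ignored.

```python
from collections import defaultdict


def isr_stack_bound(handlers):
    """Upper bound on interrupt stack usage.

    handlers is an iterable of (name, priority_level, stack_bytes).
    Handlers sharing a level cannot preempt each other, so only the
    largest one per level counts; different levels may nest and add up.
    """
    worst_per_level = defaultdict(int)
    for _name, level, stack in handlers:
        worst_per_level[level] = max(worst_per_level[level], stack)
    return sum(worst_per_level.values())


# Three handlers grouped at level 8, one urgent handler at level 12.
grouped = [
    ("can_rx", 8, 24),
    ("timer", 8, 32),
    ("adc_done", 8, 20),
    ("crank_sync", 12, 28),
]
# The same handlers, each given its own priority level.
ungrouped = [
    ("can_rx", 7, 24),
    ("timer", 8, 32),
    ("adc_done", 9, 20),
    ("crank_sync", 12, 28),
]

print(isr_stack_bound(grouped))
print(isr_stack_bound(ungrouped))
```

For these assumed sizes, grouping cuts the bound from 104 bytes to 60 bytes while the urgent handler keeps its ability to preempt.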
Real-Time Architect provides support for calculating the exact
stack requirements of entire applications from the stack requirements
of individual tasks and interrupt handlers in the presence of
preemption, internal resources and interrupt grouping.
The practical benefits obtained by using Realogy RTA can be
illustrated by considering the processor and memory overheads for a
typical benchmark application on the C167, as well as interrupt
latency figures.
The data in the tables were obtained using an I+ME Evaboard C167
v2.0. The clock frequency was 20MHz. Test code and data were located
in off-chip RAM configured for zero wait states.
The example application is defined as follows: There are 7 periodic
and 3 sporadic tasks. The periodic tasks have periods of
10, 10, 20, 20, 40, 80 and 80ms respectively. The sporadic tasks have a
minimum inter-arrival time of 20ms. A single ISR is used to activate
the periodic tasks using the ActivateTask call. Where a target
supports more than one ISR, they are at the same priority level if
possible. There are 4 non pre-emption levels, implemented by
autoresources.
| Data Memory Usage          | Bytes | Code Memory Usage                    | Bytes |
|----------------------------|-------|--------------------------------------|-------|
| OSEK ROM data              | 522   | StartOS (and any functions it calls) | 130   |
| OSEK RAM data              | 72    | osek_alarm_tick()                    | 138   |
| Stack usage (OS overheads) | 116   | ActivateTask()                       | 102   |
| Stack usage (total)        | 142   | SetRelAlarm()                        | 96    |
|                            |       | OS internals                         | 340   |
|                            |       | Total                                | 806   |

Worst case CPU overheads (in any 80ms period): 1.422ms = 1.78%

| Interval CPU times      | Time   | Notes                                                               |
|-------------------------|--------|---------------------------------------------------------------------|
| ISR entry latency       | 10.0µs | Time from interrupt raised to execution of first instruction of ISR |
| ISR exit resume latency | 8.8µs  | Time to return from last instruction of ISR to the interrupted task |
| Task entry latency      | 6.4µs  | Worst case time for task entry                                      |
| Task exit latency       | 8.0µs  | Worst case time for task exit                                       |
| Counter tick            | 10.2µs | Time to call CounterTickXXX() per alarm from inside ISR             |
| Activate call           | 4.8µs  | Time to call ActivateTask() from inside ISR                         |
| OS lock time            | 16.4µs | Longest time interrupts are locked out due to OS execution          |
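The published figures can be cross-checked with a little arithmetic. The sketch below only re-adds numbers quoted in the tables and in the description of the example application. It does not attempt to reconstruct how the 1.422ms overhead was measured, and the activation count assumes a half-open 80ms window.

```python
from fractions import Fraction

# Code memory usage in bytes, as quoted in the table.
code_bytes = {
    "StartOS": 130,
    "osek_alarm_tick": 138,
    "ActivateTask": 102,
    "SetRelAlarm": 96,
    "OS internals": 340,
}
total_code = sum(code_bytes.values())

# Worst-case OS overhead in any 80ms period, in microseconds.
overhead_percent = Fraction(1422, 80000) * 100


def max_activations(periods_ms, window_ms):
    """Most activations possible in a half-open window of window_ms."""
    return sum(window_ms // period for period in periods_ms)


periodic = max_activations([10, 10, 20, 20, 40, 80, 80], 80)
sporadic = max_activations([20, 20, 20], 80)

print(total_code)               # matches the table total
print(float(overhead_percent))  # rounds to the quoted 1.78%
print(periodic, sporadic)       # task activations per 80ms window
```

The code sizes add up to the quoted 806 bytes, and 1.422ms in 80ms is 1.7775%, which rounds to the quoted 1.78%.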
This single stack/single shot execution model is the key to the
high processor and memory efficiency of Realogy Real-Time Architect.
However, to take full advantage of the savings offered by the single
stack approach, it is important to determine the overall stack
requirements for the entire application in advance. Fortunately, the
timing analysis tools provided as part of the Realogy Real-Time
Architect development environment provide support for this.
This is a unique feature of Realogy Real-Time Architect, whose
schedulability analysis is based upon an extended form of
DMA (deadline monotonic analysis). The most obvious use of these
tools is to allow the developer to guarantee that all deadlines will
be met under all circumstances. This is clearly a powerful and
extremely useful feature.
However, the analysis in Realogy Real-Time Architect is not
limited solely to showing this. The analysis also shows the amount of
slack in the system as a whole, or in individual tasks, indicating whether
processors can be run at different speeds and still meet deadlines,
or whether functionality can be added to specific tasks.
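The flavour of this kind of analysis can be shown with the textbook response-time recurrence for fixed-priority preemptive scheduling under deadline monotonic priority assignment. This is a simplified sketch, not the extended analysis implemented in the Realogy Real-Time Architect tools: it ignores blocking, OS overheads and interrupts, assumes each deadline is no longer than the period, and uses a hypothetical task set with times in microseconds.

```python
def response_times(tasks):
    """Worst-case response time and slack for each task.

    tasks is a list of (name, wcet, period, deadline) in microseconds.
    Priorities follow deadline monotonic order: shorter deadline first.
    Returns {name: (response_time, slack)}; negative slack means the
    deadline can be missed.
    """
    ordered = sorted(tasks, key=lambda task: task[3])
    results = {}
    for index, (name, wcet, period, deadline) in enumerate(ordered):
        higher = ordered[:index]
        response = wcet
        while True:
            # Each higher-priority task can run ceil(response / period)
            # times while this task is waiting to finish.
            interference = sum(-(-response // p) * c
                               for _, c, p, _ in higher)
            updated = wcet + interference
            if updated == response or updated > deadline:
                response = updated
                break
            response = updated
        results[name] = (response, deadline - response)
    return results


# Hypothetical task set: (name, execution time, period, deadline).
example = [
    ("fast", 1000, 10000, 10000),
    ("medium", 2000, 20000, 20000),
    ("slow", 5000, 40000, 40000),
]

for name, (response, slack) in response_times(example).items():
    print(f"{name}: worst-case response {response}us, slack {slack}us")
```

The slack reported for each task is the margin by which its execution time could grow, or the processor could be slowed, before that task's deadline is put at risk.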
One of the most interesting features is the ability to
automatically determine which tasks can be placed into non-preemption
groups, thus reducing overall stack usage. Studies performed by
LiveDevices over a wide range of systems show that, irrespective of
the number of tasks in a system, systems rarely require more than 5
preemption levels.
The timing analysis can be applied through the development
process, from design verification based on a system model and
estimates of processing time through verification of implementation
to investigating scope for change (headroom) and practicality of
proposed enhancements or modifications.
The single-shot kernel technology offered by Realogy Real-Time
Architect complements the 'maximum performance for minimum silicon'
philosophy of the C16x microcontroller family, by extending the
concept to total system cost, both hardware and software. The
small code footprint and cycle-optimising scheduler of Realogy
Real-Time Architect provide the ideal development environment for
automotive applications using the resource-efficient C167
architecture. Furthermore, being able to define the configuration of
the application ahead of deployment allows further optimisation of
memory usage.
Dr Andrew Coombes CEng is Product Development Manager at LiveDevices, a
private UK-based company with sales and distribution offices in the
USA and Europe. The company was founded in 1997 as a spin-off from leading edge
real-time software development work with the Volvo Car Corporation
for the S80 saloon car project. LiveDevices has financial backing
from venture capital firms and recently raised £9.2M in
development capital.
It has three product lines:
- Realogy Real-Time Architect (OSEK OS based real-time kernel
and scheduling analysis tools);
- Embedinet (small footprint, fully-featured TCP/IP stack);
- Embediserve (infrastructure services for embedded TCP/IP
devices).
Published in Embedded Systems (Europe) February
2002