A number of significant business and technical reasons are
helping drive Windows NT into the Telecom and Datacom equipment
markets. Within the last year there has been a discernible
shift in the operating systems focus for many of the major
players in these markets. For the first time this top-end of
the Microsoft operating system offering is being very seriously
considered, and in some cases already adopted, for the next
generation of products. Though Windows NT has many features
that make it an appropriate operating system for these
applications, it also has some limitations that need to be
overcome in order to make Windows NT a ubiquitous operating
system in this area. These limitations include the lack of
determinism and concerns over fault-tolerance. This article
addresses these limitations and proposes alternative solutions
to them.
Windows NT: Moving Off the Desktop
Microsoft Windows NT has emerged as the preferred OS for the
high performance 32-bit desktop as corporations seek to
standardize around a single OS. A move is under way to utilize
Windows NT not only on the desktop and for business
applications, but also for "embedded" applications such as
telecommunication equipment, medical equipment, and industrial
automation.
Until recently, these applications required higher
performance and reliability than was available in standard
desktop operating systems and hardware. Windows NT, however,
has been designed from the ground up to be a responsive,
reliable, general purpose operating system with features such
as a pre-emptive, priority-based multitasking kernel, and
built-in protection and security mechanisms. Thus, Windows NT
running on PC-compatible hardware is being utilized in more and
more non-desktop applications that in the past required
specialized or proprietary operating systems and hardware.
Real-Time Software Requirements
The performance and reliability requirements of applications
span a wide spectrum, from a business application or inventory
management system at one end to a public telephone switch at
the other. The term "real-time" is typically used to describe
applications on the upper end of this spectrum. A basic
definition of a "real-time system" is one in which the time at
which a response or event occurs is just as important as the
logical correctness of that response or event. High-end, or
hard, real-time systems require a high level of determinism and
performance, that is, worst-case response times in the tens of
microseconds.
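The distinction between average and worst-case behavior can be made concrete with a short sketch. The function names and the sample latencies below are hypothetical and chosen only for illustration; they are not measurements of any real system.

```python
from statistics import mean

def is_acceptable(value, expected, latency_us, deadline_us):
    """A real-time response counts only if it is both correct and on time."""
    return value == expected and latency_us <= deadline_us

def meets_on_average(latencies_us, deadline_us):
    """Soft view: the average response time is within the deadline."""
    return mean(latencies_us) <= deadline_us

def meets_worst_case(latencies_us, deadline_us):
    """Hard view: every response, including the slowest, is within the deadline."""
    return max(latencies_us) <= deadline_us

# Hypothetical response times in microseconds: four fast, one very slow.
samples = [20, 22, 25, 21, 400]

print(meets_on_average(samples, 100))   # average is 97.6 us
print(meets_worst_case(samples, 100))   # slowest response is 400 us
```

With these assumed numbers the average stays under a 100 microsecond deadline while the worst case does not, which is the gap that separates a system that is fast on average from one that is deterministic.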
Another important requirement for real-time software is
robustness and reliability. While a programming error in a word
processing program can lower the productivity of the user, an
error in a real-time control program can mean costly
downtimes, damage to expensive equipment or even the loss of
human life. Tools and protection mechanisms must be made
available to the developer in order to minimize the occurrence
of typical programming errors such as stray pointers, memory
leaks, uninitialized variables, as well as errors in program
logic. In the event of faulty code and/or a software crash,
protection must be provided to minimize the impact of the crash
on the critical processes being controlled.
Windows NT
With its fully pre-emptive, multitasking kernel, Microsoft
characterizes Windows NT as being "soft" real-time, that is,
capable of satisfying most response requirements on average
within a given set of constraints. In reality, this means you
can expect Windows NT to miss some of your scheduling
deadlines. Whether or not this is acceptable depends on your
application, of course. There is a lot of experimentation
currently underway in which the actual limits of Windows NT are
being explored. The reasons for these limitations can be found
in a few fundamental Windows NT kernel policies and mechanisms.
Although these policies and mechanisms were put in place for
good reason, namely to optimize average system performance,
they spell trouble for the real-time or embedded systems
programmer.
As we have already discussed, determinism (the ability to
meet deadlines predictably) is an important requirement for a
real-time system, and a deterministic system can only be
developed if events can be reliably predicted. This can only be
achieved by giving the developer extensive control of the
relative priorities of all operations and events. Windows NT
restricts the developer's ability to control and predict
operations in a number of areas:
Because the Windows NT priority spectrum places all
interrupts at a higher level than normal thread execution,
user-level threads are subject to being pre-empted by any
interrupt source, regardless of its priority. This means that
even the lowly mouse can generate an interrupt that pre-empts
what to the developer is a high priority operation. Only
kernel-level threads are allowed to raise or lower the
interrupt request level (IRQL) to mask or unmask interrupts. In
many real-time operating systems, thread priorities are
interleaved with interrupt priorities, giving the developer
total control, at the application level, of the relationship
between interrupts and threads.
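As a rough illustration of the difference between the two priority models, the sketch below compares a scheme in which every interrupt outranks every thread with one in which threads and interrupts share a single priority scale. The names, the numeric priorities, and the convention that a higher number is more urgent are all assumptions made for the example; this is a toy model, not the Windows NT dispatcher.

```python
def nt_style_next(thread, pending_interrupts):
    """All interrupts outrank all threads: any pending interrupt runs first."""
    if pending_interrupts:
        return max(pending_interrupts, key=lambda i: i[1])[0]
    return thread[0]

def interleaved_next(thread, pending_interrupts):
    """Threads and interrupts share one priority scale; the highest value wins."""
    candidates = [thread] + list(pending_interrupts)
    return max(candidates, key=lambda c: c[1])[0]

# (name, priority) pairs; higher number = more urgent (assumed convention).
control_loop = ("control_loop", 90)
mouse_irq = ("mouse_irq", 10)
encoder_irq = ("encoder_irq", 95)

print(nt_style_next(control_loop, [mouse_irq]))       # mouse_irq
print(interleaved_next(control_loop, [mouse_irq]))    # control_loop
print(interleaved_next(control_loop, [encoder_irq]))  # encoder_irq
```

In the interleaved model the developer decides whether a given interrupt matters more than a given thread; in the first model that decision is already made.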
Windows NT provides a Deferred Procedure Call (DPC) mechanism
to increase the responsiveness of the system to interrupts. A
correctly designed interrupt service routine (ISR) minimizes
interrupt latency by performing only critical processing in the
ISR itself and queuing a DPC for later execution. These DPCs
are placed into a single FIFO, with no provisions for the
priority of the operation. This means that a low priority DPC
at the head of the queue executes before any higher priority DPCs queued
behind it. Although you can cheat and place a DPC at the head
of the line, this doesn't solve the problem because you may
inadvertently be deferring the execution of an even higher
priority operation that is already in the FIFO. Additionally,
since DPCs are lower on the priority spectrum than all other
types of interrupts, DPCs will not be able to execute in the
event of active interrupts, even low priority device
interrupts.
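The effect of a single FIFO can be sketched with a small simulation. The DPC names and priority numbers here are hypothetical, and the priority-ordered queue is shown only for contrast; it is not something Windows NT provides.

```python
from collections import deque
import heapq

def fifo_order(dpcs, head_insert=None):
    """Execution order for a FIFO queue, optionally with one DPC pushed to the head."""
    queue = deque(dpcs)
    if head_insert is not None:
        queue.appendleft(head_insert)   # the "cheat": jump to the front of the line
    return [name for name, _priority in queue]

def priority_order(dpcs):
    """Execution order if the queue were priority-ordered (higher number first)."""
    # The sequence number keeps arrival order among equal priorities.
    heap = [(-priority, seq, name) for seq, (name, priority) in enumerate(dpcs)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Hypothetical DPCs in arrival order: (name, priority).
queued = [("mouse", 1), ("disk", 5), ("motor", 9)]

print(fifo_order(queued))                 # ['mouse', 'disk', 'motor']
print(fifo_order(queued, ("net", 7)))     # ['net', 'mouse', 'disk', 'motor']
print(priority_order(queued))             # ['motor', 'disk', 'mouse']
```

The second line shows why head insertion does not solve the problem: the priority 7 "net" DPC now runs ahead of the priority 9 "motor" DPC that was already waiting.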
In Windows NT, multiple requests for a synchronization
object (such as a semaphore or a mutex) are queued in FIFO
order without regard for the priority of the requesting thread.
Thus a higher priority thread may have to wait for a lower
priority thread to complete its operation before proceeding.
This not only affects determinism, it may also lead to priority
inversion.
Solving the classic problem of priority inversion requires
the ability to inherit the priority of another thread, which is
not available in Windows NT. The problem can be described as
follows: there must be at least three threads, A, B, and C, with
A being the highest priority and C the lowest. Let's say that C
has previously locked a resource, A is now waiting on that
resource, but C is unable to complete its job because it is
being pre-empted by B. Priority inversion has occurred because
A has been effectively held off by a lower priority thread, B.
Temporarily boosting C's priority to A's priority would remedy
the problem.
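The three-thread scenario can be traced with a minimal tick-based simulation. The work amounts (in ticks) are assumptions picked to make the effect visible, and the scheduler is a deliberately simple model in which the highest-priority runnable thread always runs; it is not the Windows NT dispatcher.

```python
def simulate(inherit):
    """Return the tick at which each thread finishes.

    C (low) holds the lock and needs 3 ticks to release it.
    B (medium) needs 5 ticks of CPU and never touches the lock.
    A (high) is blocked on the lock and needs 1 tick once it gets it.
    """
    prio = {"A": 3, "B": 2, "C": 1}
    work = {"A": 1, "B": 5, "C": 3}
    lock_owner = "C"
    finish = {}
    tick = 0
    while len(finish) < 3:
        # A cannot run while C still owns the lock.
        runnable = [t for t in work
                    if t not in finish and not (t == "A" and lock_owner == "C")]
        effective = dict(prio)
        if inherit and lock_owner == "C" and "A" not in finish:
            # Priority inheritance: C temporarily runs at A's priority.
            effective["C"] = max(effective["C"], prio["A"])
        current = max(runnable, key=lambda t: effective[t])
        work[current] -= 1
        tick += 1
        if work[current] == 0:
            finish[current] = tick
            if current == lock_owner:
                lock_owner = None      # lock released, A becomes runnable
    return finish

print(simulate(inherit=False))   # {'B': 5, 'C': 8, 'A': 9}
print(simulate(inherit=True))    # {'C': 3, 'A': 4, 'B': 9}
```

Without inheritance the highest priority thread A finishes last, at tick 9, because B keeps C off the CPU. With C boosted to A's priority, A finishes at tick 4.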
Real-Time with Windows NT
Notwithstanding these limitations, there is still tremendous
promise for Windows NT in real-time applications. There are
seven fundamental approaches that attempt to bring together,
with varying degrees of success, the advantages of Windows NT
and the demands of real-time and embedded systems
development:
- Restrict Windows NT to soft real-time applications. If
your application can handle occasional "hiccups" or delays,
you may be able to use standard Windows NT. Although the
actual window of predictability is up for debate, your
application should be able to handle timing variations in the
1-20 millisecond range with the realization that there are no
guarantees. The cost of missing deadlines should be
relatively low, and not result in a system failure or
unacceptable performance degradation.
- Create a finely tuned, highly constrained environment. By
paying careful attention to the system load, interaction with
other processes (via the network or locally), and in effect
"closing" the system, you can limit the amount of spurious or
unpredictable behavior. You may also need to write most of
your application in Windows NT's kernel mode, with the
majority of your code in device drivers. A Windows NT expert
who knows what's going on under the hood and knows where the
hidden dangers lie may be able to construct a finely tuned
system that meets your requirements, for now. However,
developing any substantial applications with such
restrictions neutralizes many of the benefits of an OS such
as Windows NT and will result in code that is difficult to
support and maintain.
- Provide a Win32 API wrapper around an RTOS. This approach
does not leverage Windows NT at all, but rather provides an
alternative API to an existing real-time operating system.
Standard Windows NT applications cannot be utilized with this
approach, limiting your options for future expandability.
Also, since the target system is not Windows NT, you are
forced to use specialized tools for compilation and
debugging.
- Couple a real-time operating system with Windows NT. In
most cases, this means running Windows NT and a real-time
operating system (RTOS) on separate systems. For this
approach, the Windows NT system is used only for the operator
interface and other non-real-time functions. The dedicated
RTOS system is used for the actual real-time control. This
scenario requires you to learn and maintain two separate
development environments and also increases the cost of the
total system by requiring at least two computers. Running the
two operating systems on the same system eliminates the extra
hardware costs, but still requires two separate development
environments.
- Modify the Windows NT kernel. Because Microsoft does not
license the source code to the Windows NT kernel to third
parties, this is an option that is only available to
Microsoft. Because of Microsoft's focus on the broader OS
market, indications are that these types of modifications will
not be forthcoming.
- Modify the Hardware Abstraction Layer (HAL). The HAL is a
layer of code between the Windows NT executive and the
hardware platform that hides hardware-dependent details such
as I/O interfaces and interrupt controllers. Microsoft
routinely grants HAL source code licenses to hardware vendors
who need to do special adaptations for their hardware to run
Windows NT. Microsoft has also granted HAL source code
licenses to various companies, including RadiSys, for
products that attempt to extend Windows NT with real-time
capabilities. Performing extensive modifications to the HAL,
such as manipulating the clock or rewriting the way in which
interrupts are processed, represents an unprecedented,
unproven use of the HAL, creates a non-standard environment,
and may pose serious maintainability challenges. Even more
importantly, because the "non-real-timeness" of Windows NT is
rooted in basic Windows NT kernel mechanisms, modifying the
HAL can only result in slightly improved soft real-time
performance. As long as the standard Windows NT executive is
used to schedule and process threads and interrupts, hard
real-time determinism is not possible.
- Complement the standard Windows NT kernel with a
real-time kernel. Any solution that claims to bring hard
real-time performance to Windows NT must provide an alternate
kernel to handle real-time task scheduling and execution,
running in conjunction with the standard Windows NT kernel.
In fact, the three major solutions to real-time Windows NT
available on the market today have taken this approach.
Introducing such a kernel into the Windows NT environment,
however, may actually decrease system reliability unless the
kernel is at least as reliable as the Windows NT kernel. It
is critical, therefore, that the real-time kernel be proven
in real-life applications, with extensive testing that can
only come through repeated usage over time. Other important
considerations are memory and address space protection, as
well as the ability to survive catastrophic system failures
(Windows NT "blue screen" crashes). Finally, clean
integration with the Windows NT environment, for example
leveraging the same development tools and APIs where
possible, is critical for the general usability of the
solution.
Integrating a Real-Time Kernel
There are at least two alternatives for coupling a real-time
kernel with the Windows NT kernel: place the kernel inside a
Windows NT interrupt service routine or device driver, or place
the kernel outside of Windows NT's address space.
At first glance, putting a real-time kernel inside a Windows
NT interrupt service routine (ISR) or device driver is the most
straightforward approach and the easiest to implement. However,
with such an approach the user is forced to develop real-time
applications in the Windows NT kernel mode (as opposed to user
mode, the "normal" development mode). In the Windows NT kernel
mode, code has privileged access to the entire memory space,
including the Windows NT kernel and other device drivers, with
no address isolation or memory protection offered. Thus, a
real-time thread could easily overwrite the address space of
another process, including other real-time processes. Because
these types of programming errors are typically extremely
difficult to detect and result in spurious but critical
failures, achieving reliable operation often requires extensive
testing and debugging, with many errors not detected until
after a system has been deployed in the field. Writing a
complex, multithreaded real-time application in this privileged
mode is contrary to the programming model espoused by Windows
NT.
Equally serious is the challenge of maintaining reliable
operation of the real-time kernel in the event of a Windows NT
blue screen crash. By definition, when Windows NT crashes
something catastrophic has occurred such that Windows NT itself
cannot recover. The integrity of all of Windows NT is in
question, including interrupt handling, the operation of all
device drivers and HAL services. Continued operation of a
real-time kernel that is encapsulated within the Windows NT
kernel space will be unreliable at best, and will likely lead
to the crash of the real-time processes.
INtime for Robust, Seamless Integration with Windows NT
RadiSys has developed an approach to real-time Windows NT
called INtime that provides robust, seamless integration with
Windows NT. INtime extends the usage of Windows NT to
applications that require real-time performance and
mission-critical reliability. These applications can take full
advantage of Windows NT's standard user interface, network
capabilities, development tools and off-the-shelf software and
still deliver rock solid performance of critical real-time
tasks.
Through a unique combination of proven real-time technology
and seamless integration with Windows NT, INtime makes it
possible to extend Windows NT applications into the real-time
world. INtime applications consist of non-real-time Windows NT
processes and threads, and real-time processes and threads.
Real-time processes typically handle time-critical I/O and
control, while non-real-time processes handle the human
interface, network communication and data storage.
Figure 1: INtime architecture
INtime consists of:
- Real-Time Kernel
The real-time kernel, based on the proven iRMX operating
system kernel, provides deterministic scheduling and
execution of real-time threads. Real-time interrupts and
active INtime threads immediately pre-empt the execution of
any Windows NT threads and disable all non-real-time
interrupts.
- RT API
Real-time threads access the capabilities of the real-time
kernel via a Win32-extension real-time application
programming interface (RT API). To develop real-time
applications, you use standard Windows NT development tools,
including Microsoft Visual C/C++ Developer Studio, "Wizard"
extensions (for real-time processes), and a Windows NT-based
real-time dynamic debugger.
- NTX Driver
The NTX driver is a Windows NT device driver that provides
centralized support for the OSEM. The NTX driver facilitates
communications between real-time kernel threads and Windows
NT threads.
- NTX API
The NTX API extends the Win32 API to enable non-real-time
threads to communicate and synchronize with real-time
threads. Mechanisms such as semaphores, mailboxes and shared
memory are provided.
- Patented OS encapsulation mechanism (OSEM)
The OSEM manages the simultaneous operation and integrity of
the Windows NT kernel and the real-time kernel, and provides
memory protection and address isolation between processes for
added reliability and robustness.
- Modified Windows NT Hardware Abstraction Layer (HAL)
INtime includes a special version of the Windows NT HAL that
improves the overall reliability and robustness of the
system.
Field-Proven Real-Time Technology
The INtime real-time kernel is a small, efficient real-time
multitasking executive that utilizes the features of the x86
Intel architecture to achieve high reliability and performance
as demonstrated in thousands of deployed applications
worldwide. The kernel provides objects for communication and
resource access control, schedules threads on a pre-emptive,
priority basis, and services interrupts in a responsive,
managed fashion. It contains a number of low-level APIs for
super-high performance, as well as standard APIs for high
performance inter-thread communications, synchronization and
memory management.
The INtime kernel supports 256 priority levels with
round-robin scheduling supported within each level. Application
threads and interrupt handlers share the same priority
structure, allowing priority inter-mixing between handlers and
application threads. The INtime kernel also supports a full
complement of inter-process communication and synchronization
mechanisms including data and object mailboxes, counting
semaphores, access-controlled regions and timer management.
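A scheduler with 256 priority levels and round-robin within each level can be sketched in a few lines. This is an illustrative model only, not the INtime kernel: the class and thread names are hypothetical, and the convention that level 0 is the most urgent is an assumption of this sketch.

```python
from collections import deque

NUM_LEVELS = 256

class Scheduler:
    def __init__(self):
        # One ready queue per priority level.
        self.ready = [deque() for _ in range(NUM_LEVELS)]

    def add(self, name, level):
        if not 0 <= level < NUM_LEVELS:
            raise ValueError("priority level out of range")
        self.ready[level].append(name)

    def remove(self, name, level):
        self.ready[level].remove(name)

    def next_thread(self):
        """Pick the thread for the next time slice, or None if nothing is ready."""
        for queue in self.ready:          # level 0 is scanned first (assumed most urgent)
            if queue:
                thread = queue.popleft()
                queue.append(thread)      # round-robin: back of its own level
                return thread
        return None

sched = Scheduler()
sched.add("irq_handler", 10)   # handlers and threads share the same levels
sched.add("ctrl_a", 128)
sched.add("ctrl_b", 128)
sched.add("logger", 200)

print([sched.next_thread() for _ in range(3)])   # irq_handler every time
sched.remove("irq_handler", 10)
print([sched.next_thread() for _ in range(3)])   # ctrl_a, ctrl_b, ctrl_a
```

Because the handler and the application threads live in the same structure, placing a thread above or below a given interrupt handler is just a matter of choosing its level.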
Windows NT Development Environment
The INtime developer uses standard Windows NT development
tools, including the Microsoft Visual C/C++ Developer Studio
and its integrated debugger. RadiSys also provides a dynamic
debugger that runs under Windows NT and is fully aware of the
INtime real-time constructs and the real-time API. Using these
two debuggers, developers can simultaneously view and debug
real-time and non-real-time threads.
INtime provides a set of real-time application and device
driver "wizards," integrated with the Microsoft Developer
Studio, for faster development of real-time applications and
device drivers. The wizards guide the developer through the
design decisions required when developing a real-time
application and generate the corresponding code fragments.
Real-Time Extensions to Win32 API
INtime provides two sets of APIs to extend the standard
Win32 API with real-time capabilities. The Real-Time API (RT
API) provides direct access to the real-time kernel for the
development of real-time threads. (In order for a thread to be
real-time, it must use the INtime RT API. Introducing standard
Win32 APIs into this thread would compromise the real-time
responsiveness of the thread.) The RT API provides a set of
functions to manage threads (creation, deletion, priority
modification, suspend, resume, and so on); allocate and share memory
between processes; share objects between processes; manage
real-time semaphores and mailboxes; provide access-control for
critical resources; and handle exceptions and interrupts.
The NT Extensions (NTX) API enables non-real-time Win32
threads to communicate and synchronize with real-time threads.
Win32 threads that utilize the NTX API may synchronize with a
real-time thread (in other words, a thread that uses the RT API) through
real-time semaphores. Communication may occur through real-time
mailboxes as well as via a shared memory interface.
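The general pattern of semaphore signalling, shared memory, and mailbox replies can be illustrated with ordinary Python threads. This sketch does not use the NTX or RT APIs, whose function names are not given here; `threading.Semaphore`, `queue.Queue`, and a dictionary merely stand in for a real-time semaphore, a mailbox, and a shared memory region.

```python
import queue
import threading

def run_exchange(setpoint):
    """One round trip between a 'non-real-time' caller and a 'real-time' worker."""
    mailbox = queue.Queue()                # stand-in for a real-time mailbox
    data_ready = threading.Semaphore(0)    # stand-in for a real-time semaphore
    shared = {"setpoint": None}            # stand-in for shared memory

    def control_thread():
        data_ready.acquire()               # wait until new data is signalled
        mailbox.put(("applied", shared["setpoint"]))

    worker = threading.Thread(target=control_thread)
    worker.start()

    shared["setpoint"] = setpoint          # write the data first...
    data_ready.release()                   # ...then signal the worker
    reply = mailbox.get(timeout=5)         # wait for the acknowledgement
    worker.join()
    return reply

print(run_exchange(42))   # ('applied', 42)
```

The ordering matters: the data is written before the semaphore is released, so the waiting thread never observes a half-updated value.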
Patented Technology Improves Fault Tolerance
RadiSys' patent-pending OS encapsulation mechanism (OSEM) is
responsible for the simultaneous operation of Windows NT and
the INtime kernel on the same CPU and provides real-time
responsiveness regardless of Windows NT activity. This approach
utilizes standard Intel architecture support for hardware
multitasking to maintain proper address space isolation and
protection between non-real-time Windows NT processes and
real-time processes.
In a standard Windows NT configuration, the bulk of the OS
runs in the confines of a single hardware task. Additional
hardware tasks are normally only defined to handle catastrophic
software induced failures such as stack faults and double
faults where a safe, known environment is required from which
to handle the failure. INtime transparently creates a hardware
task for the real-time kernel and manages the switching and
execution of the standard Windows NT hardware task and the
real-time hardware task. This approach guarantees the integrity
of both the Windows NT kernel and the real-time kernel, and
enables the successful operation of real-time processes even in
the event of a total Windows NT failure. It is this mechanism
that adds a new level of fault tolerance to Windows NT. By
putting critical processes under the control of INtime, these
processes are guaranteed to continue operation through any
failure of the Windows applications in the system or even a
failure of Windows NT itself.
The OSEM encapsulates the entire Windows NT priority
spectrum in the lowest INtime real-time priority level. This
ensures that real-time threads and interrupts will always have
priority over Windows NT threads and that the end system will
operate deterministically, regardless of Windows NT
activity.
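One way to picture this encapsulation is to treat all of Windows NT as a single task that sits at the lowest real-time level. The sketch below is a simplified model under assumed conventions (256 levels, with 0 the most urgent and 255 the least); the names are hypothetical and it does not represent the OSEM implementation.

```python
NT_TASK = "windows_nt"
LOWEST_RT_LEVEL = 255   # assumption: 0 is most urgent, 255 is least

def select_next(rt_ready):
    """Choose what runs next.

    rt_ready is a list of (name, level) pairs for ready real-time threads.
    Windows NT as a whole is always ready, but only at the lowest level.
    """
    candidates = list(rt_ready) + [(NT_TASK, LOWEST_RT_LEVEL)]
    # min() returns the first minimum, so a real-time thread wins a tie with NT.
    return min(candidates, key=lambda c: c[1])[0]

print(select_next([]))                                       # windows_nt
print(select_next([("valve_ctrl", 130)]))                    # valve_ctrl
print(select_next([("logger_rt", 254), ("valve_ctrl", 130)]))  # valve_ctrl
```

In this model Windows NT runs only when no real-time thread is ready, regardless of how busy Windows NT itself is.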
Because the OSEM provides a separate, protected environment
for the real-time processes, INtime users are relieved of the
burden of writing code in the Windows NT kernel mode space. The
result is improved reliability and robustness, as well as
simplified programming and debugging. For each real-time
process created on top of the INtime kernel, a separate 32-bit
protected memory segment is automatically created. This segment
is separate and distinct from those used by Windows NT and
provides address isolation and protection not only between
real-time processes, but between real-time processes and
non-real-time Windows NT code. This memory protection is
provided automatically to the INtime developer, using standard
Windows NT compilers (such as Microsoft Visual C/C++).
Finally, the OSEM provides a clean, well defined interface
that minimizes interaction with Windows NT to a few key areas,
resulting in improved product reliability and simplifying
compatibility with future Windows NT releases.
Modified Hardware Abstraction Layer (HAL)
RadiSys provides a number of small changes to the standard
Windows NT HAL in order to improve the overall reliability and
robustness of the INtime system. These changes are:
- Trap attempts to modify the real-time clock and time-of-day
clock so that the real-time kernel can control the system time
base and synchronization of the time-of-day clock.
- Trap attempts to assign Windows NT interrupt handlers to
interrupts reserved by the user for INtime real-time use.
- Ensure that interrupts reserved for INtime real-time kernel
use are never masked.
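The interrupt-related changes amount to guarding a set of reserved interrupt lines. The sketch below models that idea with a hypothetical `InterruptController` class; it is an illustration of the policy described above, not HAL code, and the choice to reject handler assignments with an exception is an assumption.

```python
class InterruptController:
    def __init__(self, reserved_for_rt):
        self.reserved = frozenset(reserved_for_rt)   # lines owned by the real-time kernel
        self.nt_handlers = {}
        self.masked = set()

    def connect_nt_handler(self, irq, handler):
        """Trap attempts to attach a Windows NT handler to a reserved line."""
        if irq in self.reserved:
            raise PermissionError(f"IRQ {irq} is reserved for real-time use")
        self.nt_handlers[irq] = handler

    def mask(self, irqs):
        """Mask the requested lines, but never a reserved one."""
        self.masked |= set(irqs) - self.reserved

pic = InterruptController(reserved_for_rt={5, 7})
pic.connect_nt_handler(3, lambda: None)   # allowed: line 3 is not reserved
pic.mask({3, 5})
print(sorted(pic.masked))                 # [3]
```

A request to mask lines 3 and 5 leaves line 5 unmasked, so the real-time kernel keeps receiving its interrupts.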
Summary
When looking to Windows NT as a potential platform for
telecom systems, it is critical to assess what the application
will demand in terms of reliability and determinism, both now
and in the future. Making an informed decision requires
understanding some fundamental mechanisms of Windows NT and how
they affect the utility of the operating system in a real-time
environment, and the basic approaches to solving this
problem.
By utilizing proven real-time technology and seamless
integration with Windows NT, RadiSys has made reliable
real-time Windows NT a reality. Corporations will now be able
to utilize industry standard Windows NT across the entire
organization, from desktop and business applications to
high-end embedded applications such as telecommunication and
data communication equipment, and all the way down to the
factory floor. These applications can take full advantage of
Windows NT's standard user interface, network capabilities,
development tools and off-the-shelf software and still deliver
rock solid, reliable performance of real-time tasks.