Getting real (time) about embedded Windows by using virtualization
Do-it-yourself (DIY) solutions that add real-time event responsiveness to embedded PCs have been deployed for decades. DOS and Windows-based systems have long been attractive to embedded system developers because they have given them easy access to files, networks, and user interfaces and, most importantly, inexpensive hardware.

But the days when a PC OS can be easily “tweaked” to allow response to real-time events are gone. System complexity driven by the proliferation of multi-core hardware and operating system security mechanisms is building barriers that are no longer easily overcome by use of a simple interrupt handler or clever device driver.

Since the widespread adoption of general-purpose operating systems (GPOSes) such as Microsoft Windows, the most common way to service real-time events has been to write a custom device driver that performs time-critical processing within its interrupt service routine.

This is because, in an environment such as Windows, the only way to ensure that time-critical application code executes on time and without interruption is to run it inside a device driver. Otherwise, interruptions and non-deterministic processing delays occur as Windows services user input and network activity and schedules ordinary applications.
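The non-determinism described here is easy to observe from user space. The following is a minimal, illustrative sketch (the function name and parameters are this article's invention, not a standard API) that repeatedly requests a fixed sleep period and records how far the OS overshoots it; on a GPOS, the overshoot is unbounded in principle.

```python
import time

def measure_timer_jitter(period_s=0.001, iterations=200):
    """Request a fixed sleep period repeatedly and record how far the
    scheduler overshoots it. A GPOS guarantees no upper bound on the
    overshoot, which is exactly the problem for real-time work."""
    overshoots = []
    for _ in range(iterations):
        start = time.perf_counter()
        time.sleep(period_s)
        elapsed = time.perf_counter() - start
        overshoots.append(elapsed - period_s)
    return min(overshoots), max(overshoots)

if __name__ == "__main__":
    lo, hi = measure_timer_jitter()
    print(f"overshoot range: {lo * 1e6:+.0f} us .. {hi * 1e6:+.0f} us")
```

On a loaded desktop system, the worst-case overshoot can be orders of magnitude larger than the typical case, which is why a driver-level or RTOS-level solution is sought in the first place.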

In the past, Microsoft even encouraged the development of dedicated OS subsystems (a special class of kernel service) and real-time device drivers. To make these easier to build, Microsoft published specifications and source code for the OS's hardware abstraction layer (HAL), so designers could build a custom HAL and produce their own real-time extensions to Windows.

Unfortunately, HAL source code has been generally unavailable for many years, so the creation of specialized kernel subsystems and hardware platforms for Windows is now a very difficult path to travel. Likewise, the process of simply building a device driver has gotten more difficult over the years.

Device drivers must now be certified, or signed, and must conform to ever stricter rules to be accepted as executable kernel objects. This has been done primarily to ensure security and stability in the OS, and with each iteration of Windows the process has become more difficult and less flexible.

The end may be very near for the practice of developing device drivers and using them as real-time extensions inside Windows. There are two reasons for this:

First, it has always been a dicey proposition to do real-time processing in a device driver. The more complex that real-time processing becomes, the more difficult it is to complete those tasks within the limited framework of a device driver, and the more likely other system services will suffer as a result. Complex device drivers often result in a system that is error prone.

For example, it may be impossible to collect large amounts of data, analyze that data, and make a proper control decision in a timely fashion, within the context of a device driver. Pushing the complex task into a deferred procedure call (DPC) simply aggravates the scheduling problems that forced the use of a device driver solution in the first place!

As a case in point, consider a spectrographic test and measurement device that scans a welded steel assembly, looking for manufacturing defects, in real-time. A testing application such as this exerts a stimulus that results in reams of data in response.

Analyzing the resulting data stream for an indication that something is not right requires an intense, complex algorithm operating on large amounts of data; that processing cannot be done reliably from within the limited context of an interrupt handler or device driver embedded in the Windows kernel.

A separate, dedicated real-time OS (RTOS) is needed to serve the process reliably. Because of this, when high functionality and real-time determinism are both required in an embedded application, many designers have responded by designing systems around multiple processing subsystems (Figure 1, below).

Figure 1. The need for increasing functionality while preserving real-time responsiveness, coupled with more stringent GPOS security measures, has driven some embedded system designers to transition to higher-cost multi-platform solutions.

Security enhancements frustrate driver development
A second, and more frustrating, reason that now limits the device driver approach to adding real-time capability to a Windows system is the heightened emphasis on security. Each evolution of the Windows OS has taken ever-increasing care to lock out malicious software that might take control of the machine or change the behavior of the operating system. Real-time device drivers will increasingly be treated as intruders by the Windows OS, which will flag drivers that “misbehave” as suspicious.

For example, one way to arouse suspicion is for the processor to spend too much time executing a device driver, to the detriment of other system services. Once the code is flagged as suspicious, the OS may take steps to expel it in order to protect the security of the system. In some cases, the OS will even attempt to “heal itself” by reinstalling previously “trusted” versions of a subsystem, so that attempts to “hook” key kernel subsystems are thwarted.

Even if the real-time performance requirements of a custom-built device driver are modest, and the driver is “well-behaved,” code that was developed and proven in the past may not work in a system with a new OS version or a new service pack.

For example, security enhancements incorporated into Windows Vista include driver signing, PatchGuard (Microsoft's kernel patch protection mechanism, which can shut down the system if Windows discovers that its own code and data structures have been patched by an unauthorized source), and code integrity checks. On 64-bit Windows platforms, Microsoft doesn't allow older 32-bit device driver code to be used at all.[1]

Besides security-related issues, functional optimizations incorporated into new GPOSes make building real-time applications harder. For example, Windows Vista introduces two new types of I/O priorities (priority on individual I/O operations and I/O bandwidth reservations) in order to give foreground I/O operations preference.[2]

It also implements memory access priorities.[3] These features perform scheduling tricks that modify the priority of a driver and its associated threads in order to improve the desktop user experience.

Though these changes make the overall utilization of the system “more fair” and give the user a “better experience,” these mechanisms work against the development of a reliable and deterministic real-time system.

Hard real-time systems need to be unfairly biased in favor of the real-time applications. This need further justifies having a separate real-time operating system running outside the Windows kernel to guarantee that real-time application elements operate at a priority that meets the application's needs.

The deployment of Windows Vista, the upcoming release of Windows 7, and the widespread adoption of multi-core platforms mark the point at which clever device drivers and other kernel-mode solutions can no longer be used to successfully implement a real-time solution for Windows. In fact, these realities have led many embedded system OEMs to stick with delivering older versions of Windows on their products in order to avoid dealing with the new OS complexities.

Even if a real-time device driver complies with all the rules enforced by the OS, there are still many problems with this kernel-level programming approach that make for a very difficult development and debug environment, which translates into code that is very costly to develop and maintain. [4] For example:

* Blocking functions may not be called when operating at a high priority level, or a deadlock might occur.
* Locks cannot guarantee mutual exclusion for interrupt handlers.
* High-priority code cannot access pageable memory, because page faults are fatal.
* Passing addresses between kernel level and user level is difficult or impossible.

Coding and debugging at the user-level simplifies development because the above kernel-level rules do not apply, and complexity is reduced. Added to that is the fact that there are many more tools and languages available for programming at the user-level.
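To illustrate the contrast, here is a minimal user-level sketch (Python, illustrative only) of a blocking producer/consumer pattern. At user level a blocking wait is routine: the scheduler simply parks the thread. The equivalent wait in a kernel driver running at an elevated priority level is exactly the kind of call that the rules above forbid.

```python
import queue
import threading

# At user level, blocking calls are safe: the scheduler parks the thread
# until work arrives. In kernel mode at high priority, an equivalent
# blocking wait could deadlock or crash the system.
work = queue.Queue()
results = []

def consumer():
    while True:
        item = work.get()      # blocks freely; no kernel-level rules apply
        if item is None:
            break              # sentinel received, exit cleanly
        results.append(item * 2)

t = threading.Thread(target=consumer)
t.start()
for i in range(5):
    work.put(i)
work.put(None)                 # sentinel to stop the consumer
t.join()
print(results)                 # [0, 2, 4, 6, 8]
```

None of this code needs signing, certification, or special tooling, which is part of why moving real-time logic out of the kernel pays off in development and maintenance cost.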

Multi-cores are not solutions in themselves
Even the added processing power that comes from running multiple application threads simultaneously on a multi-core processor will frustrate attempts to simply “throw more CPU cycles” at the real-time responsiveness problem. Determinism cannot be ensured through higher performance or multi-core hardware alone; a device driver solution must still live by the scheduling whims of Windows. This is because Windows on multi-core systems uses an unbiased symmetric multiprocessing (SMP) scheduling algorithm, not a biased asymmetric policy: it applies essentially the same “fairness” algorithms it used on single-core systems across multiple cores.
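For concreteness, pinning a process to one core is straightforward on any modern OS; the sketch below uses the Linux `os.sched_setaffinity` call (the rough Windows analogue is the Win32 `SetProcessAffinityMask` API). The point is that affinity alone does not change the fairness of the scheduler: the GPOS is still free to run other work on that same core.

```python
import os

def pin_current_process(core):
    """Restrict the current process to a single core, where supported.
    NOTE: pinning does not buy determinism by itself; the GPOS still
    applies its fair SMP scheduling policy to everything on that core."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # affinity API not available on this platform
    os.sched_setaffinity(0, {core})      # 0 means "the calling process"
    return os.sched_getaffinity(0)       # report the resulting CPU set
```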

And multi-platform solutions are costly
Multi-platform approaches, like the one shown in Figure 1, are expensive because excess hardware is involved: multiple circuit cards, multiple memories, multiple power conditioning circuits, and so on. These factors make the systems more expensive to design, manufacture, and test, and more costly to service in the field.

For example, in the test and measurement application described above, a separate hardware subsystem, such as an intelligent I/O card, could be used to capture and process the scanned data in real time. But an I/O add-in card capable of that real-time processing might cost as much as or more than the PC itself!

A second factor that argues against the multi-platform approach is the realization that embedded systems based on multiple platforms and multiple operating environments often require different software development tools for the real-time and general purpose subsystems, making software development and maintenance more difficult and expensive.

Real-time virtualization on multi-cores is the answer
A better approach to delivering the high functionality that is required of today's embedded systems, while holding costs down, is to leverage the new multi-core processors in a new way.

Rather than using the conventional SMP approach of letting a GPOS-based operating environment manage the deployment of the processor cores, a better solution for deterministic embedded systems is to dedicate one or more cores to real-time processing and the remaining cores to the GPOS and its applications. In effect, use the multi-core silicon to do what would otherwise have required a high-cost multi-platform, multi-OS implementation (Figure 2, below).

Figure 2. Advances in silicon technology, coupled with virtualization of OS software, enable multi-platform embedded systems to save costs by being implemented on a single multi-core platform. Legacy single-processor embedded applications can also be moved to multi-core platforms to gain functionality and avoid the limitations imposed by security-driven GPOSes.

The secret to making this work is virtualization of the operating environments. Virtualization allows multiple operating environments to share a single hardware platform as though each had exclusive use of it. In the world of office servers, virtualization reduces the cost of computing hardware by allowing a smaller number of server boxes to perform the work of many. An embedded system can derive the same benefit.

In server systems, virtualization is being applied to the new generation of multi-core processor chips to allow multiple copies of a GPOS (such as Windows) to share the I/O interfaces on the platform. But this solution cannot ensure real-time determinism.

In the embedded system world, virtualization must be employed using different tactics. With real-time virtualization, a mixture of real-time and general purpose operating systems can be hosted on a multi-core-based system simultaneously, with each OS given exclusive use of I/O resources on the platform. Real-time applications coexist alongside Windows rather than working inside the Windows kernel, thus avoiding the appearance that the real-time software is trying to subvert Windows security or reliability.

In an embedded system, real-time virtualization is managed by a virtual machine manager (VMM), or real-time hypervisor. The key to providing real-time performance with this embedded VMM model is knowing how to partition the hardware interfaces. It's not sufficient to simply restrict processing of different OSes to different cores.

Some VMMs virtualize the entire I/O interface, imposing a layer of indirect processing on each I/O operation that slows system responsiveness. Others selectively and permanently assign I/O resources to the different virtual operating environments so that real-time responsiveness to I/O events is maximized, with a real-time OS and Windows running side by side as peer OSes on the system. Windows kernel security is maintained, and hard real-time determinism is achieved.

Multi-core platforms with virtualized operating environments also provide new opportunities for embedded system designers to add functionality to their systems while preserving investments in legacy software.

It's no longer a requirement to rewrite applications and host them on new operating systems in order to add functionality to embedded products. Legacy operating systems and legacy application code can be hosted alongside new OSes with new features and functionality on the same multi-core processor chip.

A case in point is the control system for a CNC machine that performs its control functions competently but needs a new human-machine interface (HMI) with “soft PLC” capability. Rather than attaching a PC to handle the HMI functions, or rewriting the real-time code for a new OS, the legacy control functions and a state-of-the-art Windows-based HMI can be hosted on different cores of the same processor.

Multi-core processors such as those in the Intel Architecture family have an advantage when it comes to holding down the cost of embedded system development, because with some virtualization software environments, real-time code and general-purpose applications can be developed using the same tools, such as Microsoft Visual Studio.

As this article suggests, advancing technology, such as the ever more stringent security measures taken by new operating systems, will impact the way engineers have done things in the past. The cost of running into these roadblocks is too high, and they are coming too fast. It's time for embedded designers to get off the old treadmill and leverage the new technologies.

With multi-core processors and the right embedded VMM software solution, developers can mix and match their operating environments and gain the benefits of both real-time optimized OSes and full-featured general purpose OSes, while simultaneously preserving their investments in legacy functionality.

[1] Conover, Matthew, Symantec Corporation, "Assessment of Windows Vista Kernel-Mode Security."
[2] Russinovich, Mark, Microsoft Corporation, "Inside the Windows Vista Kernel: Part 1."
[3] Russinovich, Mark, Microsoft Corporation, "Inside the Windows Vista Kernel: Part 2."
[4] Swift, Michael, University of Wisconsin-Madison, "Reinventing Drivers."

Kim Hartman is VP of Sales & Marketing at TenAsys, which has served the embedded market with hardware analysis tools and RTOS products for 25 years. Kim has recently been a featured speaker for Intel and Microsoft on the topic of embedded virtualization. He is a Computer Engineering graduate of the University of Illinois, Urbana-Champaign and holds an MBA from Northern Illinois University.
