Virtualization for embedded x86 multiprocessor applications

Virtualization of computer hardware has been used for many decades. The most widely noted early examples are those implemented by IBM on its mainframe hardware as a means to give its customers an easy upgrade from old “iron” to new “iron.” In this case, one of the primary goals of virtualization was to allow legacy applications to run on newer machines, alongside applications designed for the new operating system and hardware.

Software that manages the virtualization environment is often referred to as a virtual machine manager (VMM) or hypervisor. A VMM creates the illusion of a hardware platform for the purpose of hosting an entire operating system (the guest operating system) and its applications. In most cases, the guest operating system and applications running inside a virtual machine are native to the underlying processor instruction set, so only system-level operations and I/O devices need to be emulated, not the CPU's entire instruction set.

The typical server VMM presents a very limited virtual hardware model to the guest operating system, one that can be easily supported on a range of real hardware platforms and contains sufficient virtual I/O to support common client and server applications. In other words, your server VMM doesn't easily accommodate a variety of hardware devices, usually just a standard “white box.”

Server VMM shortcomings
The virtual I/O presented by a server VMM usually consists of a CPU, RAM, disk, video, keyboard, mouse, and a network interface. A few other generic I/O devices may be available to “install” inside the server virtual machine, but due to the difficulty associated with multiplexing a wide range of I/O among multiple virtual machines, most virtual machine I/O is limited to these basic devices. And, like a general-purpose operating system (GPOS) such as Windows or Linux, a conventional VMM must be “fair” in its approach to scheduling CPU time for each of the virtual machines and sharing the physical I/O resources among all the virtual machines.

The server VMM is targeted at solving problems for a corporate IT network: maximizing the use of server resources and simplifying the deployment and maintenance of client desktops. This approach satisfies a large percentage of IT server and desktop applications, making such virtualization platforms popular with corporate IT groups as a means to quickly build up and tear down server applications and client desktops. Unfortunately, this approach to virtualization simply won't work for embedded applications, which typically require deterministic performance and need to interact with unusual I/O devices.

Defining embedded virtualization
When engineers were asked by Venture Development Corporation (VDC) in a recent survey what advantages they saw in using virtualization, they answered that porting to new hardware designs or integrating new software and applications onto existing platforms were the most important [1]. In other words, they see virtualization as a tool to reuse existing legacy software on new hardware and to combine new features alongside existing proven software. Roughly the same number of respondents cited concerns over the ability to meet performance requirements when running software in a virtual environment.

Together, these two needs define the key factors driving the development of hypervisor products for the embedded market: a desire to support and preserve legacy code, software that has been field-proven and tested over years of use; and a need to ensure that real-time performance is not compromised, which can only be achieved by accommodating direct access to hardware.

Multiprocessor architectures
The introduction of virtualization hardware features on multicore processors provides embedded systems developers an opportunity to embrace VMM technology without having to compromise performance or their need for access to specialized I/O devices. But the server-VMM model of virtualization needs to be discarded, in favor of an asymmetric processing model.

Two basic types of multiprocessor architectures exist: symmetric multi-processor (SMP) and asymmetric multi-processor (AMP). SMP systems are based on homogeneous hardware designs, where each CPU (or processor core) has identical capabilities and full access to all I/O devices and RAM in the system. AMP systems are typically associated with heterogeneous hardware designs, where each CPU might have different features and capabilities and may even have dedicated I/O devices and RAM.

All those dual-core and quad-core machines in the latest servers, desktops, and laptops are, by hardware architecture definition, SMP machines.

The value of an SMP system, especially where general-purpose computing is concerned, is the ability to maximize the use of system resources. Having more than one CPU on which to schedule applications (or processes or threads or…) means less time waiting for high-priority CPU-intensive applications to finish.

A key advantage of AMP systems is specialization through dedicated resources. Dedicated resources, where I/O devices and RAM are accessible only to a specific core and the applications that run on that core, allow applications to assume sole ownership of the resources, with the benefit of less overhead and higher performance.

Applying AMP to SMP systems
Assigning resources exclusively to specific cores in an SMP system is another way to build an AMP system. By partitioning CPU cores, RAM, and I/O devices between multiple software systems, one can gain direct control over the performance and use of those hardware resources.

Hardware-virtualization technology, such as the Virtualization Technology (VT) found in many Intel dual-core and quad-core embedded processors, can be used for just that purpose: to partition the CPU, RAM, and I/O devices of an SMP machine between multiple virtual machines. In this embedded VMM model each virtual machine is assigned to perform a set of unique tasks, by hosting the operating system and application that is appropriate to the task.

Unlike the server VMM model, the AMP-inspired embedded VMM requires multicore processors and needs an assist from hardware-virtualization technology in the processor to ensure that each virtual machine has low interrupt latency, direct access to specialized I/O, and the assurance that the VMM will not “time slice away” the guest operating system and its applications.

Virtualization Technology in multicore CPUs
Until the availability of Intel VT (widely introduced on the Intel Core microarchitecture), software-only VMMs running on the x86 architecture encountered significant challenges. Both the VMM and the guest operating system expect to maintain supervisor-level control over the hardware platform, but only one software system can reliably hold that control, so the two conflict. Absent some form of cooperation between the VMM and the guest operating systems (known as “para-virtualization”), the VMM must resort to time-consuming and CPU-intensive “tricks.” Without the support of hardware-virtualization technology in the processor, these tricks include modifying the guest operating system's binary code and running guest operating systems at ring levels for which they were not written [2].

The downside to such VMM trickery is a decrease in performance and limited compatibility among guest operating systems. For example, binary files of a guest operating system may be modified to trap supervisor-level CPU instructions, requiring the VMM to emulate these instructions. Instruction emulation slows down the execution speed of the guest operating system, and the need to “fix up” binary files limits guest operating systems to those that have been certified for use with the VMM.

Virtualization technology built into the CPU is designed to overcome these problems. For example, Intel VT adds an overarching operating mode, called VMX root, where a hypervisor executes with ultimate control of the CPU hardware. A hypervisor that uses Intel VT can intercept key supervisor-mode operations executed by a guest operating system without requiring knowledge of the guest operating system's binaries or internals [2].
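
Before relying on VMX root mode, software first has to find out whether the processor advertises Intel VT at all. The following is a minimal sketch of that check, not part of any product described here. It assumes two inputs that the article does not discuss: a raw ECX value already obtained from CPUID leaf 1 (where bit 5 is the VMX feature flag), and the text of a Linux-style /proc/cpuinfo file (where the same feature appears as the `vmx` flag). The function names are made up for illustration.

```python
"""Sketch: checking whether a CPU advertises Intel VT (VMX)."""

# CPUID leaf 1 reports VMX support in bit 5 of the ECX register.
VMX_ECX_BIT = 1 << 5


def ecx_reports_vmx(cpuid_leaf1_ecx: int) -> bool:
    """Return True if a CPUID.1:ECX value has the VMX feature bit set."""
    return bool(cpuid_leaf1_ecx & VMX_ECX_BIT)


def cpuinfo_reports_vmx(cpuinfo_text: str) -> bool:
    """Return True if Linux-style /proc/cpuinfo text lists the 'vmx' flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the whole flag name so that e.g. 'svm' is never mistaken for it.
        if key.strip() == "flags" and "vmx" in value.split():
            return True
    return False
```

On a Linux host, `cpuinfo_reports_vmx(open("/proc/cpuinfo").read())` would apply the second check to the running machine. A positive result only means the feature is advertised; platform firmware can still leave it disabled, so a real hypervisor would need further checks.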

A VMM for embedded
Hardware-virtualization technology has made possible the emergence of an embedded VMM (eVM) capable of supporting the demands of a real-time operating system (RTOS) while simultaneously hosting a general-purpose operating system (GPOS), such as Windows or Linux. On a dual-core machine the eVM hypervisor dedicates a core to each guest operating system, for a “dual-OS, single-platform” environment, giving developers the means to merge two disparate hardware platforms into one. Figure 1 shows an example, the TenAsys eVM, which uses Intel VT to partition processor resources among the guest operating systems.

A key difference between this embedded VMM and the server VMM model is how physical resources are allocated to each virtual machine, paralleling the AMP versus the SMP models. Resources, such as CPU cycles, RAM, I/O, and interrupts, must be allocated by any VMM. In the simplest case, a server VMM evenly multiplexes these resources among the virtual machines, attempting to fairly distribute physical resources among all the virtual machines.
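
The cost of fair multiplexing for a real-time guest can be shown with a little arithmetic. The sketch below assumes a deliberately simple model that is not taken from any particular VMM: virtual machines sharing one core are scheduled round-robin with a fixed time slice, and context-switch overhead is ignored. Under that assumption, a guest that has just used up its slice may wait for every other guest on the core to run before it executes again. The function name and the numbers in the usage note are illustrative only.

```python
def worst_case_wait_ms(vms_sharing_core: int, time_slice_ms: float) -> float:
    """Worst-case time a guest waits for the CPU under round-robin slicing.

    Assumed model: every other guest on the same core runs one full time
    slice before this guest is scheduled again.
    """
    if vms_sharing_core < 1:
        raise ValueError("at least one virtual machine must use the core")
    if time_slice_ms < 0:
        raise ValueError("time slice cannot be negative")
    return (vms_sharing_core - 1) * time_slice_ms
```

With four guests sharing a core and a 10 ms slice, `worst_case_wait_ms(4, 10.0)` gives 30.0 ms of potential delay. For a guest that owns its core, `worst_case_wait_ms(1, 10.0)` gives 0.0, which is the property the AMP-style embedded VMM is after.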

A serious drawback of the server VMM model is the heavily virtualized I/O model. Not only does virtualizing all I/O seriously limit the number and variety of I/O devices accessible to the virtual machine, but it also has a significant impact on performance and determinism.

The AMP model of resource allocation is useful where determinism and performance are more important than equal access. The processor VT features can be used to isolate resources for use by a specific virtual machine and its guest operating system rather than to create virtual I/O for shared access among multiple virtual machines.

Even in the AMP model, which is the basis of the embedded hypervisor, not all I/O is required to be exclusive. Some will be shared (such as the hard disk, enterprise Ethernet adapter, and console device). In these instances, virtual devices exist to handle the requirement to share the hardware among multiple virtual machines.

Multiple RTOS support
The application of an embedded VMM is not limited to “dual operating system, single platform” on dual-core systems; increasing the number of processor cores on a platform increases the possibilities. For example, three virtual machines could be hosted on a quad-core processor: Windows in one virtual machine running on two cores, plus two embedded virtual machines, each hosting an RTOS on one of the remaining cores.
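
The quad-core arrangement can be written down as a partition plan and checked for conflicts. The following is a hedged sketch of such a check, not the configuration format of any real hypervisor. The `VirtualMachine` record, the `validate_partition` function, and the device names are all hypothetical. The rule it enforces follows the text: every core and every exclusive device has exactly one owner, while devices such as the disk, Ethernet adapter, and console may be listed as shared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualMachine:
    """One guest in a hypothetical partition plan."""
    name: str
    cores: frozenset
    exclusive_devices: frozenset = frozenset()
    shared_devices: frozenset = frozenset()


def validate_partition(vms, total_cores):
    """Return a list of conflicts; an empty list means the plan is consistent."""
    problems = []
    core_owner = {}
    device_owner = {}
    # Collect shared devices first so the check does not depend on VM order.
    all_shared = set()
    for vm in vms:
        all_shared |= vm.shared_devices
    for vm in vms:
        if not vm.cores:
            problems.append(f"{vm.name}: no core assigned")
        for core in sorted(vm.cores):
            if not 0 <= core < total_cores:
                problems.append(f"{vm.name}: core {core} does not exist")
            elif core in core_owner:
                problems.append(
                    f"core {core}: claimed by {core_owner[core]} and {vm.name}")
            else:
                core_owner[core] = vm.name
        for dev in sorted(vm.exclusive_devices):
            if dev in all_shared:
                problems.append(f"{dev}: both exclusive ({vm.name}) and shared")
            elif dev in device_owner:
                problems.append(
                    f"{dev}: claimed by {device_owner[dev]} and {vm.name}")
            else:
                device_owner[dev] = vm.name
    return problems


# Windows on two cores, one RTOS on each remaining core (device names invented).
QUAD_CORE_PLAN = [
    VirtualMachine("windows", frozenset({0, 1}), frozenset({"gpu"}),
                   frozenset({"disk", "ethernet", "console"})),
    VirtualMachine("rtos_a", frozenset({2}), frozenset({"motion_io"}),
                   frozenset({"disk", "console"})),
    VirtualMachine("rtos_b", frozenset({3}), frozenset({"camera_io"}),
                   frozenset({"console"})),
]
```

Calling `validate_partition(QUAD_CORE_PLAN, 4)` returns an empty list. Giving two guests the same core, or the same exclusive device, produces one message per conflict.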

Take, for example, a conventional system consisting of a Windows computer serving the user-interface and enterprise nexus function, an RTOS box providing machine control, and a DSP PCI card in the Windows box dedicated to high-performance numeric algorithms, such as image processing. Using an embedded hypervisor as shown in Figure 2, what were previously three separate (and expensive) pieces of computational hardware are condensed onto a single platform.

Partitioning: determinism and performance
Granting exclusive access to I/O is essential to attaining real-time responsiveness, because it means the virtual machine can have direct physical access to its dedicated hardware. Without exclusive physical assignment of pertinent I/O, you run the risk of waiting an indeterminate time for access to devices. If another virtual machine has access to an I/O device because it's multiplexed, the wait can be significant. Even if only one virtual machine ever accesses a specific I/O device, a VMM that virtualizes I/O must translate each request from the virtual machine into real I/O accesses to the physical hardware, an unnecessary and time-consuming process.

Exclusivity of I/O doesn't apply only to a real-time virtual machine. Graphics-intensive applications need access to real hardware for maximum performance. A virtual frame buffer may be too slow and inadequate in features for an application that renders moving 3D images. In that case, the virtual machine containing the GPOS needs direct access to the physical frame buffer and its control I/O.

Migrating legacy software
Many systems exist today that depend on code written years ago. These legacy embedded applications continue to be used because they work; they're proven, they're reliable, and they may even be certified for an application (such as medical, defense, and aerospace). Unfortunately, these legacy applications may also be running on expensive obsolete hardware or be in need of an update, such as a graphical user interface or access to an enterprise network. Rewriting proven or certified embedded code is rarely desirable or economical.

With the help of an embedded VMM, legacy code can be migrated from obsolete hardware to modern embedded platforms. Because I/O can be virtualized, it's possible to simulate old hardware devices, minimizing rewrite of proven legacy code. For example, an obsolete ISA device could be simulated within the hypervisor by intercepting I/O requests and redirecting them to equivalent on-board PCI I/O devices.
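
The redirection idea can be modeled in a few lines. What follows is a sketch under stated assumptions rather than real hypervisor code. It assumes the hypervisor has been set up so that guest accesses to the legacy port range are intercepted and handed to a dispatch routine (with Intel VT, port accesses can be made to exit to the hypervisor). The port numbers (0x300 and 0x301), the register layout, and every class and function name are hypothetical, chosen only to show how an old ISA-style interface might be mapped onto a replacement device.

```python
class ReplacementPciDevice:
    """Stand-in for the modern on-board device (hypothetical behavior)."""

    def __init__(self):
        self.output_latch = 0

    def write_outputs(self, bits):
        self.output_latch = bits & 0xFF  # model an 8-bit output register

    def read_outputs(self):
        return self.output_latch


class PortIoRouter:
    """Routes intercepted guest port accesses to emulation handlers."""

    def __init__(self):
        self._handlers = {}

    def register(self, first_port, last_port, handler):
        for port in range(first_port, last_port + 1):
            if port in self._handlers:
                raise ValueError(f"port {port:#x} is already emulated")
            self._handlers[port] = handler

    def on_guest_io(self, port, is_write, value=0):
        """Return (handled, data); data is meaningful only for handled reads."""
        handler = self._handlers.get(port)
        if handler is None:
            return (False, 0)  # not an emulated port
        return (True, handler(port, is_write, value))


def make_legacy_isa_handler(pci_dev, base_port):
    """Emulate an assumed legacy card: data port at base, status at base + 1."""
    data_port, status_port = base_port, base_port + 1
    status_ready = 0x01  # assumed "ready" bit expected by the legacy driver

    def handler(port, is_write, value):
        if port == data_port:
            if is_write:
                pci_dev.write_outputs(value)
                return 0
            return pci_dev.read_outputs()
        if port == status_port and not is_write:
            return status_ready  # the replacement device is always ready
        return 0  # writes to the status port are ignored

    return handler
```

The legacy driver in the guest keeps issuing the same port reads and writes it always has. After `router.register(0x300, 0x301, make_legacy_isa_handler(dev, 0x300))`, a guest write to port 0x300 ends up in `dev.output_latch`, so only the handler needs to know about the new hardware.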

Kernel-based hypervisor
Migrating legacy systems from a dedicated hardware platform onto a dedicated CPU in a multicore processor requires more than just a real-time aware hypervisor. In the case where an I/O device must be redirected and simulated, in order to adapt legacy code to a new platform, code that runs in the context of the hypervisor must be developed and debugged. Embedded systems are, by definition, unique, and there's no way a VMM vendor can provide all the combinations of “old to new” I/O possibilities in a single off-the-shelf product.

To get around this problem, the hypervisor needs to provide a structure that enables developers to simulate hardware, either to make new I/O devices appear to be old devices or to create virtual devices for sharing between multiple guest operating systems. In this role the hypervisor can also be thought of as a sort of para-virtualization tool, with the advantage of not requiring source code to a guest operating system in order to implement para-virtualized device drivers.

If the embedded developer is to create para-virtual devices for the hypervisor, then development tools are needed to write and debug those devices. Without proper tools, in the form of both a development platform and a debugging environment, it will be difficult to create the virtual hardware needed to support the migration of legacy applications to an embedded VMM.

Of course, it's important never to lose sight of the fact that direct access to real hardware is fundamental to ensuring deterministic response in any virtual machine. If all the hardware is emulated, you're no better off than with a server VMM that emulates all the devices in a generic “white-box” machine.

Paul Fischer is a senior technical marketing engineer at TenAsys Corporation in Beaverton, Oregon. He has over 25 years of experience building and writing about real-time and embedded systems in a variety of engineering and marketing roles. Paul has an MSE from UC Berkeley and a BSME from the University of Minnesota.

1. Venture Development Corporation report, “Operating System Virtualization in the Embedded Systems Market,” March, 2008, available for a fee at

2. Uhlig, Rich, “Intel Virtualization Technology,” Computer Magazine, IEEE Computer Society, vol. 38, issue 5, May 2005, pp. 48–56.
