The system virtual machine can be used to make Linux-based applications faster, more responsive, and more secure. Here's a primer to get you started.
Linux is rapidly becoming the operating system of choice in a wide array of embedded applications, ranging from mobile handsets and network/telecom infrastructure apps to media-rich consumer electronics devices such as portable media players and digital video systems.
Many embedded systems developers are already using Linux in their designs or are considering doing so. The perception among developers is that it's easier to develop applications for Linux than it is to develop for proprietary operating systems and that using Linux reduces costs because it's open source.
However, Linux still presents a number of problems in the embedded space.
First, the nature of the programming environment is often bifurcated. For example, in many media-rich consumer applications, Linux is used to run high-level application code that is similar–and often identical–to application code used on personal computers. Such code is typically developed by application programmers who normally are not experts in programming low-level embedded systems.
But such applications have much real-time functionality that requires low and predictable interrupt latency. In the case of the mobile phone terminal, the cellular communication subsystem has real-time requirements. And while embedded Linux has certainly improved, these requirements are best met by a small and highly efficient real-time operating system (RTOS).
Second is the problem of security. In a mobile-phone handset, for example, the communication stack is of critical importance–if it is subverted by an attacker, the phone could be turned into a jammer that disables communication in the whole cell. Similarly, an encryption subsystem needs to be strongly protected from being compromised.
It's no insignificant challenge to create a secure system that runs millions of lines of code; inevitably, such code contains tens of thousands of bugs, many of which can compromise the system's security. Embedded Linux implementations, increasingly prone to attack, are large enough (hundreds of thousands of lines of code) to contain as many as a thousand bugs. Because the Linux operating system normally runs in privileged mode, once it is compromised, attacks on any part of the system become possible.
Third is the issue of license separation. Linux is a frequently deployed high-level operating system. Among its advantages are the royalty-free status, independence from specific vendors, widespread deployment, and a strong and vibrant developer community.
A frequent concern about Linux is that it's distributed under the GPL license, which requires that all derived code be subject to the same license and thus become open source. Some legal arguments claim that the license applies even to device drivers that are loaded into the kernel as binaries at run time.
This restriction creates a potential problem for chipmakers who consider device interfaces valuable proprietary IP. An open-source device driver will effectively publish those device interfaces, a strong disincentive for using Linux in many embedded systems scenarios.
Virtualization to the rescue?
Following the trend among desktop application developers, many embedded systems developers are looking to system virtualization environments, also called system virtual machines, to resolve, or at least mitigate, these problems.
Unlike process virtual machine environments specific to particular programming languages, such as the Java VM, system virtual machines correspond to actual hardware and can execute complete operating systems in isolation from other similar instantiations in the same computing environment.
This article explains embedded-system virtual machine models and explores where and how they can be used to make Linux-based applications faster, more responsive, and more secure.
The basics of virtualization
Virtualization refers to providing a software environment in which programs (including operating systems) can run as if on bare hardware, as Figure 1 shows. Such an environment is called a virtual machine. A virtual machine is an efficient, isolated duplicate of the real machine.
The software layer that provides the virtual machine environment is called the virtual machine monitor (VMM), or hypervisor. The VMM has three essential characteristics:
1. It provides an environment for programs that is essentially identical to the original machine;
2. Programs that run in this environment show, at worst, minor decreases in speed; and
3. The VMM is in complete control of system resources.
All three characteristics are important and contribute to making virtualization highly useful. The first (similarity) ensures that software that runs on the real machine will run on the virtual machine. The second (efficiency) ensures that virtualization is practical from the performance point of view. The third (resource control) ensures that software running inside a virtual machine cannot access or interfere with resources that haven't been allocated to it.
The efficiency feature requires that the vast majority of instructions be directly executed by the hardware: any form of emulation or interpretation replaces a single virtual-machine instruction by several instructions of the underlying hardware.
This requires that the virtual hardware be almost identical to the physical hardware on which the VMM is hosted. Small differences are possible: for example, the physical hardware may lack some instructions of the virtual hardware (as long as they aren't heavily used), the memory-management unit may be different, or devices may differ.
However, not all instructions can be directly executed. The resource-control feature requires that all instructions that deal with resources must access the virtual rather than the physical resources. This means such instructions must be interpreted by the VMM, as otherwise virtualization is violated.
Specifically, the VMM must interpret two classes of instructions: (1) control-sensitive instructions, which modify the privileged machine state and therefore interfere with the hypervisor's control over resources; and (2) behavior-sensitive instructions, which access (read) the privileged machine state. While behavior-sensitive instructions can't change resource allocations, they reveal the state of the real resources; where that state differs from the virtual resources, they break the illusion provided by virtualization.
Benefits of virtualization
The key attraction of virtualization for embedded systems developers is that it supports the concurrent existence and operation of multiple operating systems on the same hardware platform.
Virtualization helps overcome the challenges caused by the bifurcated programming environments by running appropriate operating systems concurrently on the same processor core, as shown in Figure 2. The same effect could be achieved by using separate cores for the real-time and application software stacks, combined with hardware mechanisms for partitioning memory.
The ability to run several concurrent operating systems on a single processor core may reduce the bill of materials, especially for lower-end devices. It also provides a uniform operating system environment in the case of a product line (composed of high-end devices using multiple cores as well as lower-end single-core devices).
Virtualization can also be used to enhance security. A system virtual machine encapsulates a subsystem, so that its failure can't interfere with other subsystems (note that this encapsulation is courtesy of the aforementioned resource-control requirement).
Types of virtualization
There are two basic ways to ensure that code running in the virtual machine doesn't execute any sensitive instructions: 1) pure virtualization, which depends on sensitive instructions not being executable by the virtual machine, and 2) para-virtualization, where sensitive instructions are removed from the virtual machine.
The classical approach is pure virtualization, which requires that all sensitive instructions be privileged. Privileged instructions can be executed in a privileged state of the processor (typically called privileged mode, kernel mode, or supervisor mode) but generate an exception when executed in unprivileged mode (also called user mode). This is shown in Figure 3. An exception enters privileged mode at a specific address (the exception handler), which is part of the hypervisor. The hypervisor can then interpret (“virtualize”) the instruction as required to maintain virtual machine state.
Until recently, pure virtualization was impossible on almost all contemporary architectures, as they all featured sensitive instructions that were not privileged. Recently many of the major processor manufacturers–including Intel and AMD in the desktop space and ARM in the embedded market–have added virtualization extensions that allow the processor to be configured in a way that forces all sensitive instructions to cause exceptions.
Despite this, there are a number of reasons this approach to virtualization is not generally used, especially in embedded applications. One is that exceptions are expensive.
On pipelined processors, an exception drains the pipeline, resulting in delay in processing, typically one cycle per pipeline stage. A similar delay typically happens when returning to user mode. Furthermore, exceptions (and exception returns) are branches that usually are not predictable by a processor's branch-prediction unit, resulting in additional latency.
These effects typically add up to 10 to 20 cycles, more in deeply pipelined high-performance processors. Add to this the work required for the actual instruction emulation, and it becomes clear that virtualizing a single instruction costs dozens of cycles. Some processors (notably the x86 family) have exception costs that are much higher than this (hundreds of cycles). This creates substantial overhead for operating-system code, which frequently executes many privileged instructions in a short time.
Para-virtualization takes a different approach: the guest's source code is manually modified to remove direct access to privileged state, replacing such accesses with explicit invocations of the hypervisor ("hypercalls").
Para-virtualization allows replacement of many sensitive instructions by a single hypercall, thus reducing the number of (expensive) switches between unprivileged and privileged mode.
If properly implemented, para-virtualization has the potential to reduce the virtualization overhead. Variants of para-virtualization have been deployed for years by VMware and by Xen from the University of Cambridge, both aimed at the enterprise market. Recently, virtualization solutions aimed at embedded designs have emerged, such as L4/Wombat from the University of New South Wales and the commercial systems Trango, VLX from VirtualLogix, and OKL4 from Open Kernel Labs.
All have their advantages and disadvantages in different environments, but what most of these approaches have in common is that they introduce another layer of software, and complexity, into the operating environment. And in many embedded applications, where code size frequently measures in the millions of lines, breaking this into two or three virtual machines is of limited help in improving overall system reliability and security.
The isolation provided by current approaches to para-virtualization is by its nature coarse-grained–it provides the appearance of a complete machine for each subsystem. This means that each virtual machine is required to run its own operating system, making them relatively heavyweight.
Consequently, increasing the number of virtual machines in order to reduce the granularity of the subsystems would create serious performance issues and significantly increase the amount of code. This in turn not only requires increased memory size and thus power consumption, but also results in more points of failure.
Using a microkernel as a hypervisor
The most appropriate way to deploy para-virtualization in an embedded design is by integrating it into the structure of the operating system. But as Linus Torvalds commented recently, integrating it into Linux is not practical because there is no one-size-fits-all "One True Virtualization" model appropriate to all the applications in which Linux is used. It also wouldn't help those deploying devices without Linux (because they want to offer a choice of high-level operating systems, or because they ship low-end devices in which a high-level operating system makes no sense).
A better place to implement an integrated para-virtualization mechanism is in the companion RTOS used to handle the hard real-time operations that Linux can't. This approach requires more than a bare hypervisor: it needs a kernel that can provide basic operating-system mechanisms, such as a high-performance message-based microkernel, as Figure 4 shows. Unlike the classical monolithic operating system structure of Linux, with a vertical structure of layers each abstracting the layers below, a microkernel-based system exhibits a horizontal structure. System components run beside application code and are invoked by sending messages.
A notable property of a microkernel system is that as far as the kernel is concerned, there is no real difference between “system services” and “applications”–all are simply processes running in user mode. Each such user-level process is encapsulated in its own hardware address space, set up by the kernel.
A subsystem running on a microkernel can affect other parts of the system (outside its own address space) only by invoking kernel mechanisms, particularly IPC. It can directly access memory (or any other resource) only if that resource has been mapped into its address space via a system call.
This model is a good fit for embedded systems, where the distinction between “system services” and “applications” is frequently meaningless due to the cooperative nature of the interaction of subsystems.
The central mechanism provided by a microkernel is a message-passing communication mechanism called IPC. In the horizontal system structure, IPC is used for invoking all system services, as well as providing other communication between subsystems. Owing to the need for high-bandwidth, low-latency communication, a microkernel typically also provides mechanisms for setting up shared memory regions between processes.
In this context, a microkernel provides the right mechanisms for efficiently supporting virtualization. The microkernel serves as the hypervisor and catches virtualization traps; unlike other virtualization approaches, it then forwards the exception to a user-mode virtual machine monitor, which either performs the emulation or signals a fault.
The IPC is also the enabler for low-overhead virtualization: a system-call trap executed by a guest application in a virtual machine invokes the microkernel's exception handler, which converts this event into an IPC message to the guest operating system. The guest handles it as a normal system call. The system-call result is returned to the guest application via another IPC message, which unblocks the waiting guest process.
Similarly, IPC is used to deliver interrupts to the guest operating system's interrupt handler. It's also used to communicate with device drivers, and for communication and synchronization between any components of the system, including between virtual-machine environments.
As the same IPC mechanism is used for many different operations, it's typically highly optimized. This implicitly benefits virtualization as well as other critical system operations. As a well-designed IPC mechanism is also very simple, it's possible to optimize it in virtually all of its aspects.
A practical application
To see how this more fine-grained approach to virtualization affects how an embedded systems designer works, let's look at a media player design, originally hosted on a more traditional para-virtualization environment with Linux as the guest operating system. The design is then ported to run in its own address space as a native application on OKL4, an open-source but commercially supported, virtualization-ready, message-based microkernel.
The media player can then run side-by-side with the Linux system (which still supports other applications) and also with a trusted crypto service that runs in a minimal trusted-computing-base environment. Over time, more components can be extracted from their monolithic environments (be it a high-level operating system or an RTOS running a communications stack) into their own protected compartments.
This includes device drivers, network stacks, file systems and other functional components. Such an approach can dramatically improve the robustness of the system by introducing internal protection boundaries that contain the damage caused by bugs.
For more than 10 years, L4 has been successfully used as a hypervisor for virtualizing Linux. As shown in Table 1, the performance of open-source OKL4-based virtual machines is typically within 5% of native performance.
A particularly interesting result is that of Linux on ARM platforms, where OK Linux (Linux virtualized on OKL4) outperforms native Linux in lmbench context-switching and other microbenchmarks by factors of up to 50.
Gernot Heiser is cofounder and chief technology officer of Open Kernel Labs (OK). Prior to founding OK, he created and led the Embedded, Real-Time and Operating Systems (ERTOS) research program at NICTA, the Australian national centre of excellence for information and communications technology. He is also a professor at the University of New South Wales. Gernot Heiser holds a PhD in computer science from ETH Zurich, Switzerland.