Are we therefore likely to see versions of Xen, KVM or the VMware hypervisor on embedded devices, or should embedded-systems developers even try to port one of the open-source solutions themselves? The answer is 'no', for good technical reasons. First, let's look at some of the virtualization use cases that are being considered for embedded systems.
Embedded virtualization use cases
While there are literally dozens of actual and potential use cases for virtualization in embedded systems, they can be broadly broken into three main categories: co-existence of different operating-system (OS) environments on the same platform, isolating critical components from an untrusted OS environment, and the use of an indirection level for remote control of OS environments on deployed systems.
These are all based on the fundamental idea of virtualization: inserting an additional software layer, the hypervisor, between an OS and the hardware. An OS running on a hypervisor does not access real hardware resources, but virtualized resources provided by the hypervisor. This is a generalization of the idea of virtual memory to all system resources. The immediate advantage is that a single hardware platform can be shared between several OS environments, as shown in Figure 1.

From this bird's-eye view there is no difference between embedded- and enterprise-style virtualization. However, the differences emerge when we dig a bit deeper.
Why run several OSes concurrently?
In the enterprise space, one of the main drivers is quality of service (QoS). Different services can be protected from interference (e.g., a heavily loaded service degrading a lightly loaded one) by running them on different machines. Replacing physical machines with virtual machines allows this QoS isolation to happen on a single platform, with significant cost savings.
This is not normally a consideration in embedded systems, where different subsystems all contribute to the overall system mission. There are, however, classes of embedded systems that are much more like enterprise-style servers, such as high-end network infrastructure devices. Such systems may indeed be served adequately by an enterprise-style hypervisor. This class of system, however, will be ignored in the remainder of this article.
The typical reason for using multiple OSes in an embedded system is "horses for courses." Modern embedded devices frequently contain massive amounts of software, measuring in the millions of lines of code (LoC). This implies a need for high developer productivity, which in turn creates demand for operating systems with familiar, high-level APIs. Consequently, "rich OSes," such as Linux or Windows, are becoming the development platforms of choice in the embedded space.
While good for user-facing software, these rich OSes do not provide an appropriate base for other embedded software, in particular real-time subsystems. The obstacles concern both functionality and legacy code. Rich OSes tend not to offer the low and highly predictable interrupt latencies required by embedded systems, and many real-time subsystems are large and difficult to port to the APIs provided by rich OSes.
The inevitable consequence is a requirement for co-existence of several very different OSes on the same platform. Traditionally, this has been accomplished by running each OS on a separate processor. However, to properly isolate the OSes from each other, physical memory must then also be partitioned, either by connecting the processors to different memory banks or by some other mechanism for partitioning shared physical memory.
Virtualization allows this to be done on a single processor core, which frequently results in a lower bill-of-materials (BOM) cost. But virtualization is useful even on multicore hardware, since the hypervisor partitions the shared physical memory between OSes without the need for further hardware mechanisms. What is particularly appealing is the abstraction of the underlying architecture provided by the hypervisor. The system designer can develop the software stack independently of the number of cores and can use the hypervisor to map the same stack to uniprocessor or multiprocessor hardware. This "architectural abstraction" is shown in Figure 2.

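Isolating a critical component from an untrusted OS environment can be achieved by running it directly on the hypervisor (akin to running on "naked" hardware) rather than inside the rich OS. Figure 3 shows this for PIN-entry code. This may keep the trusted computing base (TCB) of the PIN-entry code as small as 15 kLoC, instead of including the millions of lines of a rich OS.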

Other variants of the isolation theme include protecting the code that implements digital rights management, or other sensitive proprietary code. Isolation may also help to reduce certification cost: a particular subsystem (say, the baseband stack in a mobile phone) can be certified independently of the rich OS environment if the hypervisor guarantees strong isolation. As the rich OS environment, which supports the user-visible functionality, tends to change much more frequently than the software implementing the basic communication protocols, this can lead to significant cost savings.
Examples of remote control are firmware-over-the-air (FOTA) upgrades and remote disablement (e.g., wiping sensitive contents from a stolen phone). While these can be implemented by other means, the indirection level provided by the hypervisor simplifies implementation (and therefore reduces cost) and also reduces downtime for FOTA. If sufficient resources can be made temporarily available, the new version of a subsystem can boot up while the old one is still running (see Figure 4), enabling a very fast switchover.

Requirements for hypervisors
From the above discussion of virtualization use cases, we can derive a number of requirements for hypervisors that can be used in a resource-constrained embedded system.
First, the hypervisor must support the processor architecture used in the embedded system. Unlike the enterprise space, where the x86 architecture is pervasive, many embedded systems use other kinds of processors. Specifically, the mobile wireless space is dominated by ARM cores; in other verticals the MIPS and Power architectures are prevalent.
The basic scenario of co-existing rich and real-time OSes implies, first of all, that the hypervisor must be real-time capable, meaning that its interrupt latencies are short and bounded.
Isolation between virtual machines is, by definition, provided by any hypervisor. However, while in enterprise systems strong isolation is the prime motivation for the use of virtualization, this is not so in embedded systems. All the various subsystems of an embedded system contribute to its overall functionality, and therefore require strong cooperation. This implies a need for low-overhead, low-latency, high-bandwidth communication channels between subsystems (i.e. virtual machines).
The hypervisor must also not consume significant resources, as this would drive up the BOM cost. Furthermore, the security use cases (PIN entry and IP protection) require the system to be highly resilient not only to software-based but also to hardware-based attacks. This means that all security-critical code (which includes the hypervisor) must be contained in on-chip memory, since off-chip memory and buses are exposed to physical probing. This puts very severe restrictions on the size of the hypervisor. Similar arguments can be made for the device-management use cases.
Can enterprise hypervisors do the job?
This issue is difficult to illustrate for proprietary systems whose source code is kept secret. Fortunately some of the most widely deployed hypervisors are open-source systems, and therefore provide an excellent base for comparison. In the enterprise space these are Xen and KVM, and in the embedded space OKL4, a member of the L4 microkernel family. While these three systems do not cover the whole design space, they are representative enough for our purpose. Xen actually straddles both spaces, as it has recently been adapted for embedded use. How do these systems fare with respect to the requirements we have identified above?
Let's start with KVM. It turns the complete Linux kernel into the hypervisor by activating the hardware virtualization mode of x86 processors. No such special execution mode exists on present ARM or MIPS processors, so KVM cannot be used on them, which rules it out of much of the embedded space (although hardware support for virtualization is bound to appear on such processors in the future).
Moreover, KVM does not provide the real-time capability required for many embedded systems. While real-time support in Linux has made significant progress over the years, it is not a suitable replacement for an RTOS in many cases. If it were, why would one run an RTOS concurrently with a rich OS in the first place? In summary, KVM doesn't look like a promising hypervisor for embedded systems. How about Xen?
Ports of Xen to ARM and Power have been done by Samsung and IBM, respectively; presumably a MIPS port (if one doesn't already exist) would be possible too. However, Xen was not designed for real-time use. The Linux experience shows that it is hard, expensive and time-consuming to retrofit real-time support into a system that was not designed for it. Xen is unlikely to be an exception.
How about communication? The authors of Xen stated that fast inter-VM communication was not a design goal, so it would have to be retrofitted. This brings back memories from a related area: early microkernels suffered from poor communication performance, and attempts to improve it failed. Jochen Liedtke, the creator of the L4 microkernel, finally showed that fast communication must be designed into the system from the beginning. Any attempt to beef up inter-VM communication in Xen is likely to go through the same experience.
How about size? The Samsung port of Xen to ARM fits into 2MB. This is impressively small compared to the typical footprint of an enterprise-style hypervisor, but it is still far too big to fit into on-chip memory. Furthermore, Xen relies on a privileged VM, called dom0, to provide device drivers. That domain contains a complete Linux guest OS, and is part of the system's TCB. On ARM, dom0 adds another 14MB to the memory footprint. Hence, even a significantly slimmed-down Xen hypervisor is still huge by embedded standards.
Now let's look at OKL4, the hypervisor designed for embedded systems. Like Xen, OKL4 supports a number of architectures, including ARM, x86 and MIPS (related L4 ports support Power). The system's real-time capability is demonstrated by its use in an estimated 250 million mobile phones.
The memory footprint of OKL4 is less than 64kB, more than an order of magnitude smaller than Xen's, and far more in line with the resources of typical embedded systems. Specifically, it is small enough to fit into on-chip RAM on many SoCs, leaving some space for application code.
Perhaps the most striking comparison is the size of the source code. Xen is on the order of 100 kLoC, plus all of Linux in dom0, which puts the total into many hundreds of kLoC, while all of OKL4 is around 10 kLoC. The Samsung researchers report that for the port of Xen to ARM they had to modify or add 23 kLoC. This is more than twice the size of the complete ARM source of the OKL4 hypervisor! If a port requires more than twice as much code as doing it properly from scratch, why bother?

Conclusions
The enterprise and embedded domains have strongly differing characteristics, and this is reflected in the requirements placed on virtualization solutions in the two domains. The one-size-fits-all approach implied by porting an enterprise hypervisor to embedded systems will short-change embedded-systems designers. Embedded systems need their own optimized solutions.
About the Author
Gernot Heiser, cofounder of Open Kernel Labs, is the technical leader of the firm. Prior to co-founding OK, Dr. Heiser created and led the Embedded, Real-Time and Operating Systems (ERTOS) research program at NICTA. He can be contacted at gernot@ok-labs.com.