Virtualization, formerly mostly at home in data centres and enterprise computing infrastructure, is now spreading to embedded systems, driven by cost and security concerns. ARM, the dominant architecture of high-end processors for mobile devices, is not trap-and-emulate virtualizable.
This means that virtualization of ARM processors requires binary translation or para-virtualization. Binary translation is generally too resource intensive for mobile devices, which is why all known commercial and research hypervisors for ARM use para-virtualization instead.
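The trap-and-emulate criterion mentioned above can be illustrated with a minimal Python sketch. It assumes the classical formulation: an architecture is trap-and-emulate virtualizable only if every sensitive instruction (one that reads or changes privileged machine state) traps when executed in user mode, so that the hypervisor gets a chance to emulate it. The `Instr` type, the helper functions, and the three-instruction `toy_isa` are hypothetical and chosen purely for illustration; they are not a listing of the real ARM instruction set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    sensitive: bool      # reads or changes privileged machine state
    traps_in_user: bool  # raises an exception when executed in user mode


def is_trap_and_emulate_virtualizable(isa):
    """Return True if every sensitive instruction traps when deprivileged."""
    return all(i.traps_in_user for i in isa if i.sensitive)


def problem_instructions(isa):
    """List sensitive instructions that run silently in user mode."""
    return [i.name for i in isa if i.sensitive and not i.traps_in_user]


# Hypothetical, simplified instruction set (an assumption for illustration).
toy_isa = [
    Instr("add", sensitive=False, traps_in_user=False),
    Instr("write_page_table_base", sensitive=True, traps_in_user=True),
    Instr("change_mode_bits", sensitive=True, traps_in_user=False),
]

print(is_trap_and_emulate_virtualizable(toy_isa))  # False
print(problem_instructions(toy_isa))               # ['change_mode_bits']
```

In this toy model, a single sensitive instruction that does not trap is enough to break pure trap-and-emulate: a deprivileged guest would execute it without the hypervisor noticing. Binary translation or para-virtualization is then needed to rewrite or replace such instructions.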
Para-virtualization has the drawback of high engineering cost, because each supported operating system (OS) must be adapted to the hypervisor-specific platform interface. In the server world, which is dominated by the (originally non-virtualizable) x86 architecture, the virtualization boom has led to hardware extensions to support virtualization.
These extensions allow an unmodified, native OS binary to run in a virtual machine (VM) with minimal performance degradation, greatly simplify the implementation of hypervisors, and reduce run-time overheads. The same is now happening with embedded processors. For example, ARM has added virtualization extensions to its architecture, along similar lines to the manufacturers of x86 processors.
In this paper we present the first hypervisor which uses these extensions to support pure virtualization on ARM, and is able to run multiple concurrent unmodified Linux guests. We report on our experience with using the new extensions. Unfortunately, only extremely limited performance evaluation is possible, as the hardware extensions are presently only available in a simulator which is not timing-accurate.
We used the ARM Fast Models simulator, which models the RealView emulation baseboard plus the virtualization extensions. The simulator is efficient yet functionally very accurate, but it is not timing-accurate. Also, it does not model complex external devices such as Ethernet, so we only support simple devices such as the UART consoles, the LCD controller, and the on-board timers.
Our prototype comprises 5,730 LOC. We estimate that VM priorities would add at most 500 LOC, while multicore support would add about 1,000 LOC. This is offset by removing about 1,600 LOC of unneeded functionality from the executive we used as the starting point of our implementation, so a fully-fledged hypervisor would still be less than 6 kLOC.
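The size estimate above can be checked with a few lines of Python. The figures are the ones quoted in the text; the variable names are ours, and because the VM-priority figure is an upper bound and the other two adjustments are approximate, the result is only a rough projection.

```python
# Figures quoted in the text (lines of code).
prototype_loc = 5_730      # current prototype
vm_priorities_loc = 500    # "at most" 500 LOC for VM priorities
multicore_loc = 1_000      # "about" 1,000 LOC for multicore support
removable_loc = 1_600      # "about" 1,600 LOC of unneeded functionality

# Projected size of a fully-fledged hypervisor.
projected_loc = prototype_loc + vm_priorities_loc + multicore_loc - removable_loc

print(projected_loc)          # 5630
print(projected_loc < 6_000)  # True
```

The projection comes to roughly 5,630 LOC, which is consistent with the claim that the full hypervisor would stay below 6 kLOC.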