Wanted: industry standards for benchmarking embedded VMM hypervisors - Embedded.com


As enterprise computing hardware has become more powerful, largely thanks to high-performance multicore processors from Intel and AMD, virtualization has become increasingly popular in IT. Enterprise software companies have focused on using this hardware more effectively, both for server consolidation and for centralized provisioning of virtual desktop environments for employees. The same trend is now moving into the embedded world, as both single-core and multicore processors become sufficiently powerful and cost-effective to support multiple simultaneous applications. The most straightforward way to maximize the utility of the underlying hardware platform is exactly the same as in enterprise computing: virtualization.

The impact of hypervisor technology on overall system performance can vary widely, regardless of the performance capabilities of the underlying hardware. It is therefore increasingly important to establish a standard method for benchmarking different types of hypervisor software in a consistent and repeatable way. The Embedded Microprocessor Benchmark Consortium (EEMBC) is developing such a method to help system developers choose the best hypervisor for their application and select the optimal configuration parameters for their platforms. The method will enable developers to test critical performance metrics such as interrupt latency and context-switching time in different scenarios, including different software workloads and numbers of processor cores, for both microbenchmarks and full application-level benchmarks on fully provisioned operating systems.

The essence of a hypervisor
At its most basic level, a hypervisor, also known as a virtual machine monitor (VMM), is virtualization software that allows multiple operating systems or execution environments to run simultaneously on a single physical CPU. It guarantees complete isolation between the virtual machines (VMs) running above it and also guarantees isolation between itself and those VMs.

A hypervisor is not itself a virtual machine. Rather, the hypervisor allows the creation of multiple virtual machines that run independently on a single core or on multiple cores. These VMs in turn can host Java VMs or operating systems such as Linux, real-time operating systems from commercial vendors, or even a home-grown scheduler or thin executive.

The key for the embedded systems market is that the hypervisor allows different operating systems to run simultaneously and in isolation from one another on a single common device. Although it is a powerful enabler of a wide variety of technologies in single-core applications, the technique is particularly strong on multicore devices: it allows a simple mapping of system resources and permits existing legacy code to run on one processor and operating system while new applications are written for the new, more complex environment. This flexibility also enables developers to adopt new, more power-efficient hardware and still reuse their existing tried and tested code.

Benchmarking hypervisors
To ensure a level playing field for the benchmarking process, all the benchmark suites and kernels must be executed in the CPU's unprivileged execution environment, so that any required system calls are made through the hypervisor's API. This is required to meet the basic security definition of a hypervisor.

All the benchmark kernels used as workloads must be executed in isolated virtual machines, with the scheduler alternating execution among the virtual machines at least every 10 ms. This is required to measure the impact of the VM switching time.
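To see why a bounded time slice exposes VM switching cost, the following toy model may help. It is only a sketch under stated assumptions: the function name, the 0.05 ms switch cost, and the workload durations are hypothetical, and it models a plain round-robin scheduler rather than any particular hypervisor or the EEMBC harness.

```python
def simulate_round_robin(work_ms, slice_ms=10, switch_cost_ms=0.05):
    """Toy model: run each VM for at most slice_ms, then move to the next VM.

    work_ms        -- CPU time each VM needs, in milliseconds (hypothetical values)
    slice_ms       -- maximum time a VM runs before the scheduler alternates
    switch_cost_ms -- assumed cost of one VM-to-VM switch (not a measured figure)
    Returns (elapsed_ms, number_of_switches).
    """
    remaining = list(work_ms)
    elapsed = 0.0
    switches = 0
    current = None  # index of the VM that ran most recently
    while any(r > 0 for r in remaining):
        for vm, r in enumerate(remaining):
            if r <= 0:
                continue  # this VM has finished its workload
            if current is not None and current != vm:
                elapsed += switch_cost_ms  # pay for switching to a different VM
                switches += 1
            current = vm
            run = min(slice_ms, r)
            remaining[vm] = r - run
            elapsed += run
    return elapsed, switches


# Three VMs with 30 ms of work each, alternating every 10 ms.
elapsed, switches = simulate_round_robin([30, 30, 30])
print(f"elapsed: {elapsed:.2f} ms, switches: {switches}")
```

In this model a single VM never incurs a switch, while shortening the slice increases the number of switches and therefore the total elapsed time, which is the effect the 10 ms rule is meant to bring out.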

There are no restrictions on the CPU implementation: it can be single-core, multicore, multithreaded, or any combination. What is necessary is that the CPU has at least two privilege levels, so that the virtual machines can be isolated from the hypervisor. The CPU must also have an MMU or MPU to enable VM-to-VM isolation, which limits the range of single processors that can be assessed. This is not an assessment of the processor, the multicore implementation, or the communications links themselves, all of which can affect the performance of the software. Rather, the benchmarks are designed to test the implementation and performance of a particular hypervisor on a particular processor or multiprocessor device.

The aim of the benchmarks is to provide comparisons on the same processor and in the same software environment and to confine themselves to demonstrating the overhead associated with the hypervisor implementation.

EEMBC's testing environment uses three different scenarios to test the systems in different ways and ensure that the assessment is reasonable across a wide range of applications, from the basic task running without the hypervisor to several virtual machines running. These results are then combined to provide the overall score.

Scenario #1 runs each of the workloads sequentially N times without the hypervisor and stores the total execution time as RESULT#1 and the power consumption as ENERGY#1. Scenario #2 runs each of the workloads sequentially N times in one virtual machine with the hypervisor, giving the total execution time as RESULT#2 and the power consumption as ENERGY#2. Scenario #3 uses three separate virtual machines, each running the workloads N times, with all other workload parameters the same as in scenarios #1 and #2, providing RESULT#3 and ENERGY#3.

The benchmark score is calculated from these results: the difference in execution time between scenarios #1 and #2 gives the percent overhead for a single virtual machine, and the difference in execution time between scenarios #1 and #3 gives the percent overhead for multiple virtual machines.
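The calculation can be sketched in a few lines. The article does not give the exact formula, so this example assumes the overhead is the extra execution time expressed as a percentage of the Scenario #1 baseline, and it uses RESULT#3 as reported, without any normalization for the number of virtual machines. The function name and the timing values are hypothetical.

```python
def percent_overhead(baseline_time, hypervisor_time):
    """Extra execution time relative to the no-hypervisor baseline, in percent.

    Assumed formula (not confirmed by the benchmark specification):
        (hypervisor_time - baseline_time) / baseline_time * 100
    """
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return (hypervisor_time - baseline_time) / baseline_time * 100.0


# Hypothetical total execution times in seconds, for illustration only.
result_1 = 10.0  # Scenario #1: no hypervisor
result_2 = 10.5  # Scenario #2: one VM on the hypervisor
result_3 = 12.0  # Scenario #3: three VMs on the hypervisor

single_vm_overhead = percent_overhead(result_1, result_2)
multi_vm_overhead = percent_overhead(result_1, result_3)

print(f"single-VM overhead: {single_vm_overhead:.1f}%")
print(f"multi-VM overhead:  {multi_vm_overhead:.1f}%")
```

The same function could be applied to the ENERGY values to express energy overhead in the same way, although the article does not say whether the benchmark reports it that way.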

It's vital to stress that the EEMBC hypervisor benchmark scores can only be compared for hypervisors running on identical systems, from exactly the same CPU and board with the same clock frequencies, cache/memory configuration and speeds, to the same compilers, linkers and libraries, and, of course, the identical versions of the software. These scores can also only be compared with the same number of iterations, types, and order of workloads, as initial testing has shown that variations can make significant differences.
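One way to enforce this rule in a results database is to record the test configuration alongside each score and refuse to compare scores whose configurations differ. The sketch below is illustrative only: the `BenchmarkConfig` fields and the sample values are assumptions derived from the list above, not an EEMBC-defined record format.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class BenchmarkConfig:
    """Hypothetical record of everything that must match between two runs."""
    cpu: str
    board: str
    clock_mhz: int
    cache_memory_config: str
    toolchain: str          # compiler, linker, and library versions
    workloads: tuple        # workload types, in execution order
    iterations: int         # the N used for every workload


def config_differences(a, b):
    """Return the names of the fields on which two configurations differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def scores_comparable(a, b):
    """Scores are comparable only when every recorded field is identical."""
    return not config_differences(a, b)


# Placeholder values, for illustration only.
run_a = BenchmarkConfig(
    cpu="example-cpu", board="example-board", clock_mhz=800,
    cache_memory_config="32K L1 / 256K L2", toolchain="gcc-x.y",
    workloads=("kernel_a", "kernel_b"), iterations=100,
)
run_b = replace(run_a, iterations=200)  # same system, different iteration count

print(scores_comparable(run_a, run_a))
print(config_differences(run_a, run_b))
```

Storing the workloads as an ordered tuple means that a change in workload order alone is enough to mark two runs as not comparable, which matches the caution about ordering effects.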

Surprising findings
Some of the early evaluations have revealed surprising results. The implementation of the hypervisor makes a significant difference to the workload's operation and the system's efficiency. With some hypervisors, the number of workloads that can be supported plateaus quite rapidly, indicating that adding more processes is neither cost-effective nor power-efficient.

Moving forward
Several key issues exist for hypervisors running on multicore devices, including load balancing (or rather dynamic resource allocation, shifting resources between cores while in operation), debugging across multiple cores with synchronized breakpoints, and the way the cores themselves and the processes communicate. This set of EEMBC benchmarks does not specifically address these issues: running multiple workloads across multiple cores does not allow for dynamic changes in resources or examine the communications, nor does it show how easy these systems are to work with.

Eventually, benchmarking will cover the interactions between multiple cores on a single chip and document how easy a hypervisor is to implement and use. The need for this data is inspiring systems developers to push for standards at the software level that allow the same hypervisor implementation to be used across different processor architectures. Once achieved, this flexibility will further drive the reuse of applications across new devices and provide higher performance and lower power consumption for the next generations of embedded systems.

Markus Levy is founder and president of EEMBC. He is also president of The Multicore Association and chairman of Multicore Expo. He was previously a senior analyst at In-Stat/MDR and an editor at EDN magazine, but he began his career in the semiconductor industry at Intel Corporation, where he served as both a senior applications engineer and a customer training specialist for Intel's microprocessor and flash memory products. He is the co-author of Designing with Flash Memory. You may reach him through www.eembc.org.
