Understanding virtualization facilities in the ARMv8 processor architecture
Virtualization facilities in ARMv8-based systems consist of several cooperating components. ARMv7 offered a dedicated CPU mode for running a hypervisor (Hyp mode) only as part of an optional extension. In ARMv8, hypervisor support is part of the architecture itself and is integrated into the privilege-level (exception-level) hierarchy as EL2. However, EL2 by itself only addresses accesses that the CPU makes to system resources such as memory and peripherals. To make transactions initiated by devices efficient in a virtualized environment, a number of additional components have been developed for ARMv8-based systems, such as new interrupt controllers and IOMMUs. This article provides an overview of these facilities from the perspective of system software development.
Virtualization in ARMv8-based systems is organized as shown in Figure 1: the EL2 privilege level runs a hypervisor that controls the execution of virtual machine (VM) code and the sharing of resources among VMs. The EL1 (OS kernel, privileged code) and EL0 (unprivileged code) levels are left to the VM instances. Address translation is performed in two stages (Figure 2). In stage 1, a so-called intermediate physical address (IPA) is computed from a virtual address using the guest's stage 1 translation tables (pointed to by the TTBR0_EL1/TTBR1_EL1 registers). In stage 2, the real physical address is computed from the IPA using the stage 2 table prepared by the hypervisor (pointed to by the VTTBR_EL2 register). This organization separates privileges and isolates VMs from the hardware. It allows, for example, many instances of an identical VM to run at once: each instance sees the same IPA layout, while the hypervisor backs it with different physical memory.
Fig 1. Virtualization in ARMv8-based systems (Source: Auriga)
Two-stage translation allows VMs to maintain their own translation tables while the hypervisor retains full control over the final result. The EL2 privilege level is designed specifically to execute hypervisor code and differs from the other levels in several ways. For example, it is the lowest privilege level at which the special registers VTTBR_EL2 and VTCR_EL2, as well as a number of other registers intended for VM management, are accessible.
In the original version of the ARMv8 architecture, only one translation table is provided for the hypervisor itself, and another (the stage 2 table) is provided for the current VM. The hypervisor also has access to several special registers that set configuration parameters visible to VMs at EL1, such as the CPU identification (manufacturer, version, etc.) and the multiprocessor affinity ID (for example, VPIDR_EL2 and VMPIDR_EL2, whose values a VM reads as MIDR_EL1 and MPIDR_EL1). This allows VMs running on the same system to be presented with different virtual SMP topologies and with CPUs of different versions and manufacturers.
If an event requiring hypervisor intervention happens in the VM, its processing is performed as follows:
an exception is taken to EL2;
depending on the exception type, the corresponding handler is selected from the vector table, whose base address is stored in the VBAR_EL2 register;
the necessary actions are performed;
if needed, results (for example, the value of an emulated register read) are written to the VM's registers;
the hypervisor returns to the VM in which the exit occurred with the "eret" instruction (or switches to another VM if the hypervisor is designed to do so).
Fig 2. Address translation performed in two stages (Source: Auriga)
Which events cause such VM exits is defined by the bits of the HCR_EL2 register. These include accesses to system registers, including registers available at the EL1 privilege level (e.g., TTBR0_EL1/TTBR1_EL1, FAR_EL1); cache and TLB maintenance instructions; regular exceptions (interrupts, including timer interrupts, and unsupported opcodes); and the wait-for-interrupt and wait-for-event instructions (WFI/WFE). Stage 2 address translation is also enabled through this register (its VM bit). In addition, a separate hardware timer is available at EL2, which allows a hypervisor to configure a periodic interrupt, typically used to trigger VM switching, much as modern OSs switch tasks.
Switching also involves saving the current VM's context, loading the next VM's context, and transferring control to it. VMs, in turn, can make hypervisor calls, much as unprivileged code at EL0 makes system calls: the VM places parameters in registers and executes the "hvc" instruction. This raises an exception at EL2 that is processed in the standard way. Typically, this mechanism is used to call functions of the standardized Power State Coordination Interface (PSCI), for example, to bring virtual CPUs online or to power the VM off.
It should also be mentioned that the hypervisor can intercept calls that VMs make to trusted firmware routines with the "smc" instruction (in non-virtualized environments, for example, PSCI is implemented in this firmware, and calls to it are processed at the highest privilege level, EL3). The ARMv8 architecture also contains additional facilities that improve the performance of virtualized environments. Besides the shareability domains that the hypervisor can assign to reduce cache coherency traffic, each VM can be given its own identifier, the VMID. Because TLB entries are tagged with the VMID, the expensive TLB flush can be avoided when switching between VMs.
The original version of ARMv8 provided 8-bit VMIDs, which were later extended to 16 bits. In addition, ARMv8.1 added a second translation table base register for EL2, TTBR1_EL2, as part of the Virtualization Host Extensions (VHE), giving more flexibility to Type 2 hypervisors (those that run as part of a host OS). At the same time, as mentioned above, fully featured virtualization also requires VMs to interact with peripheral devices (network adapters, storage controllers, etc.) with minimal hypervisor involvement, as well as the delivery of interrupts from devices to processors.
System memory management unit
These aspects of virtualized environments in ARMv8 systems are handled by two units: the generic interrupt controller (GIC) and the system memory management unit (SMMU) (Figure 3). The SMMU translates the addresses of device-initiated (I/O) transactions in the same way that CPU-initiated memory accesses are translated, and it supports both one-stage and two-stage translation of I/O addresses. As a result, the benefits of address translation and memory protection are available to VMs as well as to the hypervisor: a device can read from or write to only the memory address ranges that have been mapped for it.
Fig 3. The system memory management unit (SMMU) (Source: Auriga)
Moreover, it is sometimes convenient to implement scatter-gather operations on I/O buffers by means of the SMMU. The translation stages are used almost exactly as on the CPU cores: the output of the first stage is an IPA unique to the current VM, and the output of the second stage is the real physical address, unique to the entire system. The format of SMMU translation tables is similar to that used by the CPU, with some differences in page attributes. Translation granules of 4, 16, and 64 KB are supported, as are one or two translation tables (depending on register settings and the translation stage) and the full 48- or 52-bit address space.
Each device involved has its own translation context, which ultimately selects the associated set of translation tables; a single context can also be shared among several devices. The SMMU selects the context using the so-called Stream ID, a hardware-dependent device identifier. For PCIe devices (physical or virtual functions), for example, the Requester ID (RID) serves as this identifier; it reflects the device's bus/device/function address in the PCIe configuration space. SMMUs have their own TLBs and support VMIDs for acceleration. When they detect an incorrect configuration, a translation fault, or another exception, SMMUs raise so-called context interrupts (i.e., interrupts bound to translation contexts).