The architecture of ARMv8-based firmware systems
Since its introduction in 2011, the ARMv8 processor architecture has become widespread in the mobile device market. According to forecasts by the CEO of ARM Limited, processors of this generation will reach a world market share of up to 25% by 2020. Naturally, software support for these processors was built on, and continues to evolve from, the features and general principles of the historically established infrastructure.
The situation is fundamentally different in the server segment of the market. Servers based on x86 have long dominated this area, whereas ARMv8 is only beginning to find its way in (and only into specific business segments). Both the novelty of this market for ARM and the fact that most of the accepted standards and specifications (primarily ACPI and UEFI) were not adapted for ARM systems until recently have left their mark on the development of the software infrastructure.
This article gives an overview of ARM-based server systems and their processor features and makes no claim to be an exhaustive description. The authors would also like to draw the reader's attention to the fact that the data provided here may quickly become obsolete: new processors will soon arrive with new technical solutions that may require a different approach to implementing the software infrastructure.
First, we should point out that current firmware implementations for ARMv8 server systems consist of several relatively independent components. This offers a number of advantages, such as the ability to use the same components in both server and embedded firmware, and the ability to change one component with relatively little impact on the others.
So, which modules and components do these systems use, and what are their functions? The overall chart of module loading and interaction is shown in Fig. 1. The process begins with the initialization of subsystems such as RAM and interprocessor interfaces. In current implementations, this is performed by a separate module that runs in EL3S mode (exception level 3, secure state) immediately after the main CPU is powered on. This component therefore has the highest possible privileges. It does not usually interact with the OS directly.
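The "EL3S" notation used throughout this article combines an exception level with a security state. The short Python model below is only a sketch of that notation, not any real firmware interface; the names `EL`, `Mode` and `RESET_MODE` are hypothetical, and the rule that EL3 is always secure reflects the ARMv8 model assumed in this article.

```python
from dataclasses import dataclass
from enum import IntEnum


class EL(IntEnum):
    """ARMv8 exception levels; a larger number means more privilege."""
    EL0 = 0  # applications
    EL1 = 1  # OS kernel (or trusted code in the secure state)
    EL2 = 2  # hypervisor level
    EL3 = 3  # secure monitor: the most privileged level, entered at reset


@dataclass(frozen=True)
class Mode:
    level: EL
    secure: bool

    def __post_init__(self):
        # In this simplified model EL3 exists only in the secure state.
        if self.level == EL.EL3 and not self.secure:
            raise ValueError("EL3 is always secure")

    def __str__(self):
        # The article's notation: a trailing "S" marks the secure state.
        return f"EL{int(self.level)}{'S' if self.secure else ''}"

    def outranks(self, other: "Mode") -> bool:
        return self.level > other.level


# Hypothetical name for the mode the level 0 loader starts in.
RESET_MODE = Mode(EL.EL3, secure=True)
print(RESET_MODE)
```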
Fig. 1. The loading and interaction of modules. (Source: Auriga)
Control is then transferred to the next component, most often the ARM Trusted Firmware (ATF) module, which runs in the same mode. Control can reach ATF either directly from the level 0 loader described in the previous paragraph or indirectly through a special UEFI module that implements the PEI (Pre-EFI Initialization) phase. ATF consists of several modules that receive control at different times.
The BL1 start module initializes the parts of the platform assigned to the secure processor mode. Since ARMv8-based systems use hardware separation of trusted and non-trusted resources, including RAM, BL1 prepares an environment in which trusted code can be executed. In particular, this initialization includes configuring the memory and cache controllers (trusted and non-trusted zones are marked by programming these devices' registers) and marking on-chip devices (for example, non-volatile memory controllers). This markup also enables filtering of DMA transactions based on device type (trusted/non-trusted). As a result, a device can read from or write to only those memory areas whose security settings match its own. Implementations of a trusted environment can be quite complex; for example, they can include a separate OS. However, describing such implementations is beyond the scope of this article.
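The access rule described above can be illustrated with a simplified Python model. The names `Region` and `AddressSpaceFilter` and the memory layout are hypothetical, and the model applies the article's strict rule (a device may access only memory whose security setting matches its own); real TrustZone memory controllers typically offer finer-grained, per-region read and write permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    secure: bool  # True = trusted region, False = non-trusted region

    def contains(self, addr: int, length: int) -> bool:
        return self.base <= addr and addr + length <= self.base + self.size


class AddressSpaceFilter:
    """Hypothetical memory controller whose regions were marked by BL1."""

    def __init__(self, regions):
        self.regions = list(regions)

    def allows(self, addr: int, length: int, master_secure: bool) -> bool:
        # The whole access must fall inside one region, and that region's
        # security setting must match the requesting master (CPU or DMA device).
        for region in self.regions:
            if region.contains(addr, length):
                return region.secure == master_secure
        return False  # accesses outside every region (or straddling two) are rejected


# Example layout: 1 MiB of trusted RAM followed by 15 MiB of non-trusted RAM.
tzc = AddressSpaceFilter([
    Region(0x0000_0000, 0x0010_0000, secure=True),
    Region(0x0010_0000, 0x00F0_0000, secure=False),
])
# A non-trusted DMA engine trying to write into trusted RAM is blocked.
print(tzc.allows(0x0000_1000, 64, master_secure=False))  # prints False
```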
BL1 configures the MMU address translation tables, as well as the exception vector table, whose most important entry is the handler for the Secure Monitor Call (SMC) instruction. At this point, the handler is minimal: in effect, it can only transfer control to images already loaded into RAM. While running, BL1 loads the next stage (BL2) into RAM and transfers control to it. BL2 runs in EL1S mode with reduced privileges, so control is transferred to it with the “ERET” (exception return) instruction, which can move execution to a lower exception level.
The purpose of BL2 is to load the remaining firmware modules (the BL3 parts) and transfer control to them. It runs at a reduced privilege level to avoid accidentally corrupting the EL3S code and data already in memory. To run the BL3 code, BL2 uses the SMC instruction to call the EL3S code left in place by BL1.
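The hand-offs from BL1 to BL2 and from BL2 to BL3-1 rely on two architectural rules: an exception return (ERET) cannot raise the exception level, and an SMC executed at EL1 or above is normally taken to EL3. The toy Python model below, with hypothetical class and method names, tracks only the current exception level and security state; it loads and runs no real code.

```python
def mode_name(el: int, secure: bool) -> str:
    return f"EL{el}{'S' if secure else ''}"


class BootCpu:
    """Toy model of the BL1 -> BL2 -> BL3-1 hand-offs; only privilege rules are modelled."""

    def __init__(self):
        self.el, self.secure = 3, True  # BL1 runs in EL3S after reset
        self.images = {}                # images in RAM: name -> (EL, secure)
        self.log = []

    def load(self, name, el, secure):
        """Record that an image has been copied into RAM."""
        self.images[name] = (el, secure)

    def eret(self, name):
        """Exception return into a loaded image (used by BL1 to enter BL2)."""
        el, secure = self.images[name]
        if el > self.el:
            # An ERET to a higher exception level is an illegal exception return.
            raise PermissionError("ERET cannot raise the exception level")
        self.log.append(f"ERET {mode_name(self.el, self.secure)} -> {name} in {mode_name(el, secure)}")
        self.el, self.secure = el, secure

    def smc_run(self, name):
        """BL2 asks the EL3S handler installed by BL1 to start an image."""
        if self.el == 0:
            raise PermissionError("SMC is undefined at EL0")
        if name not in self.images:
            raise KeyError(f"{name} is not in RAM")
        self.log.append(f"SMC {mode_name(self.el, self.secure)} -> EL3S")
        # BL1's minimal handler can only jump to an image already in RAM.
        self.el, self.secure = self.images[name]
        self.log.append(f"jump -> {name} in {mode_name(self.el, self.secure)}")


cpu = BootCpu()
cpu.load("BL2", 1, True)    # BL1 loads BL2 ...
cpu.eret("BL2")             # ... and drops to EL1S
cpu.load("BL3-1", 3, True)  # BL2 loads BL3-1 ...
cpu.smc_run("BL3-1")        # ... and asks EL3S to start it
for line in cpu.log:
    print(line)
```

Because BL2 cannot ERET its way back up to EL3, the only route to more privileged code is through the SMC handler that BL1 installed, which is exactly what protects the EL3S code and data from BL2.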
The third stage of ATF loading and initialization can itself consist of three parts (BL3-1, BL3-2 and BL3-3), but BL3-2 is usually omitted, so in practice only two remain. BL3-1 is the part of the trusted code that remains accessible to general-purpose software (the OS, etc.) at runtime. Its key part is the exception handler invoked by the “SMC” instruction. The module itself contains functions that implement the standard SMC calls: code implementing the standard PSCI (Power State Coordination Interface), which controls the platform as a whole (enabling/disabling processor cores, platform-wide power management, and rebooting), as well as handlers for vendor-specific calls (providing information about the platform, managing embedded devices, etc.).
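A BL3-1-style runtime handler can route each SMC by its function identifier. Under the Arm SMC Calling Convention, bits 29:24 of the identifier select the owning entity (2 for silicon-partner, i.e. vendor-specific, services and 4 for standard secure services such as PSCI). The Python sketch below uses real PSCI function IDs and return codes, but the `Platform` class, the plain core-index numbering and the vendor call `SIP_GET_BOARD_ID` are hypothetical stand-ins; it is not ATF's actual dispatcher.

```python
# PSCI function IDs and return codes (from the PSCI specification).
PSCI_VERSION = 0x8400_0000
CPU_ON_SMC64 = 0xC400_0003
SYSTEM_OFF = 0x8400_0008
SYSTEM_RESET = 0x8400_0009
SUCCESS, NOT_SUPPORTED, INVALID_PARAMETERS, ALREADY_ON = 0, -1, -2, -4

# Hypothetical vendor (SiP) call; real IDs are platform-specific.
SIP_GET_BOARD_ID = 0x8200_0001

OEN_SIP, OEN_STD = 2, 4  # owning entity numbers


def owning_entity(fid: int) -> int:
    """Bits 29:24 of an SMC function ID."""
    return (fid >> 24) & 0x3F


class Platform:
    """Hypothetical platform state standing in for real power controllers."""

    def __init__(self, n_cores: int, board_id: int):
        self.n_cores = n_cores
        self.board_id = board_id
        self.cores_on = {0}  # the boot core is already running
        self.power = "on"


def psci_call(plat, fid, args):
    if fid == PSCI_VERSION:
        return 0x0001_0000  # report PSCI 1.0 (major in bits 31:16)
    if fid == CPU_ON_SMC64:
        target = args[0]  # simplified: a core index rather than an MPIDR value
        if not 0 <= target < plat.n_cores:
            return INVALID_PARAMETERS
        if target in plat.cores_on:
            return ALREADY_ON
        plat.cores_on.add(target)
        return SUCCESS
    if fid == SYSTEM_OFF:
        plat.power = "off"  # a real SYSTEM_OFF never returns
        return SUCCESS
    if fid == SYSTEM_RESET:
        plat.power = "resetting"  # a real SYSTEM_RESET never returns
        return SUCCESS
    return NOT_SUPPORTED


def sip_call(plat, fid, args):
    if fid == SIP_GET_BOARD_ID:
        return plat.board_id
    return NOT_SUPPORTED


def smc_handler(plat, fid, *args):
    """Route an SMC to the PSCI or vendor-specific handler by owning entity."""
    oen = owning_entity(fid)
    if oen == OEN_STD:
        return psci_call(plat, fid, args)
    if oen == OEN_SIP:
        return sip_call(plat, fid, args)
    return NOT_SUPPORTED


plat = Platform(n_cores=4, board_id=0x42)
print(smc_handler(plat, CPU_ON_SMC64, 1, 0x8000_0000, 0))  # core 1, entry point, context ID
```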
As mentioned above, BL3-2 is optional; when present, its code runs in EL1S mode. It usually serves as a specialized service or monitor for events that occur during platform operation (interrupts from certain timers, devices, etc.).
Strictly speaking, BL3-3 is not an ATF module but the image of the firmware executed in the non-secure mode. It usually receives control in EL2 mode and is either a bootloader similar to the widely known U-Boot or a UEFI environment, which is the standard for server systems.
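The possible BL3 configurations can be summarized in a few lines of Python. The `Image` record and the `bl3_images` function are illustrative names only; the exception levels, security states and roles follow the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    name: str
    el: int       # exception level the image is entered at
    secure: bool  # True for the secure state
    role: str


def bl3_images(with_bl32: bool, bl33_payload: str) -> list:
    """Return the BL3 images in hand-off order, per the description above."""
    images = [Image("BL3-1", 3, True, "runtime SMC handler: PSCI and vendor calls")]
    if with_bl32:
        # Optional secure service/monitor, usually omitted.
        images.append(Image("BL3-2", 1, True, "secure event service/monitor"))
    # BL3-3 is the non-secure firmware: a U-Boot-like loader or UEFI.
    images.append(Image("BL3-3", 2, False, bl33_payload))
    return images


for img in bl3_images(with_bl32=False, bl33_payload="UEFI"):
    print(f"{img.name}: EL{img.el}{'S' if img.secure else ''} - {img.role}")
```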
The overall chart of ATF module initialization is shown in Fig. 2.
Fig. 2. ATF module initialization. (Source: Auriga)