Performance, security, portability, and software consolidation on a single platform are key factors driving demand for multi-OS multicore designs in many embedded market segments, including industrial/medical, mobile, and automotive. Broadly speaking, these designs can be categorized as homogeneous or heterogeneous computing domains.
Homogeneous computing is characterized by similar processing units (in terms of instruction set architecture) controlled by a single instance of an OS that can handle all the resources on the platform. Symmetric multiprocessing OSes are an example of this, and were widely deployed in the early days of multicore adoption in the embedded industry. However, embedded systems are diverse, and a homogeneous multicore design cannot satisfy the majority of their requirements and constraints. This is where heterogeneous computing comes into the picture, enabling multiple software stacks to run on sets of cores, each suited to perform a particular function.
Heterogeneous multicore computing can be further classified into supervised and unsupervised multicore processing. The supervised class covers designs in which manager software mediates between multiple software stacks, while the unsupervised class has a manager-less design in which software running on one set of cores might assume the ‘master’ role and set up work for the rest of the processing units.
Frequently, the ‘heterogeneity’ of the system comes from diverse software stacks and not necessarily from the processing units. This means that two similar cores, one running Linux and the other running an RTOS, would still be classified as a heterogeneous multicore design, even though they share the same instruction set.
This article describes a resource partitioning scheme for a supervised, heterogeneous multicore embedded system, where the system under consideration contains multiple instances of embedded Linux, each running on a different set of cores. It first reviews the tools currently available for resource partitioning among multiOS systems, along with their limitations. It then proposes a partitioning algorithm that produces a filtered view of the platform for each guest operating system running under the system’s hypervisor. The resulting work flow is explained with the help of a real-world use case.
The need for resource partitioning
Whether supervised or unsupervised, heterogeneous computing introduces a significant resource partitioning challenge. Consider an unsupervised design where multiple OSes on a single platform need to run in tandem, but where each OS is allowed access to only one set of devices. Alternatively, consider a supervised design with a hypervisor supporting multiple guest operating systems, in which the hypervisor and each guest can have a different view of the hardware platform on which they are running.
Flattened device trees (FDTs). While there are several ways by which embedded software can gather hardware information, flattened device trees are fast becoming the preferred way to enable Linux quickly on new hardware. A major breakthrough in the adoption of device trees has been their inclusion in Linux kernel 3.2 for the ARM architecture.
With the growing popularity of FDTs it is natural to look at the data contained within one and see if it can be processed to satisfy resource partitioning requirements. The idea is to take a master device tree completely describing the hardware platform and convert it into multiple independent device trees that would supply restricted hardware views to multiple associated operating systems.
FDTs are represented by device tree structures (DTS), a convenient textual representation of the platform in the form of a tree. It is possible to hand-edit these DTS files, inserting or removing data as desired, to derive new slave device trees. But this manual process is prone to errors. For example, what happens if a device assigned to a particular OS needs to be reassigned to another one? This would require changes in more than one place and recompilation of every device tree touched.
Automating resource partitioning
What is necessary is a program that auto-generates the device trees according to the requirements of the design under consideration. In that context, a good thing about FDTs is that they come with excellent support in the form of a device tree compiler (DTC) and a runtime library (libfdt) for manipulating FDT data. Using these tools, one can write a utility for resource partitioning among multiple OSes by generating new device tree structures/blobs for each OS in the system. Of course, this requires some additional metadata covering the design requirements. This metadata can be supplied by extending the master DTS that describes the platform for a regular single-OS design.
Figure 1 is a flow diagram for a supervised heterogeneous multiOS design with this tool in action. It shows a hypervisor-based design in which platform information for virtual machines (hypervisor guests) is obtained from a master device tree structure. The tool also extracts relevant information about virtual machines required by the hypervisor. Although not shown here, this flow can readily be extended to generate platform information in any format, thus supporting OSes that don’t employ device trees.
An algorithm for FDT partitioning
Resource configuration based on FDTs can be accomplished with the following steps:
1. Annotate the master DTS with additional information about the multicore, multiOS system. This annotation has two parts:
   - The first part takes the form of new bindings, where the additional information is kept under a new node defined at the base of the tree. This information covers design specifications such as how many OSes are present in the system, their memory partitioning, the cores on which they run, and the devices allocated to them.
   - The second part is the labeling of nodes in the master device tree wherever such labels are missing. This is required for referencing master DTS nodes from the newly defined configuration node.
2. Decide which nodes in the master DTS should be retained for each OS. This step is the bulk of the work, as it requires finding the dependency sub-trees. Additionally, there might be some mandatory nodes that must be included for the output to be considered a valid device tree for that platform, so those need to be marked up as well, along with the dependency sub-trees.
3. Copy the master DTB to a buffer for each OS, then run a filter that removes all unmarked nodes from the copied DTB.
4. For supervised heterogeneous designs, add any extra nodes, such as virtual devices, that might be required in the guest DTBs. Some guest-related information (memory partitioning, for example) is also required by the hypervisor at runtime. The tool should be able to extract this information, preferably in the form of macros and definitions.
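The extraction of guest information for the hypervisor can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the guest names, CPU numbers, memory addresses, and macro names below are hypothetical and do not come from any real platform or from the tool described here.

```python
# Hypothetical guest configuration; every name and value is illustrative.
GUESTS = [
    {"name": "guest0", "cpu": 0, "mem_base": 0x80000000, "mem_size": 0x10000000},
    {"name": "guest1", "cpu": 1, "mem_base": 0x90000000, "mem_size": 0x10000000},
]


def emit_header(guests):
    """Render guest parameters as C preprocessor definitions."""
    lines = ["#ifndef GUEST_CONFIG_H", "#define GUEST_CONFIG_H", ""]
    lines.append(f"#define NUM_GUESTS {len(guests)}")
    for index, guest in enumerate(guests):
        prefix = f"GUEST{index}"  # assumed macro naming scheme
        lines.append(f"#define {prefix}_CPU {guest['cpu']}")
        lines.append(f"#define {prefix}_MEM_BASE 0x{guest['mem_base']:08X}")
        lines.append(f"#define {prefix}_MEM_SIZE 0x{guest['mem_size']:08X}")
    lines += ["", "#endif /* GUEST_CONFIG_H */"]
    return "\n".join(lines) + "\n"


header = emit_header(GUESTS)
print(header)
```

A generated header of this kind could be compiled into the hypervisor, so that the guest memory map and core assignment come from the same annotated source as the guest device trees.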
Supervised dual guest use case
Let’s take a dummy master device tree and use the above steps to partition it for a supervised dual guest-OS configuration. It is assumed that the hardware platform is an ARM SoC, so some essential data from ARM bindings would appear in the master DTS. Guest OSes are assumed to be Linux, and the hypervisor is assumed to be standalone software running on bare metal.
Listing 1 shows the master device tree supporting two CPUs, each dependent on a base-clk node for its ticks. The master DTS for an ARM SoC has some essential nodes in the shape of timer and gic nodes. It also has only a couple of peripherals, namely a general purpose I/O (GPIO) and a universal asynchronous receiver/transmitter (UART). These peripherals depend on some pin multiplex (pin-mux) settings. All device nodes are labelled so that they can be referenced later on.
Listing 2 below shows two Linux OSes as guests, each to be run on one of the cores supported by the hardware platform. Guest 0 has access to UART while GPIO is only available to Guest 1. Both guests share a virtIO-based virtual console device supported by the hypervisor. 
The above two device tree structures are combined to get a single DTS representation for the tool to process. Figure 2 below shows a graphical form of the combined DTS. Nodes in blue come from the master DTS while the yellow nodes represent the data coming from the partitioning annotation.
Figure 2: Device tree structure with annotated data
The next step is to mark up nodes in the combined device tree shown in Figure 2. This is done by traversing the dependency sub-trees formed by the label-reference combination. The device tree compiler assigns a unique ID to each labelled node that is referenced elsewhere in the device tree structure. This ID is stored in a property called ‘phandle’. When a labelled node is referenced, this phandle can be used to traverse to that node.
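The label-to-phandle mechanism can be illustrated with a toy model. The sketch below does not use DTC or libfdt; the node names, labels, and dictionary layout are assumptions made for illustration, and in this simplified model only nodes that are actually referenced receive an ID.

```python
# Toy model: each node has an optional label and a list of labels it references.
# Names loosely follow the use case; they are illustrative only.
nodes = {
    "base-clk": {"label": "base_clk", "refs": []},
    "cpu@0": {"label": "cpu0", "refs": ["base_clk"]},
    "pinmux": {"label": "pinmux", "refs": []},
    "uart": {"label": "uart0", "refs": ["pinmux"]},
}


def assign_phandles(nodes):
    """Give each referenced, labelled node a unique integer phandle."""
    referenced = {label for node in nodes.values() for label in node["refs"]}
    by_label = {}
    next_id = 1
    for node in nodes.values():
        if node["label"] in referenced:
            node["phandle"] = next_id
            by_label[node["label"]] = next_id
            next_id += 1
    return by_label


def resolve_refs(nodes, by_label):
    """Replace label references with phandles; return a phandle -> name index."""
    index = {n["phandle"]: name for name, n in nodes.items() if "phandle" in n}
    for node in nodes.values():
        node["ref_phandles"] = [by_label[label] for label in node["refs"]]
    return index


by_label = assign_phandles(nodes)
index = resolve_refs(nodes, by_label)

# Follow the UART's first reference back to the node it points at.
target = index[nodes["uart"]["ref_phandles"][0]]
print(target)
```

The phandle index is what makes traversal cheap: following a reference becomes a single lookup rather than a search of the whole tree.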
Figure 3 shows the dependency graphs for both guests in our use case. Yellow nodes highlight the nodes that would only be valid for guest 0, while blue and orange nodes represent the shared nodes, which need to be retained in either guest’s device tree blob. The dotted edges indicate that the nodes on either side are additional metadata that helps in traversing the dependency graphs, while the solid edges actually link the device nodes that need to be retained. The dotted blue path circles the device nodes that would be marked up for inclusion in the generated device tree blob for guest 0.
Figure 3: Dependency graphs in annotated device tree structure
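The mark-up pass for one guest might look like the following sketch. The dependency edges, the mandatory node list, and the per-guest starting nodes are assumptions reconstructed loosely from the use case description, not a copy of the actual listings.

```python
# Hypothetical dependency edges: node -> nodes it references through a phandle.
DEPENDS_ON = {
    "cpu0": ["base_clk"],
    "cpu1": ["base_clk"],
    "uart": ["pinmux"],
    "gpio": ["pinmux"],
    "base_clk": [],
    "pinmux": [],
    "timer": [],
    "gic": [],
}
MANDATORY = ["timer", "gic"]  # assumed to be required in every guest tree
GUEST_ROOTS = {"guest0": ["cpu0", "uart"], "guest1": ["cpu1", "gpio"]}


def mark_nodes(roots, depends_on):
    """Iterative depth-first search: return every node reachable from roots."""
    marked = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in marked:
            continue  # shared dependencies are visited only once
        marked.add(node)
        stack.extend(depends_on.get(node, []))
    return marked


marked_guest0 = mark_nodes(GUEST_ROOTS["guest0"] + MANDATORY, DEPENDS_ON)
marked_guest1 = mark_nodes(GUEST_ROOTS["guest1"] + MANDATORY, DEPENDS_ON)
print(sorted(marked_guest0))
print(sorted(marked_guest0 & marked_guest1))  # nodes shared by both guests
```

In this model the shared set contains the clock, the pin-mux node, and the mandatory nodes, while the UART and GPIO each appear in only one guest's marked set.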
Figure 4 shows the resulting device tree structure generated by the partitioning tool. The yellow node represents a new node inserted by the partition tool to configure a virtual console device. The node has been moved from annotated data to the base of the new device tree.
Figure 4: Filtered platform information available to one of the guests
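The copy, filter, and insert steps can be sketched on a nested-dictionary stand-in for the device tree blob. A real tool would operate on the DTB itself; the tree layout, node names, and the `virtual-console` node below are hypothetical.

```python
import copy

# Illustrative master tree; a real tool would work on the DTB rather than dicts.
MASTER = {
    "name": "/",
    "children": [
        {"name": "cpus", "children": [
            {"name": "cpu@0", "children": []},
            {"name": "cpu@1", "children": []},
        ]},
        {"name": "base-clk", "children": []},
        {"name": "timer", "children": []},
        {"name": "gic", "children": []},
        {"name": "pinmux", "children": []},
        {"name": "uart", "children": []},
        {"name": "gpio", "children": []},
    ],
}


def prune(node, marked):
    """Drop subtrees with no marked node; return True if this node is kept."""
    node["children"] = [c for c in node["children"] if prune(c, marked)]
    return node["name"] in marked or bool(node["children"])


def build_guest_tree(master, marked, extra_nodes):
    """Copy the master tree, filter it, then add guest-only nodes at the base."""
    guest = copy.deepcopy(master)  # the master tree is left untouched
    prune(guest, marked)
    guest["children"].extend(copy.deepcopy(extra_nodes))
    return guest


def names(node):
    """Flatten a tree into a pre-order list of node names."""
    return [node["name"]] + [n for c in node["children"] for n in names(c)]


marked_guest0 = {"cpu@0", "base-clk", "timer", "gic", "pinmux", "uart"}
virtual_console = {"name": "virtual-console", "children": []}
guest0 = build_guest_tree(MASTER, marked_guest0, [virtual_console])
print(names(guest0))
```

Note that the `cpus` container survives even though it is not marked itself, because it still has a marked child. Keeping ancestors of retained nodes is a design choice of this sketch that preserves the tree's shape.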
Eliminating tree complexity
Figure 3 above shows that the dependency subtree of a guest is a directed acyclic graph, in which the vertices are device nodes and the edges are phandle references. To mark all the dependent sub-nodes, one can use the depth-first search (DFS) algorithm, whose complexity is O(n+m), where n is the number of vertices and m the number of edges connecting them. For our case, the total cost grows linearly with the number of guest OSes in the system, as DFS is repeated for each guest. Filtering and inserting new node(s) adds a constant factor to the complexity.
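The bound can be written out explicitly. The symbols g (number of guests) and n_i, m_i (vertices and edges reachable from the starting nodes of guest i) are introduced here for illustration and do not appear in the original analysis.

```latex
T_{\text{mark}}
  \;=\; \sum_{i=1}^{g} O(n_i + m_i)
  \;\le\; \sum_{i=1}^{g} O(n + m)
  \;=\; O\bigl(g\,(n + m)\bigr),
\qquad n_i \le n,\quad m_i \le m .
```

Since each guest's dependency graph is a subgraph of the combined tree, every per-guest pass is bounded by a single traversal of the whole graph, which gives the linear dependence on the number of guests.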
Faheem Sheikh is a staff engineer in the embedded software division of Mentor Graphics working on embedded virtualization technology. He has many years of development experience with multicore high-performance computing systems. He has a PhD in computer engineering from Lahore University of Management Sciences.
1. G. Likely and J. Boyer, "A symphony of flavors: Using the device tree to describe embedded hardware"
4. VirtIO specs