Booting an RTOS on symmetric multiprocessors

Although multiple processors have become commonplace in desktop computing, such configurations are still finding their place in deeply embedded devices in markets such as consumer electronics, aviation, and automotive. An embedded system has unique characteristics and often requires real-time behavior to complete at least a portion of its job. Significant research is underway in industry and academia to design tools that help today's designers of embedded systems software (operating system and applications) benefit from the computational power that multiple cores make available. Providing a framework for deploying inherently parallel applications, such as digital signal processing functions, on RISC processors to reduce the bill of materials is a significant challenge. To this end, the embedded operating system is an essential piece of the overall solution; in a perfect world, it should be able to dispatch individual application tasks on multiple cores.

Transitioning from a single-core kernel to a multicore version is not straightforward: it involves introducing new features such as protection primitives and inter-process communication (IPC) provisions, as well as enhancing existing components such as the task scheduler. This article focuses on one important feature, the startup sequence of a traditional single-core kernel, which must be updated if multicore architectures are to be supported. It discusses what happens when an embedded system boots, outlines some of the challenges of designing a startup sequence for symmetric multiprocessing (SMP) systems, and finally presents a scalable and portable boot sequence for embedded SMP systems.

Boot monitor for single-core embedded systems
An excellent reference for boot monitors in embedded systems is U-Boot, whose source code is publicly available. In commercially available boot monitors, some part of the initialization is always done in low-level (assembly) language; it normally contains the software entry point, the exception vectors, and the initial stack setup. The amount of work done before jumping to the C environment varies with the architecture and the version of the board. This is one of the primary reasons commercial real-time operating system (RTOS) vendors take extra care when designing their boot sequence: to reduce subsequent porting effort. They usually do so by keeping as much of the initialization code as possible in C and by designing the software components in layers that provide proper abstraction to the user. Figure 1 depicts the operations performed by the initialization sequence of an embedded RTOS; the order of these operations can change between implementations. Compared with commercial boot monitors, the embedded RTOS designer provides mixed C and assembly code with proper layers of abstraction.

[Figure 1: Operations performed by the initialization sequence of an embedded RTOS.]

SMP boot sequence: challenges abound
To enhance the single-core boot sequence to support an SMP system, the embedded operating system has to solve fundamental issues such as the order of memory and interrupt initialization, synchronization among cores, and stack setup. The following are a few of the challenges associated with an SMP boot sequence; possible solutions are presented based on trade-offs in performance.

  1. Order of initialization: An SMP system has a component that must be initialized and that was not present in single-core systems: the cache coherency unit. All cores taking part in SMP operations need to initialize this unit in order to keep their respective caches up to date with the latest values of memory variables. The initialization can be done at the start in low-level (assembly) code, but this results in an architecture-specific implementation. A better approach is to make it part of the memory initialization component. It is important to note that if the embedded operating system relies on variables shared globally across the cores before the cache coherency unit is initialized, it will not work.

    Another factor that affects the design of a multicore boot sequence is the topology of the system and the availability of information about it. For instance, how many cores are present at boot time? Which core is the primary core? Can another core be added to the SMP system at run time? The answers to these questions not only affect the order in which the different components are initialized but can also have an indirect impact on stack setup and synchronization.

  2. Stack setup: Typically, in a single-core system, a startup stack is initialized in order to jump into the C environment. Later, the system stack is initialized and the whole system uses that stack. When supporting a multicore system, one approach might be to have each core repeat the temporary and system stack operations. However, a better approach is to have one primary core set up the system stack for all of the cores. This reduces the initialization code on the secondary cores.
  3. Synchronization among cores: During the booting process, the SMP cores need to be synchronized. First of all, when the primary core starts executing at reset, it has to make sure that all the other cores remain in a wait state. However, since no other component of the system is initialized yet, the problem is how to make the secondary cores wait. If the secondary cores execute a Wait-For-Event instruction, they have to rely on the primary core to set up their interrupt component, which may not always be possible because of hardware limitations. The secondary cores also cannot use a memory flag up front, since the coherency unit and memory are not initialized at that point.

    Another synchronization point, from the perspective of the primary core, is the moment when it has initialized the kernel data structures. At this point, the primary core has to ensure that all the other cores proceed only once it is done with kernel initialization. Since the memory components have been initialized by this time, it makes sense to use spinlocks with shared memory flags to handle this synchronization barrier. Doing so, however, raises the problem of managing these synchronization needs robustly during the boot process.
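As a minimal sketch of the spinlock-plus-shared-flag barrier mentioned in the last item, the following C11 fragment shows one way it could look once memory and cache coherency are up. All type and function names (`boot_barrier_t`, `boot_barrier_release`, and so on) are hypothetical and are not taken from any particular RTOS; a real port would likely replace the C11 atomics with architecture-specific primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical barrier object placed in coherent shared memory. */
typedef struct {
    atomic_flag lock;             /* spinlock protecting cores_checked_in */
    unsigned    cores_checked_in; /* secondaries that reached the barrier */
    atomic_bool kernel_ready;     /* shared flag written by the primary core */
} boot_barrier_t;

static inline void boot_barrier_init(boot_barrier_t *b)
{
    assert(b != NULL);
    atomic_flag_clear(&b->lock);
    b->cores_checked_in = 0;
    atomic_init(&b->kernel_ready, false);
}

static inline void boot_spin_lock(atomic_flag *l)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire)) {
        /* a real port might place a low-power wait hint here */
    }
}

static inline void boot_spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Secondary core: record arrival at the barrier; returns the running count. */
static inline unsigned boot_barrier_check_in(boot_barrier_t *b)
{
    boot_spin_lock(&b->lock);
    unsigned n = ++b->cores_checked_in;
    boot_spin_unlock(&b->lock);
    return n;
}

/* Primary core: announce that the kernel data structures are initialized. */
static inline void boot_barrier_release(boot_barrier_t *b)
{
    atomic_store_explicit(&b->kernel_ready, true, memory_order_release);
}

/* Secondary core: single poll of the shared flag. */
static inline bool boot_barrier_is_released(boot_barrier_t *b)
{
    return atomic_load_explicit(&b->kernel_ready, memory_order_acquire);
}

/* Secondary core: spin until the primary core releases the barrier. */
static inline void boot_barrier_wait(boot_barrier_t *b)
{
    while (!boot_barrier_is_released(b)) {
        /* spin */
    }
}
```

The release/acquire pairing is what lets a secondary core that observes `kernel_ready` also observe the kernel data structures written before it; this only holds after the coherency unit has been initialized, which is why such a flag cannot serve as the very first wait mechanism at reset.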

Proposed multicore boot sequence
To tackle these issues, we propose a multicore boot sequence for a real-time embedded operating system. The proposal assumes that the SMP topology is known at compile time. It also assumes the availability of a systemwide global register with a known reset value (most SMP platforms provide one), which is used to synchronize the cores right at reset.
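The two assumptions can be illustrated with a short C sketch: the topology is expressed as compile-time constants, and the global register is polled against its reset value. The macro names, the register values, and the function names below are assumptions made for illustration; the actual register and its reset value come from the platform documentation. The register is passed in as a pointer so the logic can be exercised on a host with an ordinary variable standing in for the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; a real port takes these from the SoC manual. */
#define BOOT_NUM_CORES        4u          /* SMP topology fixed at compile time */
#define BOOT_PRIMARY_CORE     0u          /* core that runs the full sequence */
#define BOOT_SYNC_RESET_VALUE 0x00000000u /* assumed register value after reset */
#define BOOT_SYNC_GO_VALUE    0x00000001u /* value the primary core writes */

static inline bool boot_is_primary(unsigned core_id)
{
    return core_id == BOOT_PRIMARY_CORE;
}

/* Primary core: let the secondary cores proceed. */
static inline void boot_early_release(volatile uint32_t *sync_reg)
{
    *sync_reg = BOOT_SYNC_GO_VALUE;
}

/* Secondary core: true while the register still holds its reset value. */
static inline bool boot_early_must_wait(const volatile uint32_t *sync_reg)
{
    return *sync_reg == BOOT_SYNC_RESET_VALUE;
}

/* Secondary core: spin on the register; needs neither stack-resident
 * shared data nor initialized memory flags. */
static inline void boot_early_wait(const volatile uint32_t *sync_reg)
{
    while (boot_early_must_wait(sync_reg)) {
        /* spin */
    }
}
```

Because the register has a known value at reset, the secondary cores can test it before any memory or coherency setup has taken place, which is the property the shared-memory flag lacks at that stage.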

Figure 2 shows a flow chart of the proposed initialization sequence for both primary and secondary cores. Notice that there is little difference in operation and logical flow between the primary and secondary cores. In the proposed scheme, the primary core starts executing at reset and goes on to set up the initial stack.

[Figure 2: Flow chart of the proposed initialization sequence for primary and secondary cores.]

From that point on, the boxes with a purple background represent operations performed by all the cores, while the boxes with an orange background indicate operations performed only by the primary core. The bottom part of the figure shows that the secondary cores take part in the boot sequence but have to perform fewer operations.

In summary, the highlights of the proposed boot sequence include:

  • No reliance on scripts: The proposed boot sequence does not rely on a separate startup script. All the necessary initialization is handled within the boot sequence component of the embedded operating system. This is beneficial for maintenance and porting purposes.
  • Cache coherency: Setup is made part of memory initialization to provide a proper layer of abstraction to the operating system. Any configuration steps required by a core to register its participation in the cache coherency protocol are also handled here.
  • Two synchronization points: The solution we describe here is aimed at distributing the boot code as much as possible. Therefore, two synchronization points (where secondary cores will wait) have been proposed:
    • An early synchronization point: This is hardware-dependent. Secondary cores spin on a register with a known reset value; the primary core writes to this register once it has completed memory initialization.
    • A later synchronization point: Just before passing control to the scheduler, the primary core has to make sure that all the kernel data structures have been initialized; similarly, the secondary cores should be allowed to proceed to the main code only once the primary core has initialized these structures. As already mentioned, since this point is at the end of the boot sequence, synchronization can be achieved by using global memory variables and spinlocks.
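To make the relationship between the two synchronization points and the per-core work concrete, the sketch below lists one plausible ordering of steps for the primary core and for a secondary core. It is an illustration under stated assumptions, not a transcription of Figure 2: the step names and their exact order are hypothetical, chosen only to be consistent with the highlights above (the early release follows memory initialization on the primary core, and the later release precedes the hand-off to the scheduler).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names; the real sequence is platform- and RTOS-specific. */
typedef enum {
    STEP_STARTUP_STACK,      /* temporary stack used to reach C code */
    STEP_MEMORY_INIT,        /* memory setup, including cache coherency */
    STEP_RELEASE_EARLY_SYNC, /* write the global register */
    STEP_WAIT_EARLY_SYNC,    /* spin on the global register */
    STEP_SYSTEM_STACK,       /* switch to the system stack */
    STEP_COHERENCY_JOIN,     /* register this core with the coherency unit */
    STEP_KERNEL_INIT,        /* build the kernel data structures */
    STEP_RELEASE_LATE_SYNC,  /* set the shared-memory flag */
    STEP_WAIT_LATE_SYNC,     /* spin on the shared-memory flag */
    STEP_ENTER_SCHEDULER
} boot_step_t;

/* Assumed order for the primary core. */
static const boot_step_t primary_steps[] = {
    STEP_STARTUP_STACK,
    STEP_MEMORY_INIT,
    STEP_RELEASE_EARLY_SYNC,
    STEP_SYSTEM_STACK,
    STEP_KERNEL_INIT,
    STEP_RELEASE_LATE_SYNC,
    STEP_ENTER_SCHEDULER
};

/* Assumed order for every secondary core: fewer steps, two waits. */
static const boot_step_t secondary_steps[] = {
    STEP_WAIT_EARLY_SYNC,
    STEP_SYSTEM_STACK,
    STEP_COHERENCY_JOIN,
    STEP_WAIT_LATE_SYNC,
    STEP_ENTER_SCHEDULER
};

/* Return the step list for a core and store its length in *count. */
static inline const boot_step_t *boot_steps_for(bool is_primary, size_t *count)
{
    assert(count != NULL);
    if (is_primary) {
        *count = sizeof primary_steps / sizeof primary_steps[0];
        return primary_steps;
    }
    *count = sizeof secondary_steps / sizeof secondary_steps[0];
    return secondary_steps;
}
```

Keeping the sequence in tables like these is one way to keep the primary and secondary paths visibly parallel, so that a port to a new board changes the implementation of individual steps rather than the overall flow.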

Real-world implementation and testing
In the proposed boot sequence depicted in Figure 2, only the primary core sets up the startup stack, while all the other cores switch immediately to their system stack at reset.
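One way a secondary core could switch to its system stack immediately is for the stacks of all cores to be reserved statically, so that each core only has to compute an address from its core number. The following C sketch assumes a full-descending stack, four cores, and a 4 KB stack per core; the array name, the sizes, and `boot_system_stack_top` are hypothetical, and the instruction that actually loads the stack pointer would remain in assembly.

```c
#include <stddef.h>
#include <stdint.h>

#define BOOT_NUM_CORES  4u     /* assumed compile-time topology */
#define BOOT_STACK_SIZE 4096u  /* assumed per-core system stack size in bytes */

/* One statically reserved system stack per core, 8-byte aligned. */
static _Alignas(8) uint8_t boot_system_stacks[BOOT_NUM_CORES][BOOT_STACK_SIZE];

/* Return the initial stack pointer for a core (one past the highest byte,
 * as a full-descending stack expects), or NULL for an invalid core number. */
static inline uint8_t *boot_system_stack_top(unsigned core_id)
{
    if (core_id >= BOOT_NUM_CORES) {
        return NULL;
    }
    return boot_system_stacks[core_id] + BOOT_STACK_SIZE;
}
```

With this layout the secondary cores need no startup stack of their own, which matches the goal of reducing the initialization code they execute.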

We have implemented this type of boot sequence within Nucleus SMP and tested it on multicore chips such as ARM's Cortex-A9 MPCore in a four-core configuration.

Irfan Ahmad is a technical lead in the Embedded Systems Division at Mentor Graphics. He has 6 years of embedded development experience spanning RTOS kernels, middleware products, and device drivers on a wide range of hardware platforms. He holds a bachelor's degree in electrical engineering from the University of Engineering and Technology (UET), Lahore.

Faheem Sheikh is the senior technical lead (systems) for the Mentor Embedded Division of Mentor Graphics. He joined the company in 2007, where his current focus is software research and development for symmetric multiprocessor architectures. Faheem has a master's and a PhD degree in computer engineering from Lahore University of Management Sciences, Pakistan. He has more than 10 technical publications in leading international conferences and journals.

Ville-Veikko Helppi is a product marketing manager for the Mentor Embedded Division of Mentor Graphics. He is responsible for simulation/prototyping and the embedded compiler product lines. Ville-Veikko has 10 years of experience in the embedded software industry with a master of science in electrical engineering (embedded systems) and master of science in economics and business administration from the University of Oulu in Finland.

Dan Driscoll is the software architect for the Mentor Embedded Nucleus Operating System and Middleware products. He has worked in the Embedded Software Division of Mentor Graphics for nearly 10 years in a variety of roles. His background has been focused heavily in kernel and BSP development across numerous embedded architectures. Dan holds a BS degree in computer science from the United States Military Academy at West Point.
