Manage multiple processes and processors in a deterministic multicore design - Embedded.com

How should processes running on different RTOSes communicate in systems with more than one operating system? The author suggests you can manage inter-process communications with global-object networking.

From Embedded Systems Design (ESD), March 2012.

The typical embedded control system consists of a number of independent dedicated functions that interact with each other to perform a task. Functions may include control elements, data acquisition elements, and sometimes a human/machine interface, all working together. In many cases, these functions have been implemented on separate computing platforms interconnected by a communication network.

As the computing capabilities of off-the-shelf PCs have multiplied, original equipment manufacturers (OEMs) now can consolidate multiple functions onto a single platform to improve the reliability and efficiency of their systems, at the same time driving out dedicated hardware in favor of software-based implementations.

However, consolidating functions that were previously performed by separate computing platforms raises potential problems, and these problems can cause the system to fail if they are not dealt with successfully. Integrated systems must not sacrifice the advantages that multiplatform systems gained by separating the functions.

One advantage of keeping system elements separate is determinism. Embedded systems need to respond to external stimuli in a reliable and predictable manner. Therefore, the means of communication between the application elements must be efficient, and the receiving elements must be able to rely on the data being timely and accurate.

Implementing each function as a process running under one real-time operating system (RTOS) might not be the best method. Scheduler interference between the functions results in a lack of independence, which can cause processes that ran at the highest priority on their own dedicated platforms to fail in a consolidated environment.

Virtualized operating systems have been used to achieve system consolidation, but when applied as an afterthought, these methods have had limited success in supporting real-time determinism and in scaling with the available processing power. Although some solutions yield successful results, they generally require customers to make compromises that limit the capabilities of their systems. Ultimately, determinism and scalability must be designed into the core of the software from its inception.

Establishing requirements
An idealized control system may be described by a series of independent functional blocks (FBs), each of which performs a specific task at a specific regular interval (Figure 1). FBs are made up of application-specific hardware and software components and a processor or controller to implement the function. Each FB may have a dedicated connection to external hardware such as motors, cameras, or other I/O points. Each FB may also be connected to one or more of the other FBs to implement a whole control system, but otherwise each FB operates independently of the others.

In one example of such a system (see Figure 1), FB2 is a PLC (programmable logic controller) and FB3 is an intelligent, self-contained optical analyzer that captures an image of the object it's inspecting, analyzes it, and informs the PLC of the result with a GO/NO-GO output signal. FB1 could be a motor drive that feeds in the next object to be inspected.

With the availability of increased computing performance, it's possible to move some or all of the functions onto a single hardware platform. The advantage of this is a more efficient interconnect between the functions, which leads to better overall system performance and reliability and possibly reduced cost. But consolidating the functions means the development team needs to add system management, because the system is now apportioning common resources such as memory and access to I/O between the FBs.

Single-processor solutions
A typical way to combine functions on a single platform such as a PC is to implement the function blocks as separate processes running under an RTOS. RTOSes have built-in functions, such as preemptive task scheduling, to ensure that each FB's task is executed at a prescribed time and that a task from one FB can preempt the processing of another FB's task if it has a higher priority. A drawback of this implementation is that the platform resources are not separated. Therefore, events such as unusual I/O activity handled by one FB can affect the other FBs' ability to perform their tasks.
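
The interference described above can be put into numbers. The article does not prescribe an analysis method, but the standard response-time recurrence for preemptive fixed-priority scheduling, R = C + Σ ceil(R/T_j)·C_j summed over all higher-priority tasks j, shows how extra load in one FB stretches another FB's worst-case response time. The Python sketch below assumes periodic tasks, deadlines equal to periods, no blocking, and zero context-switch cost. The task figures are hypothetical.

```python
import math


def response_time(tasks, i):
    """Worst-case response time of tasks[i] under preemptive fixed-priority
    scheduling on one CPU. tasks is a list of (C, T) pairs ordered from
    highest to lowest priority: C = worst-case execution time, T = period.
    Returns None if the response time would exceed the period (deadline)."""
    c_i, t_i = tasks[i]
    r = c_i
    while True:
        # Interference from every higher-priority task released within [0, r)
        interference = sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
        r_next = c_i + interference
        if r_next == r:
            return r                      # fixed point reached
        if r_next > t_i:
            return None                   # deadline (= period) missed
        r = r_next


# Hypothetical FBs, highest priority first: motor drive, vision I/O, PLC scan
plc_normal = response_time([(1, 4), (2, 10), (3, 20)], 2)
plc_burst = response_time([(1, 4), (5, 10), (3, 20)], 2)   # vision I/O burst
print(plc_normal, plc_burst)  # 7 18
```

With these hypothetical numbers, an I/O burst that raises the vision FB's execution time from 2 to 5 units pushes the PLC's worst-case response from 7 to 18 units, close to its 20-unit period, even though the PLC's own code did not change.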

The result is that setting up a system in this environment takes some tuning to ensure that the tasks of the various FBs have the appropriate levels of priority and resources. This typically makes the system custom-fit to the platform, which in turn makes it difficult to move from one platform to another. It's also difficult to accommodate subsequent changes to the system, such as adding another FB or improving the performance of one of the FBs, without extensive planning. Finally, the finite processing capability of a single processor limits the increase in system performance that can be obtained.

On the plus side, single-processor implementations are relatively simple. They're typically run by one RTOS that has built-in IPC (inter-process communication) facilities, making it easy for the FBs to communicate with each other with minimal overhead (Figure 2).

Multicore solutions
Single system manager: The introduction of multicore processors made a big impact in the computing industry as a whole. However, the adoption of multicores in control-system applications has been a little slow for two reasons. The first and main reason is that OEMs have not been sure how to migrate applications that were optimized to run on a single processor onto a multicore processor platform without breaking them. And second, the industry has not produced a clear path to follow for migrating applications from single processor systems to multicores that enables OEMs to get the most out of multicore processors without requiring significant changes to the software.

In an effort to provide the easiest solution, multicore system managers were introduced (Figure 3). These are essentially operating systems with load schedulers that decide how to load the FBs (applications) onto the processor cores. Generally, the method used is SMP (symmetric multiprocessing). Although this loading method has proved acceptable for non-real-time applications, it falls short of the requirements of real-time control systems. FBs scheduled under SMP aren't guaranteed to be unaffected by the operation of other FBs, and the system's I/Os are not partitioned and dedicated to the associated FBs.

Multi-system manager: A better approach is to go back to the fundamentals and use embedded virtualization techniques to run each FB on a separate core of a multicore platform, with allocated, dedicated I/Os and memory management (Figure 4). With each core under the control of an individual system manager (RTOS), the system is essentially an AMP (asymmetric multiprocessing) system.
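
An AMP layout of this kind is essentially a static partition table. The Python sketch below is a hypothetical description of one (field names, addresses, and device names are illustrative and are not taken from any product). It checks the three invariants the text implies: one core per FB, disjoint memory windows, and no shared I/O devices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    fb: str            # functional block hosted by this RTOS instance
    core: int          # core the instance is pinned to
    mem: range         # physical memory window (illustrative addresses)
    io: frozenset      # I/O devices dedicated to this FB


def check_amp(partitions):
    """Return a list of conflicts; an empty list means a clean AMP split."""
    problems = []
    for i, a in enumerate(partitions):
        for b in partitions[i + 1:]:
            if a.core == b.core:
                problems.append(f"{a.fb} and {b.fb} share core {a.core}")
            if a.mem.start < b.mem.stop and b.mem.start < a.mem.stop:
                problems.append(f"{a.fb} and {b.fb} overlap in memory")
            if a.io & b.io:
                problems.append(f"{a.fb} and {b.fb} share I/O {sorted(a.io & b.io)}")
    return problems


layout = [
    Partition("FB1 motor drive", 1, range(0x1000_0000, 0x1100_0000), frozenset({"eth0"})),
    Partition("FB2 PLC", 2, range(0x1100_0000, 0x1200_0000), frozenset({"dio0"})),
    Partition("FB3 vision", 3, range(0x1200_0000, 0x1400_0000), frozenset({"gige0"})),
]
print(check_amp(layout))  # []
```

If FB3 were moved onto core 2 with an overlapping memory window and access to the PLC's digital I/O, all three conflicts would be reported. That is the kind of sharing that SMP-style loading leaves unmanaged.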

The missing piece in this implementation is the ability of the FBs to communicate with each other. This can be added at the application layer or embedded within each system manager (RTOS). The former is easier to implement but difficult to make transparent to the developer. It's even harder to integrate with the application's priority levels so that inter-FB communication is managed as part of the prioritization of the other tasks that are running.

A better method is to integrate the IPC mechanism into the RTOS (Figure 5), where it can be handled automatically by the RTOS priority scheduler. This enables programmers to use standard exchange objects, such as mailboxes, semaphores, and shared memory blocks, within their code to perform IPC between processor cores and across platforms. This embedded IPC methodology is so transparent that it allows FBs to be separated and distributed among the available processors with minimal code changes.
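
One way to picture IPC that is handled by the RTOS priority scheduler is a mailbox whose blocked receivers are queued by priority rather than by arrival order, so that a message always goes to the most urgent waiter. The Python sketch below simulates this without real threads. The class, its methods, and the convention that a larger number means a higher priority are assumptions made for illustration; some RTOSes use the opposite convention.

```python
import heapq
import itertools
from collections import deque


class Mailbox:
    """Toy mailbox: blocked receivers are served highest priority first,
    FIFO among equal priorities. Threads are simulated by name, and a
    larger number means higher priority (an assumption)."""

    def __init__(self):
        self._messages = deque()          # messages sent while nobody waited
        self._waiters = []                # heap of (-priority, seq, thread)
        self._seq = itertools.count()     # tie-breaker keeps FIFO order
        self.delivered = {}               # thread -> message handed to it

    def receive(self, thread, priority):
        """Return a queued message, or None if the thread must block."""
        if self._messages:
            return self._messages.popleft()
        heapq.heappush(self._waiters, (-priority, next(self._seq), thread))
        return None

    def send(self, msg):
        """Give msg to the highest-priority waiter and return its name
        (that thread becomes ready); if nobody waits, queue msg."""
        if self._waiters:
            _, _, thread = heapq.heappop(self._waiters)
            self.delivered[thread] = msg
            return thread
        self._messages.append(msg)
        return None


mb = Mailbox()
mb.receive("logger", 10)      # all three block: the mailbox is empty
mb.receive("plc_scan", 200)
mb.receive("hmi", 50)
print(mb.send("GO"))          # plc_scan
print(mb.send("NO-GO"))       # hmi
```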

For this system to work reliably there must be a means for the FBs to know the state of other FBs on which they depend. This is done by adding a system-manager layer to the environment. This layer provides a means of signaling system events, such as the creation and deletion of interdependent processes. Both the IPC and the management layers are designed to be extended across multiple instances of the RTOS, thus making it easy to scale a system from one to many processors, be they on the same platform or even multiple platforms.

Implementing global objects
Global objects provide a means for implementing the above and have the following characteristics:

  • A separate instance of the RTOS is launched on each core of a multicore processor. In this particular RTOS, system functions are implemented as objects, each referred to by a handle. Each instance of the RTOS operates independently of the others, with its own memory, scheduler, I/O devices, and associated interrupt management. Processes that run on one instance of the RTOS have no impact on processes that run on another instance.
  • An efficient, asynchronous message-passing system provides the common transport to support the IPC and management layers across multiple instances of the RTOS.
  • The system APIs are extended transparently to allow a process on one instance of the RTOS to access objects on another instance of the RTOS. This is the same method by which processes communicate within a single instance of the RTOS, but the scope of the object handles has been extended by means of the IPC layer so that a handle may refer to an object elsewhere. Each instance of the RTOS has a global object manager process to manage this. This management process acts as an agent for remote processes interacting with local objects.
  • The system-management layer (called DSM, or Distributed System Manager, in the example shown in Figure 6) allows processes to register their dependence on other processes so that they're notified if another process in the relationship is terminated or compromised. The DSM layer also monitors nodes (instances of the RTOS), so that if one is shut down or fails, processes on other nodes are notified.
  • In this example, which implements the example system described earlier, an instance of the Windows OS has been added and the functionality is also extended to that operating system.
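
The handle-scope extension described in the list above can be modeled in a few lines. In this hypothetical Python sketch, a handle carries the ID of the node that owns the object. A call on a local handle acts directly. A call on a remote handle is forwarded to the owning node's agent (the global object manager), which performs the operation on the caller's behalf. The dictionary standing in for the IPC transport, and all class and method names, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    node: int   # RTOS instance that owns the object
    slot: int   # index into that node's object table


class Node:
    """Hypothetical RTOS instance with a local object table and an agent."""

    def __init__(self, node_id, network):
        self.id = node_id
        self.objects = {}        # slot -> semaphore count
        self.network = network   # node_id -> Node; stands in for the IPC transport
        self.forwarded = 0       # requests this node sent to remote agents
        network[node_id] = self

    def create_semaphore(self, initial=0):
        slot = len(self.objects)
        self.objects[slot] = initial
        return Handle(self.id, slot)

    def signal(self, h, units=1):
        """Same call whether h is local or remote."""
        if h.node == self.id:
            self.objects[h.slot] += units                   # local: act directly
        else:
            self.forwarded += 1
            self.network[h.node].agent("signal", h, units)  # remote: ask the owner's agent

    def agent(self, op, h, units):
        """Global object manager stand-in: acts on a local object for a remote caller."""
        if op != "signal" or h.node != self.id:
            raise ValueError(f"unsupported request {op!r} for {h}")
        self.signal(h, units)


net = {}
core1, core2 = Node(1, net), Node(2, net)
sem = core1.create_semaphore()
core1.signal(sem)        # local signal
core2.signal(sem, 2)     # remote signal, forwarded to core 1's agent
print(core1.objects[sem.slot], core2.forwarded)  # 3 1
```

The calling code is identical in both cases. Only the handle's node field decides whether the request travels through the transport.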

Referring to the application in Figure 1, we can now have the video camera application (which we'll call FB3) run by Process C on Core 3 and the PLC (FB2) run by Process B on Core 2 of a multicore platform. As before, the video application needs to inform the PLC when it has completed its video capture and analysis and tell the PLC whether the results are successful or not. If both applications were run on a single processor, passing such information would be easy to do using a mailbox transport object.

With global objects and the associated global object network, the same mechanism can be used with the addition of three program steps. First, the PLC (Process B) needs to make the Mailbox “global” so that processes running on other nodes/processors can discover it. Then two additional instructions must be executed to allow a remote process, such as Process C, to find the global object, in this case the Mailbox, so that it can write to it. Table 1 lists the commands that would be executed by both processes, with the additional instructions shown in bold and underlined.
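
Table 1 is not reproduced here, so the following is only a sketch of the general shape of those three steps, using hypothetical names rather than the RTOS's actual calls. It assumes the two look-up steps are (a) locating the node that published the object and (b) looking the object up by name in that node's catalog. A deque stands in for the mailbox.

```python
from collections import deque


class Node:
    """Hypothetical RTOS instance with a catalog of objects made 'global'."""
    def __init__(self, name):
        self.name = name
        self.catalog = {}                      # global name -> object


class Network:
    """Hypothetical global-object network in which nodes are discoverable by name."""
    def __init__(self):
        self.nodes = {}

    def add(self, node):
        self.nodes[node.name] = node
        return node

    def find_node(self, name):                 # extra step 2 (assumed)
        return self.nodes[name]


def find_global(node, obj_name):               # extra step 3 (assumed)
    return node.catalog[obj_name]


net = Network()
plc_node = net.add(Node("core2"))
net.add(Node("core3"))

# Process B (PLC) on core 2 creates its mailbox as usual ...
result_mbx = deque()
# ... extra step 1: make it global so other nodes can discover it.
plc_node.catalog["INSPECTION_RESULT"] = result_mbx

# Process C (vision) on core 3: extra steps 2 and 3 locate the mailbox ...
remote = find_global(net.find_node("core2"), "INSPECTION_RESULT")
# ... then the ordinary send is unchanged.
remote.append("GO")

# Process B receives exactly as it would from a local sender.
print(result_mbx.popleft())  # GO
```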

Note also that the DSM registers the connection between the processes when the Mailbox is established. So if Process B (the PLC application) faulted, Process C (the video application) would be notified and could take appropriate action.
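
The dependency bookkeeping that a DSM-style layer performs can be sketched as a registry of watchers and callbacks. This Python toy is not the DSM's API; its class, method names, and callback signature are hypothetical. It covers the two notifications the text describes: a single process failing, and a whole node going down.

```python
from collections import defaultdict


class DistributedSystemManager:
    """Toy dependency registry in the spirit of the DSM layer.
    Names and callback signature are hypothetical."""

    def __init__(self):
        self._watchers = defaultdict(list)   # watched process -> [(watcher, callback)]
        self._node_of = {}                   # process -> node it runs on

    def register_process(self, proc, node):
        self._node_of[proc] = node

    def depend(self, watcher, watched, callback):
        """watcher wants callback(watcher, watched, reason) if watched dies."""
        self._watchers[watched].append((watcher, callback))

    def process_failed(self, proc, reason):
        self._node_of.pop(proc, None)
        for watcher, callback in self._watchers.pop(proc, []):
            callback(watcher, proc, reason)

    def node_failed(self, node):
        """A failed node takes every process on it down with it."""
        for proc in [p for p, n in self._node_of.items() if n == node]:
            self.process_failed(proc, f"node {node} down")


events = []
dsm = DistributedSystemManager()
dsm.register_process("B_plc", "core2")
dsm.register_process("C_vision", "core3")
dsm.depend("C_vision", "B_plc", lambda w, p, r: events.append((w, p, r)))
dsm.node_failed("core2")
print(events)  # [('C_vision', 'B_plc', 'node core2 down')]
```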

The concept of passing information from one independent functional block to another extends to memory objects. The steps to set up a memory object are similar to those for setting up a Mailbox. Global memory objects are better suited to transferring large blocks of information. One example would be passing position-correction information directly from the video application to the motor drive when the video capture and analysis yields a partial failure.

Because memory objects are often subscribed to by more than two processes, a memory object must be able to remain present even though the process that created it may have terminated and no longer needs it. Global objects provide such a mechanism with the inclusion of an Index. For example, if a memory object is expected to be subscribed to by four processes, the process that creates it assigns it an initial Index = 4. Every time a process unsubscribes from the memory object, the Index is decremented. Only when the last process unsubscribes is the memory object removed.
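
The Index scheme described above is a form of reference counting. The Python sketch below follows the article's description literally: the initial value equals the expected subscriber count, it is decremented on each unsubscribe, and the object is removed at zero. The class name and the dictionary used as the global object table are assumptions.

```python
class GlobalMemoryObject:
    """Shared buffer that outlives its creator until the last subscriber
    leaves. 'index' follows the article: set to the expected number of
    subscribers at creation, decremented on each unsubscribe, and the
    object is removed when it reaches zero."""

    table = {}   # name -> object; stands in for the global object table

    def __init__(self, name, size, expected_subscribers):
        if expected_subscribers < 1:
            raise ValueError("expected_subscribers must be at least 1")
        self.name = name
        self.data = bytearray(size)
        self.index = expected_subscribers
        GlobalMemoryObject.table[name] = self

    def unsubscribe(self):
        if self.index == 0:
            raise RuntimeError(f"{self.name} has already been removed")
        self.index -= 1
        if self.index == 0:
            del GlobalMemoryObject.table[self.name]   # last subscriber gone


frame = GlobalMemoryObject("video_frame", 640 * 480, expected_subscribers=4)
frame.unsubscribe()                                   # e.g. the creator terminates
print("video_frame" in GlobalMemoryObject.table, frame.index)  # True 3
```

A common alternative design increments the count on each attach instead of fixing it at creation, which avoids having to know the number of subscribers in advance.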

Global objects extend to a Windows environment as well. They can be used to allow applications that require an advanced Windows-based human interface to communicate information with RTOS-based functional blocks. In our example system (Figure 1), a Windows application could read video information directly from the memory object that is put there by the video application and display it for visual inspection by an operator.

The above example shows how independent FBs can use global objects to communicate, and how the ability to pass large blocks of information can even extend the system's capabilities. Moreover, because global objects provide a mechanism for distributed FBs to communicate, global-object-based communications can also be used to split a single FB, such as a motor drive, across two processors if needed.

Implementation issues
When a process on one RTOS is interacting with an object on another RTOS, the behavior of both instances of the RTOS must continue to be predictable, and the overhead of performing the operation must be negligible. More importantly, functional integrity of the system must be preserved. Task priorities must remain intact, and the implementation must ensure that false priority and priority inversion problems, such as a low-priority thread on one node blocking a higher-priority thread on another node, are avoided.

Overhead is kept to a minimum, and predictability maximized, by a lightweight message-passing layer whose transport is simply shared memory plus an interprocessor interrupt. Operations are handled by a local agent on the receiving node on behalf of the sending thread, and they fall into two classes. The first class includes creation, deletion, and signaling of objects, which the agent performs directly. The second class consists of operations that wait on an object. These are handled by proxy threads that execute at the priority of the original caller, so that the calling thread is effectively woken up according to the scheduling policy of the RTOS on which the object resides.
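
To make "shared memory plus an interprocessor interrupt" concrete, here is a Python sketch of a single-producer/single-consumer ring laid out in one buffer. The layout, the slot size, and the doorbell callback that stands in for the interrupt are assumptions; the article does not describe its message format. On real hardware, memory barriers or uncached memory would also be needed so that the receiving core sees the slot contents before the updated index.

```python
import struct

U32 = struct.Struct("<I")
HEAD_OFF, TAIL_OFF, DATA_OFF = 0, 4, 8   # assumed layout of the shared window
SLOT = 32                                # bytes per slot: 1 length byte + up to 31 payload bytes


class SharedRing:
    """Single-producer/single-consumer ring in one bytearray that stands in
    for a shared-memory window between two RTOS instances. The producer
    writes only HEAD, the consumer only TAIL. 'doorbell' stands in for the
    interprocessor interrupt."""

    def __init__(self, nslots, doorbell):
        self.nslots = nslots
        self.mem = bytearray(DATA_OFF + nslots * SLOT)
        self.doorbell = doorbell

    def _get(self, off):
        return U32.unpack_from(self.mem, off)[0]

    def send(self, payload: bytes) -> bool:
        if len(payload) >= SLOT:
            raise ValueError("payload does not fit in one slot")
        head, tail = self._get(HEAD_OFF), self._get(TAIL_OFF)
        nxt = (head + 1) % self.nslots
        if nxt == tail:
            return False                         # full: one slot is kept empty
        off = DATA_OFF + head * SLOT
        self.mem[off] = len(payload)
        self.mem[off + 1:off + 1 + len(payload)] = payload
        U32.pack_into(self.mem, HEAD_OFF, nxt)   # publish only after the data is written
        self.doorbell()                          # "raise the IPI" on the receiving core
        return True

    def receive(self):
        head, tail = self._get(HEAD_OFF), self._get(TAIL_OFF)
        if head == tail:
            return None                          # empty
        off = DATA_OFF + tail * SLOT
        msg = bytes(self.mem[off + 1:off + 1 + self.mem[off]])
        U32.pack_into(self.mem, TAIL_OFF, (tail + 1) % self.nslots)
        return msg


irqs = []
ring = SharedRing(4, doorbell=lambda: irqs.append("IPI"))
for msg in (b"GO", b"\x00\x01", b"x", b"dropped"):
    print(msg, ring.send(msg))                   # the fourth send fails: 3 usable slots
print(ring.receive(), len(irqs))                 # b'GO' 3
```

Giving each side exclusive ownership of one index is what lets the two cores share the ring without a lock. The length byte preserves payloads that end in zero bytes.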

As an example, consider two processes, each implementing one of the FBs, running on a single instance of the RTOS. Process A contains an object that's being waited on by a thread in Process B (Figure 7). When the object is signaled, the thread in Process B is made ready, and it runs when it's the highest-priority thread on the scheduler's ready list.

In Figure 8, Process A and Process B are on separate instances of the RTOS. When the object is signaled, the proxy thread waiting on it is made ready. When it's the highest-priority ready thread on its RTOS instance, it runs and completes the transaction on behalf of the thread in Process B, which is then made ready.
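
The ordering effect of proxy threads can be simulated. In the Python sketch below, two threads on RTOS 2 wait on an object that lives on RTOS 1. Each wait is represented on RTOS 1 by a proxy running at the caller's priority, and the proxy's completion readies the caller on its own node. All names, and the convention that a larger number means a higher priority, are assumptions.

```python
import heapq
import itertools


class RtosInstance:
    """Minimal ready list for one RTOS instance. A larger number means a
    higher priority here (an assumption); ties run in FIFO order."""

    def __init__(self, name):
        self.name = name
        self._ready = []                   # heap of (-priority, seq, label, on_run)
        self._seq = itertools.count()
        self.trace = []                    # order in which entries actually ran

    def make_ready(self, label, priority, on_run=None):
        heapq.heappush(self._ready, (-priority, next(self._seq), label, on_run))

    def run_all(self):
        while self._ready:
            _, _, label, on_run = heapq.heappop(self._ready)
            self.trace.append(label)
            if on_run is not None:
                on_run()


class KernelObject:
    """Event-like object resident on one instance; a signal releases every
    waiter. Remote waiters are represented by proxies at the caller's priority."""

    def __init__(self, home):
        self.home = home
        self._waiters = []                 # (proxy label, priority, completion)

    def wait_remote(self, caller, priority, caller_home):
        def complete():                    # proxy finishes: caller ready on its own node
            caller_home.make_ready(caller, priority)
        self._waiters.append((f"proxy:{caller}", priority, complete))

    def signal(self):
        for label, priority, complete in self._waiters:
            self.home.make_ready(label, priority, complete)
        self._waiters.clear()


rtos1, rtos2 = RtosInstance("RTOS 1"), RtosInstance("RTOS 2")
obj = KernelObject(home=rtos1)             # object lives in Process A on RTOS 1
obj.wait_remote("B.hi", 200, rtos2)        # two Process B threads wait from RTOS 2
obj.wait_remote("B.lo", 20, rtos2)
rtos1.make_ready("A.local_mid", 100)       # unrelated local work on RTOS 1
obj.signal()
rtos1.run_all()
rtos2.run_all()
print(rtos1.trace)  # ['proxy:B.hi', 'A.local_mid', 'proxy:B.lo']
print(rtos2.trace)  # ['B.hi', 'B.lo']
```

The high-priority caller's proxy preempts RTOS 1's medium-priority local thread, while the low-priority caller's proxy waits behind it. Remote waits therefore obey the scheduler of the node that owns the object.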

The effect of this is that the thread in Process B is subject to the scheduler state of RTOS 1 when it's waiting on objects that reside on that instance of the RTOS. This ensures that there aren't any arbitrary priority issues when the processes are interacting across separate instances of the RTOS.

Performance considerations
For processes that have tight loops, operations must carry little or no overhead. One way to achieve that is to use a memory object. A memory object may be mapped into many processes on different nodes, just as it may be mapped into many processes on the same node. Once the mapping is complete, accessing the memory carries no additional overhead. If the application requires it, access can be controlled by a synchronization object, such as a semaphore, that all the processes use. This ensures that processes on different nodes access the data in an orderly way, avoiding contention and keeping the data consistent across the nodes/functional blocks.
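
As a desktop analogue of a memory object guarded by a semaphore, the Python sketch below uses the standard library's multiprocessing.shared_memory module. One handle creates a named block, and each participant maps it by name, the way processes on other nodes would map a global memory object. Threads stand in for processes on different nodes, and the record layout (a sequence number plus two position corrections) is hypothetical.

```python
import struct
import threading
from multiprocessing import shared_memory

# Hypothetical record: sequence number, x correction, y correction
RECORD = struct.Struct("<Iff")


def vision_writer(name, sem, corrections):
    """Maps the memory object by name and publishes each correction."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        for seq, (dx, dy) in enumerate(corrections, start=1):
            with sem:                              # readers never see a half-written record
                RECORD.pack_into(shm.buf, 0, seq, dx, dy)
    finally:
        shm.close()                                # unmap; the object itself stays


def drive_reader(name, sem):
    """Maps the same object and reads the latest record under the semaphore."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        with sem:
            return RECORD.unpack_from(shm.buf, 0)
    finally:
        shm.close()


def demo():
    sem = threading.Semaphore(1)                   # binary semaphore shared by all users
    owner = shared_memory.SharedMemory(create=True, size=RECORD.size)
    try:
        writer = threading.Thread(
            target=vision_writer, args=(owner.name, sem, [(0.5, -0.25), (1.0, 0.0)])
        )
        writer.start()
        writer.join()
        return drive_reader(owner.name, sem)
    finally:
        owner.close()
        owner.unlink()                             # remove the object once everyone is done


print(demo())  # (2, 1.0, 0.0)
```

Once mapped, reads and writes are plain memory accesses. The only added cost is the semaphore, and it is paid only when the application needs coordinated access.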

Chris Main is CTO and cofounder of TenAsys Corporation, where he has led development of the company's virtualization technologies. He has worked in real-time systems starting with minicomputers and was involved in the iRMX group at Intel. He was on the original development team for INtime. He earned a graduate degree in physics from York University and a postgraduate degree in education from Bath University.

This content is provided courtesy of Embedded.com and Embedded Systems Design magazine.
This material was first printed in the March 2012 issue of Embedded Systems Design magazine.
Copyright © 2012 UBM. All rights reserved.
