A portable multicore runtime library based on embedded industry standards - Embedded.com

Multicore embedded systems are widely used in telecommunication systems, robotics, automotive vision systems, medical applications, life-critical systems, and more. Today, they usually consist of homogeneous or heterogeneous cores that run different ISAs and operating systems and use dedicated memory systems in order to provide high-throughput, low-latency, and energy-efficient solutions.

In spite of the great potential of multicore embedded systems, the lack of software development tools and standards has created a barrier to their full adoption. Programmers are required to write low-level code, schedule work units, and manage synchronization between cores if they are to reap significant benefits from these systems. As system complexity increases, it is not practical to expect programmers to handle all the low-level details in order to exploit the platform's concurrency.

Handling these details manually is not only time-consuming but also laborious and error-prone. Even worse, software portability is almost non-existent. The state of the art is that hardware vendors supply vendor-specific development tool chains tied to the details of the device they were originally designed for; this may preclude use of the software on any future device, even one from the same family. Existing approaches that address this issue include designing language extensions or parallel programming libraries, but there are no well-accepted joint standards in the embedded systems domain.

To address this issue, a group of vendors and software companies formed the Multicore Association (MCA). The main aim of this association is to reduce the complexity involved in writing software for multicore chips. MCA has put together a cohesive set of APIs to standardize communication (MCAPI), resource management (MRAPI), and virtualization spanning cores on different chips. Because the MCA APIs are vendor-independent, application-layer specifications, they do not require architectural or OS support, so they enable system developers to write portable code that scales across different architectures.

However, the MCA APIs are low-level, library-based protocols, so programming with them can still be tedious. Programmers have to learn the features of the MCA APIs and substantially restructure their code to use them. As a result, a high-level programming model is needed that makes it easy to express concurrency in a given application while still providing features capable of capturing the low-level details of the underlying systems. At the University of Houston, our research group, in collaboration with Freescale Semiconductor Inc., has investigated a compiler-runtime approach based on a high-level programming standard, the OpenMP model, in which the underlying OpenMP runtime library exploits the capabilities of the MCA APIs to hide the low-level details of the platform.

OpenMP is a de facto standard for shared-memory parallel programming. It provides a set of high-level directives, runtime library routines, and environment variables that enable programmers to easily express data and task parallelism in an incremental programming style. The compiler handles the low-level details of thread management, loop scheduling, and synchronization primitives. OpenMP code is portable across a number of compilers and architectures, which allows the programmer to focus on the application instead of low-level details of the platform. Embedded programmers could therefore benefit from such software portability and programmability.
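As a small illustration of the incremental style described above, the sketch below parallelizes an ordinary loop with a single directive. The function name `dot` and its parameters are chosen here for illustration and do not come from the paper.

```c
#include <stddef.h>

/* Dot product of two arrays of length n.
 * The directive asks the compiler to split the loop iterations across
 * threads and to combine the per-thread partial sums (the reduction clause).
 * If OpenMP is not enabled at compile time, the pragma is ignored and the
 * loop simply runs serially with the same result. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long i = 0; i < (long)n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The serial code is left intact apart from the one added line, which is what makes the approach incremental: with GCC or Clang, for example, compiling with `-fopenmp` enables the directive, and compiling without it yields the original sequential program.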

In this paper, we propose a new portable OpenMP runtime library for multicore embedded systems, libEOMP, where the 'E' stands for Embedded. It exploits the capabilities of the MCA APIs to fill the gap between existing runtime implementations for general-purpose architectures and the new challenges of multicore embedded systems, and it serves as the runtime layer of an implementation of the OpenMP standard. Our effort includes selecting appropriate features of the MCA APIs, determining the translation strategy, and delivering the overall runtime design and implementation for a high-performance mapping.
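To make the idea of a translation strategy more concrete, here is a minimal sketch of how an OpenMP compiler and runtime commonly cooperate: the compiler moves the body of a parallel region into a separate function (outlining), and the runtime runs that function on a team of threads. This is not libEOMP's code. POSIX threads stand in for the MCA-based thread and resource layer, and every name in the block (`sketch_parallel`, `region_fn`, `sum_region`, `sum_shared`) is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_THREADS 16

/* Shape of an outlined parallel region: the compiler would move the body of
 * a "#pragma omp parallel" block into a function with a signature like this. */
typedef void (*region_fn)(int thread_id, int num_threads, void *shared);

typedef struct {
    region_fn fn;
    int thread_id;
    int num_threads;
    void *shared;
} worker_arg;

static void *worker_entry(void *p)
{
    worker_arg *w = (worker_arg *)p;
    w->fn(w->thread_id, w->num_threads, w->shared);
    return NULL;
}

/* Hypothetical runtime entry point (assumption: pthreads replace the MCA layer).
 * Runs fn on num_threads threads, with the caller acting as thread 0, and
 * waits for all of them, which models the implicit barrier at region end.
 * Returns 0 on success, -1 on bad arguments or thread-creation failure. */
int sketch_parallel(region_fn fn, int num_threads, void *shared)
{
    pthread_t tids[SKETCH_MAX_THREADS];
    worker_arg args[SKETCH_MAX_THREADS];
    int created = 0;
    int rc = 0;

    if (fn == NULL || num_threads < 1 || num_threads > SKETCH_MAX_THREADS)
        return -1;

    for (int t = 1; t < num_threads; t++) {
        args[t] = (worker_arg){ fn, t, num_threads, shared };
        if (pthread_create(&tids[t], NULL, worker_entry, &args[t]) != 0) {
            rc = -1;
            break;
        }
        created = t;
    }

    if (rc == 0)
        fn(0, num_threads, shared);   /* master thread does its share */

    for (int t = 1; t <= created; t++)
        pthread_join(tids[t], NULL);  /* join acts as the end-of-region barrier */

    return rc;
}

/* Example outlined region: each thread sums one contiguous slice. */
typedef struct {
    const int *data;
    size_t n;
    long partial[SKETCH_MAX_THREADS];
} sum_shared;

void sum_region(int thread_id, int num_threads, void *shared)
{
    sum_shared *s = (sum_shared *)shared;
    size_t chunk = (s->n + (size_t)num_threads - 1) / (size_t)num_threads;
    size_t begin = (size_t)thread_id * chunk;
    size_t end = (begin + chunk < s->n) ? begin + chunk : s->n;
    long acc = 0;

    for (size_t i = begin; i < end; i++)
        acc += s->data[i];
    s->partial[thread_id] = acc;      /* one slot per thread, so no locking */
}
```

A caller would fill a `sum_shared`, invoke `sketch_parallel(sum_region, 4, &s)`, and then add up the `partial` slots. In a portable runtime along the lines described here, the thread creation and join steps are the part that would be expressed through the MCA APIs instead of an OS-specific threading library.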

Figure 1 gives an overview of our Embedded OpenMP solution stack. We evaluated libEOMP on an embedded platform from Freescale Semiconductor Inc. Results from embedded benchmarks show that libEOMP not only performs as well as vendor-specific approaches but also promises portability, programmability, and productivity.

Figure 1: Overview of the Embedded OpenMP solution

Using libEOMP, the programmer retains enough control to write efficient code even though the hardware details are abstracted away. The prototype implementation demonstrates that OpenMP can be used with the MCA APIs to execute applications on embedded systems without the programmer needing to be aware of the low-level details of the system. Evaluation of libEOMP on several benchmarks has demonstrated that it not only performs competitively with optimized vendor-specific approaches but also offers portability and productivity.

Currently, we have implemented libEOMP only on a homogeneous platform. We have not been able to evaluate our approach on a heterogeneous platform because, to the best of our knowledge, a prototype implementation of MRAPI covering the features of heterogeneous systems does not yet exist.

As part of future work in an ongoing collaboration with Freescale, once a prototype implementation of MRAPI for heterogeneous platforms is made available, we plan to use it to target a variety of underlying devices. We will also consider the next version of OpenMP, which will provide support for heterogeneous systems. The current version, OpenMP 3.1, does not address heterogeneity or accelerators.

The MCA APIs provide a standard interface for programmers that is independent of any operating system or device, which allows code written against them to be portable across a variety of architectures. Standards take a long time to develop and establish. More programmers are likely to embrace such standards if OS, processor, and tool vendors provide the MCA APIs as part of their implementations.

A longer, more detailed version of this paper and the ongoing research, "Industry standards for programming multicore systems: Way to Go!" (ME1076), is to be presented by Sunita Chandrasekaran at the 9th Annual Multicore Developers Conference (May 7-8) in Santa Clara, CA.

Sunita Chandrasekaran is a Postdoctoral Fellow at the High Performance Computing and Tools (HPCTools) research group at the University of Houston, Texas, USA. Her current work spans HPC and programming models for heterogeneous and multicore embedded technology solutions. She has a Ph.D. in Computer Science Engineering from Nanyang Technological University (NTU), Singapore, and a B.E. in Electrical & Electronics from Anna University, India.

Cheng Wang is currently a Ph.D. student at the High Performance Computing and Tools (HPCTools) research group at the University of Houston, Texas, USA. His research interests are high performance computing, compilers, parallel programming models, and multicore embedded systems. He received his B.S. degree from Xidian University in 2010.

Barbara Chapman is a professor of Computer Science at the University of Houston, where she teaches and performs research on a range of HPC-related themes. Her research group has developed OpenUH, an open source reference compiler for OpenMP with Fortran, C and C++ that also supports Co-Array Fortran (CAF) and CUDA. In 2001, she founded cOMPunity, a not-for-profit organization that enables research participation in the development and maintenance of the OpenMP industry standard for shared memory parallel programming. She is also a member of MCA and OpenACC standard organizations. Her group also works with colleagues in the U.S. DoD and the U.S. DoE to help define and promote the OpenSHMEM programming interface. She has conducted research on parallel programming languages and compiler technology for more than 15 years, and has written two books, published numerous papers, and edited volumes on related topics. Her degrees include a B.Sc. from Canterbury University, New Zealand, and a Ph.D. from Queen’s University, Belfast.
