Multicore programming made easy?
The first multicore platforms have found their way into embedded systems for entertainment and communication, largely thanks to their greater computational power, flexibility, and energy efficiency. However, as we will show, mapping applications onto these systems remains a costly, slow, and error-prone challenge.
Although multicore programmable architectures have huge potential to tackle present and future applications, a key issue remains open: how can developers map an application onto such a multicore platform quickly and efficiently, while profiting from the potential benefits of parallel processing?
This question can be reformulated as: what programming model should they use? (In a broad sense, a programming model is a set of software technologies and abstractions that provides the designer with means to express the algorithm in a way that matches the target architecture. These software technologies exist at different levels of abstraction and encompass programming languages, libraries, compilers, run-time mapping components, and so forth.)
Obviously, programming a multicore system requires some sort of parallel programming model. A number of parallel programming languages (OCCAM, HPF, MapReduce), libraries (pThreads, MPI, OpenMP), and other software solutions (auto-parallelizing compilers, e.g. SUIF, Paraphrase-II, Paradigm, Compaan) have been researched over the past decades, yet they have suffered either from low acceptance among developers or from poor performance.
How are multicore systems programmed today? Embedded software developers still prefer C/C++ programming. If absolutely necessary, they augment their sequential programs with a parallel library and/or optimized parallel components to achieve the required performance. This seems to work for now, but as the number of cores grows beyond (let us say) 10, the sequential programming model will likely become a considerable burden. Being embedded software developers, we often dream of an auto-parallelizing compiler that analyzes sequential C/C++ and/or Java programs, takes the target architecture into account, and, given the application's performance requirements, maps the program onto the multicore platform. Unfortunately, such a super-compiler doesn't exist, and chances are rather low that it ever will, especially given the vastness and complexity of the design space.
How could/should multicores be programmed in the future? Dividing the parallel programming challenge into a number of sub-challenges is definitely part of the answer. Current solutions, discussed here in more detail, rely on providing developers with solid software-tool support (OpenMP, MPA). In the many-core future, true parallel programming (and thinking) will still be hard, so developers will most probably want to keep thinking sequentially while creating highly parallel programs (Maestro/Axum).