Editor’s Note: This article is an introduction to the Multicore Association (MCA) Multicore Programming Practices Guide developed by the members and available for download from the MCA web site. The Guide is a detailed set of best practices for employing an evolutionary approach to multicore development.
Multicore processors have been available for many years; however, their widespread use in commercial products and across multiple market segments has occurred only recently. This ‘multicore era’ came about because processor architects ran into the power and performance limits of single-core processors.
The multicore era shifts more of the responsibility for performance gains onto the software developer who must direct how work is distributed amongst the cores. In the future, the number of cores integrated onto one processor is expected to increase, which will place even greater burden on the software developer.
Like many new eras in the computer industry, numerous supporting software development tools and technologies have been introduced with the aim of helping developers obtain maximum performance benefit with minimal effort. One of the potential challenges that may hinder developers is the conflict between an intrinsic quality of developing for multicore processors and the inertia of existing software.
Prior to the multicore processor era, developers obtained performance increases through processor upgrades and the inherent clock frequency boosts and microarchitecture improvements. These upgrades required minimal software changes.
Obtaining performance increases in the multicore era requires developers to invest in significant software modifications to, in effect, transform current sequential applications into parallel ones. This modification is nontrivial and introduces new challenges, spanning the traditional development phases of program analysis, design, implementation, debug, and performance tuning.
In stark contrast, the inertia of existing software is undeniable. To meet tight deadlines, development projects rarely have the freedom to make wholesale changes to applications, and instead limit risk by minimizing them.
This means that a developer considering a move to multicore processors may not have the luxury of taking on a new parallel programming language or even re-architecting the application to support widespread concurrency. Instead, many development projects in the computing industry are adopting an evolutionary approach to enabling multicore processors.
This approach employs existing programming tools and technology and a systematic method of introducing and debugging concurrency in existing software.
The MPP Guide
The Multicore Association (MCA) Multicore Programming Practices (MPP) Guide is a detailed set of best practices for employing this evolutionary approach to multicore development. The following sections detail the goal of the MPP guide, the target audience, how to apply the MPP guide, and areas not covered.
In writing this document, the MPP working group made decisions and compromises about what the document covers and in how much depth. The following are thoughts on some of the higher-level decisions made:
- API choices – Pthreads and MCAPI were selected for several reasons. First, these two APIs cover the two main categories of multicore programming – SMP and message passing. Second, Pthreads is a very commonly used API for SMP. Third, MCAPI is a recently introduced API, but is novel in being a message passing-based multicore API that specifically targets embedded homogeneous and heterogeneous applications.
- Architecture categories – The architecture categories (homogeneous multicore with shared memory, heterogeneous multicore with mix of shared and non-shared memory, and homogeneous multicore with non-shared memory) are in priority order and reflect the group’s assessment of use in embedded applications. Admittedly, the second category is a superset of the first and third. The three categories are listed separately because the techniques associated with the first and third, namely SMP techniques and process-level parallelism, are in widespread and common use.
The following paragraphs dissect this goal and offer more detail.
A concise set of best practices is a collection of best known methods for accomplishing the task of development for multicore processors.
The intent is to share development techniques that are known to work effectively for multicore processors, thereby reducing development costs through a shorter time-to-market and a more efficient development cycle for those employing these techniques. The phases of software development discussed in the MPP guide, with a summary of each, follow:
- Program analysis and high level design is a study of an application to determine where to add concurrency and a strategy for modifying the application to support concurrency.
- Implementation and low level design is the selection of design patterns, algorithms, and data structures and subsequent software coding of the concurrency.
- Debug comprises implementation of the concurrency in a manner that minimizes latent concurrency issues, enabling of an application to be easily scrutinized for concurrency issues, and techniques for finding concurrency issues.
- Performance concerns improving turnaround time or throughput of the application by finding and addressing the effects of bottlenecks involving communication, synchronization, locks, load balancing, and data locality.
- Existing technology includes the programming models and multicore architectures detailed in the guide and is limited to a few that are in wide use today.
- Existing software, also known as legacy software, is the currently used application as represented by its software code. Customers using existing software have chosen to evolve the implementation for new product development instead of re-implementing the entire application to enable multicore processors.
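As an illustration of the performance phase summarized above, the following is a minimal Pthreads sketch of one common way to reduce a lock bottleneck: each thread accumulates into a private variable and takes the lock once, rather than once per element. It is not taken from the MPP guide; the names `count_even_parallel`, `slice_t`, and `NUM_THREADS`, and the choice of four threads, are assumptions made for this example.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* assumed worker count for this sketch */

/* Shared state: one counter guarded by one mutex. */
static long total_hits = 0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical work description: the half-open range data[begin, end). */
typedef struct {
    const int *data;
    size_t begin;
    size_t end;
} slice_t;

static void *count_even(void *arg)
{
    const slice_t *s = arg;
    long local_hits = 0;    /* thread-private, so no lock is needed here */

    for (size_t i = s->begin; i < s->end; ++i) {
        if (s->data[i] % 2 == 0) {
            ++local_hits;
        }
    }

    /* One short critical section per thread instead of one per element. */
    pthread_mutex_lock(&total_lock);
    total_hits += local_hits;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Returns the number of even values in data[0..n), or -1 if a thread
 * could not be created. Not reentrant: it uses the shared counter above. */
long count_even_parallel(const int *data, size_t n)
{
    pthread_t threads[NUM_THREADS];
    slice_t slices[NUM_THREADS];
    size_t chunk = (n + NUM_THREADS - 1) / NUM_THREADS;
    int started = 0;
    int failed = 0;

    total_hits = 0;     /* no worker threads exist yet, so no lock needed */

    for (int t = 0; t < NUM_THREADS; ++t) {
        size_t begin = (size_t)t * chunk;
        if (begin > n) begin = n;
        size_t end = begin + chunk;
        if (end > n) end = n;

        slices[t].data = data;
        slices[t].begin = begin;
        slices[t].end = end;
        if (pthread_create(&threads[t], NULL, count_even, &slices[t]) != 0) {
            failed = 1;
            break;
        }
        ++started;
    }

    /* Joining also makes every worker's update visible to this thread. */
    for (int t = 0; t < started; ++t) {
        pthread_join(threads[t], NULL);
    }
    return failed ? -1 : total_hits;
}
```

With GCC, a file containing this sketch would typically be built with something like `gcc -std=c99 -pthread`. Locking inside the loop would give the same result but would serialize the threads on `total_lock`, which is the kind of synchronization bottleneck the performance phase looks for.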
Readers of the MPP guide who apply the detailed steps may save significant costs in development projects. Cost savings can come from greater developer efficiency in analyzing and designing the implementation, a reduced number of bugs in the implementation, and an increased likelihood of the implementation meeting performance requirements. These savings lead to faster time-to-market and lower development cost.
The MPP guide is written specifically for engineers and engineering managers of companies considering or implementing a development project involving multicore processors and favoring the use of existing multicore technology. The benefits of this guide to each part of the target audience are summarized below:
- Software developers who are experienced in sequential programming will benefit from reading this guide as it will explain new concepts which are essential when developing software targeted at multicore processors.
- Engineering managers running technical teams of hardware and software engineers will benefit from reading this guide by becoming knowledgeable about development for multicore processors and the learning curve faced by their software developers.
- Project managers scheduling and delivering complex projects will benefit from reading this guide as it will enable them to appreciate, and appropriately schedule, projects targeting multicore processors.
- Test engineers developing tests to validate functionality and verify performance will benefit from reading this guide as it will enable them to write more appropriate and more efficient tests for multicore-related projects.
The MPP Guide is organized into chapters based upon software development phases. Following a brief introduction, each chapter includes either a process description or topic-by-topic coverage containing the technical details or both. The MPP guide provides more than a cursory review of a topic and strives to offer a succinct, but detailed enough description targeting the average developer.
Code examples in the MPP guide are based upon real implementations and provide instructions for reproducing them with the required development tool. Figure 1 is a sample code listing which depicts C code and instructions for compilation. The upper box contains the sample C language code. The text following the comment ‘/* Filename:’ in the source code is the file name which is referred to in the steps to reproduce. The file is example.c. The text following ‘To reproduce:’ contains commands to enter on the command line which enable reproduction. The code examples in this guide follow the same format.
Areas Outside the Scope of MPP
The MPP guide does not address every possible mechanism for parallel programming – there are simply too many different technologies and methods.
The guide constrains coverage to existing and commonly used technology. Many of the techniques detailed in the guide can be generally applied to other programming models; however, this guide does not comment on the specifics necessary in a re-mapping.
Chapter 7 contains comments on many of the technologies and methods, but the discussion is intended to be informational and not specific to the best practices documented in the previous chapters. In particular, the areas outside the scope of the best practices portion of the MPP guide are:
- Languages other than C and C++ and extensions to C and C++ that are not part of the C and C++ ISO standard or are proprietary extensions to the standards;
- Coding style guidelines; and
- Architectures and programming models other than those specified in Chapter 2. This includes C & C++ compatible parallel libraries other than those specified in Chapter 2.
Multicore programming tools and models include items such as software development tools, programming languages, multicore programming APIs, and hardware architectures. While the programming practices covered in this guide are exemplified on these specific tools and models, our intent is for these practices to be generally applicable.
This chapter specifies the programming languages, multicore programming APIs, and hardware architectures assumed in the programming practices documented in the later chapters.
The MPP Guide employs standard C and C++ as the implementation languages because these are the predominant languages used in embedded software development. The C language as specified in “ISO/IEC 9899:1999, Programming Language C” is used.
The C++ language as specified in “ISO/IEC 14882, Standard for the C++ Language” is used. For both languages, we did not use any nonstandard extensions, either commercial or research oriented.
Implementing Parallelism: Programming Models/APIs
There are several types of parallelism used in multicore programming, namely Task Level Parallelism (TLP), Data Level Parallelism (DLP), and Instruction Level Parallelism (ILP). The MPP guide focuses on TLP and DLP; however, all three types of parallelism play a role in design and implementation decisions.
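The difference between DLP and TLP can be sketched with Pthreads as follows. This is an illustrative example under stated assumptions, not code from the MPP guide: `scale_parallel` applies the same operation to different slices of one array (DLP), while `min_max_parallel` runs two different tasks concurrently over the same read-only data (TLP). All function and type names, and the worker count of four, are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4   /* assumed worker count for this sketch */

/* ---- Data level parallelism: same operation, different slices ---- */
typedef struct {
    int *data;
    size_t begin;
    size_t end;
    int factor;
} scale_job_t;

static void *scale_slice(void *arg)
{
    scale_job_t *job = arg;
    for (size_t i = job->begin; i < job->end; ++i) {
        job->data[i] *= job->factor;    /* slices do not overlap: no lock */
    }
    return NULL;
}

/* Multiplies each element of data[0..n) by factor. Returns 0 on success,
 * -1 if a thread could not be created (data may then be partly scaled). */
int scale_parallel(int *data, size_t n, int factor)
{
    pthread_t threads[NUM_WORKERS];
    scale_job_t jobs[NUM_WORKERS];
    size_t chunk = (n + NUM_WORKERS - 1) / NUM_WORKERS;
    int started = 0;
    int status = 0;

    for (int t = 0; t < NUM_WORKERS; ++t) {
        size_t begin = (size_t)t * chunk;
        if (begin > n) begin = n;
        size_t end = begin + chunk;
        if (end > n) end = n;
        jobs[t] = (scale_job_t){ data, begin, end, factor };
        if (pthread_create(&threads[t], NULL, scale_slice, &jobs[t]) != 0) {
            status = -1;
            break;
        }
        ++started;
    }
    for (int t = 0; t < started; ++t) {
        pthread_join(threads[t], NULL);
    }
    return status;
}

/* ---- Task level parallelism: different operations, run concurrently ---- */
typedef struct {
    const int *data;
    size_t n;
    int result;
} scan_job_t;

static void *find_min(void *arg)
{
    scan_job_t *job = arg;
    job->result = job->data[0];
    for (size_t i = 1; i < job->n; ++i) {
        if (job->data[i] < job->result) job->result = job->data[i];
    }
    return NULL;
}

static void *find_max(void *arg)
{
    scan_job_t *job = arg;
    job->result = job->data[0];
    for (size_t i = 1; i < job->n; ++i) {
        if (job->data[i] > job->result) job->result = job->data[i];
    }
    return NULL;
}

/* Computes the minimum and maximum of data[0..n) as two concurrent tasks.
 * Returns 0 on success, -1 if n is 0 or the thread could not be created. */
int min_max_parallel(const int *data, size_t n, int *min_out, int *max_out)
{
    if (n == 0) return -1;

    scan_job_t min_job = { data, n, 0 };
    scan_job_t max_job = { data, n, 0 };
    pthread_t min_thread;

    if (pthread_create(&min_thread, NULL, find_min, &min_job) != 0) return -1;
    find_max(&max_job);             /* the calling thread runs the other task */
    pthread_join(min_thread, NULL);

    *min_out = min_job.result;
    *max_out = max_job.result;
    return 0;
}
```

In the DLP half the amount of parallelism grows with the data size, whereas in the TLP half it is bounded by the number of independent tasks, two in this case. That difference is one reason the type of parallelism tends to influence design decisions.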
The MPP Guide uses two multicore programming APIs, Pthreads and Multicore Communications API (MCAPI), for its examples. This allows us to provide coverage of two categories of multicore software development – shared memory programming and message-passing programming. We chose these two APIs because they are commonly used APIs within these two software development paradigms.
The MPP Guide targets three classes of multicore architectures: homogeneous multicore with shared memory, heterogeneous multicore with mix of shared and non-shared memory, and homogeneous multicore with non-shared memory.
These three classes are the predominant architectures employed in embedded projects and they are different enough that the sets of practices that apply to each are substantially disjoint. The underlying communication fabric is outside the scope of this document and in most cases is irrelevant to the software developer.
A homogeneous multicore processor with shared memory is a processor comprised of multiple processor cores (2 to n cores) implementing the same ISA and having access to the same main memory.
A heterogeneous multicore processor with a mix of shared and non-shared memory is a processor comprised of multiple processor cores implementing different ISAs and having access to both main memory shared between processor cores and memory local to them. In this type of architecture, local memory refers to physical storage that is only accessible from a subset of a system.
An example is the memory associated with each SPE in the Cell BE processor from Sony/Toshiba/IBM. It is generally necessary to state what portion of the system the memory is local to, as memory may be local to a processor core, a processor, an SoC, an accelerator, or a board.
A homogeneous multicore processor with non-shared memory is a processor comprised of multiple processor cores implementing the same ISA and having access to local, non-shared main memory.
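When memory is not shared, cooperating pieces of software exchange data by passing messages rather than by reading each other's variables. The sketch below illustrates that style with two POSIX processes and a pair of pipes. It is a stand-in chosen because it needs only standard POSIX calls; it does not use MCAPI and is not taken from the MPP guide. The names `sum_range_in_child` and `request_t` are hypothetical, and interrupted system calls are not retried, to keep the example short.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fixed-size request message: sum the integers in [begin, end). */
typedef struct {
    int begin;
    int end;
} request_t;

/* Writes exactly len bytes, continuing after short writes. 0 on success. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Reads exactly len bytes, continuing after short reads. 0 on success. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends a request to a child process and receives the sum as a reply.
 * The two processes share no variables: all data travels through pipes.
 * Returns 0 on success, -1 on any failure. */
int sum_range_in_child(int begin, int end, long *sum_out)
{
    int to_child[2];
    int to_parent[2];

    if (pipe(to_child) != 0) return -1;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: receive the request, compute, send the reply, exit. */
        request_t req;
        long sum = 0;
        int ok;

        close(to_child[1]);
        close(to_parent[0]);
        ok = read_all(to_child[0], &req, sizeof req) == 0;
        if (ok) {
            for (int i = req.begin; i < req.end; ++i) sum += i;
            ok = write_all(to_parent[1], &sum, sizeof sum) == 0;
        }
        _exit(ok ? 0 : 1);
    }

    /* Parent: send the request, then block until the reply arrives. */
    request_t req = { begin, end };
    int status = 0;
    int ok;

    close(to_child[0]);
    close(to_parent[1]);
    ok = write_all(to_child[1], &req, sizeof req) == 0
      && read_all(to_parent[0], sum_out, sizeof *sum_out) == 0;
    close(to_child[1]);
    close(to_parent[0]);

    if (waitpid(pid, &status, 0) < 0
        || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

Because each process has its own address space, no locks are required; the cost shifts to copying messages and waiting for replies. A dedicated message-passing API would replace the pipes with its own endpoints and send and receive calls.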
The software tools employed in this guide and details of use are summarized as follows:
- GNU gcc version 4.X – compiler used for all C examples
- GNU g++ version 4.X – compiler used for all C++ examples
POSIX Threads as specified in “The Open Group Base Specifications Issue 6, IEEE Std 1003.1” are used with no nonstandard extensions, either commercial or research oriented.
Some homogeneous multicore architectures employ cores with a mix of shared and non-shared local memory; however, this document does not focus on these.
The full version of the Guide to Multicore Programming Practices is available for download from the Multicore Association.
Max Domeika is a tools architect at Intel Corporation, creating tools targeting the Intel Architecture market. Over the past 15 years, Max has held several positions in product development. Max earned a BS in Computer Science from the University of Puget Sound, an MS in Computer Science from Clemson University, and an MS in Management in Science & Technology from Oregon Graduate Institute. Max is the author of “Software Development for Embedded Multi-core Systems” from Elsevier and “Break Away with Intel Atom Processors” from Intel Press. In 2008, Max was awarded an Intel Achievement Award for the BEC technology.
David Stewart is the CEO and co-founder of CriticalBlue, a company that develops and distributes software solutions and associated services for migrating existing software applications onto multicore platforms. He is also co-chair of the Multicore Programming Practices (MPP) working group within the Multicore Association. David has over 25 years experience in the embedded software, EDA and semiconductor industries. This includes 10 years at Cadence where he was a founder and Business Development Director of the System-on-Chip (SoC) Design facility at the Alba Campus in Scotland. This initiative attracted worldwide interest and the Design Centre grew to 200+ people in its first 18 months. Before Cadence he was a chip designer, with spells at LSI Logic, NEC Electronics and National Semiconductor. David has worked for several startups, served on the board of several other small technology companies and been an advisor to a venture capital firm.
Rob Oshana is a Distinguished Member of Technical Staff and Director of Global Software R&D for Digital Networking at Freescale Semiconductor. He is also an adjunct professor at Southern Methodist University where he teaches graduate software engineering and embedded systems courses. Rob has over 30 years of experience as a software leader with extensive experience in embedded systems, software engineering, software quality and process, and leading global development teams. He is author of several books and is a recognized international speaker.
Other Contributors: In addition to co-chairs, primary contributors to the Multicore Programming Practices Guide include: Hyunki Baik (Samsung), François Bodin (CAPS entreprise), Ross Dickson (Virtutech/Wind River), Scott A. Hissam (Carnegie Mellon University), Skip Hovsmith (CriticalBlue), James Ivers (Carnegie Mellon University), Ian Lintault (nCore Design), and Stephen Olsen (Mentor Graphics).
Markus Levy is president of The Multicore Association and chairman of the Multicore Developer's Conference. He is also the founder and president of EEMBC. Mr. Levy was previously a senior analyst at In-Stat/MDR and an editor at EDN magazine, focusing in both roles on processors for the embedded industry. Levy began his career in the semiconductor industry at Intel Corporation, where he served as both a senior applications engineer and customer training specialist for Intel's microprocessor and flash memory products. He is the co-author of Designing with Flash Memory, the one and only technical book on this subject, and received several patents while at Intel for his ideas related to flash memory architecture and usage as a disk drive alternative.