Getting started with multicore programming: Part 1
Multicore platforms are becoming prevalent, and someone needs to program them. For most embedded programmers, the initial multicore experience will be with a coherent shared memory system. Compared to single-core systems, these shared memory systems are much more challenging to program correctly. Nevertheless, with an incremental development and test approach to parallelism and a willingness to apply lessons learned by earlier parallel programmers, successful systems are being deployed today using existing C/C++ environments.
It's Too Hot
The free lunch is over for programmers. [1] Though Moore's law marches on, and the number of economically manufacturable transistors per chip continues to increase, clock frequencies have hit a wall because of power dissipation. It's gotten too hot to handle.
Instead of increasing the clock frequency, designers can use larger transistor budgets to do more work per clock cycle. Within a single processor pipeline, techniques such as instruction-level parallelism, hardware threads, and data-parallel (SIMD) instructions have reached the point of diminishing returns.
It now makes more hardware sense to add multiple processor cores per chip and turn to task-level parallelism. It's left to software engineers to properly exploit these multicore architectures.
Multicore systems (Figure 1, below) are typically characterized by the number and type of cores, the memory organization, and the interconnection network. From a programming model perspective, it is useful to consider the memory architecture first.
*Figure 1: Multicore Architectures*
Memory architectures can be broadly classified as shared or distributed. In a typical shared memory system, all cores uniformly access the same memory, and cores share information by accessing the same memory locations.
Lightweight threads, defined as multiple instruction streams sharing the same memory space, are a natural abstraction for a shared memory programming model. The model is familiar to anyone who has written multithreaded code on single-core systems. Vendors in both the desktop/server and embedded markets offer coherent shared memory systems, so a growing number of shared memory platforms are available to programmers.

In a typical distributed memory system, memory units are closely coupled to their cores. Each core manages its own memory, and cores communicate information by sending and receiving data between them. Processes running on different cores and sharing data through message passing are a common abstraction for a distributed memory programming model.
In shared memory systems, data communication is implicit; data is shared between threads simply by accessing the same memory location. If the cores use cache memories, their view of main memory must be kept coherent between them.
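To make this concrete, here is a minimal sketch of implicit data sharing using POSIX threads. The article does not show code, so the pthreads API, the `worker` function, and the shared `counter` are illustrative assumptions: two threads communicate simply by updating the same memory location, with a mutex serializing access.

```c
#include <pthread.h>
#include <stdio.h>

/* Shared state: both threads read and write the same memory location. */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the shared counter; the mutex keeps the
   read-modify-write sequence atomic across threads. */
static void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                 /* implicit communication via shared memory */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* deterministically 200000 */
    return 0;
}
```

Note that no data is ever "sent" anywhere; the hardware's cache coherence protocol is what keeps each core's view of `counter` consistent.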
As the number of cores increases, the cost of maintaining coherence between caches rises quickly, so it is unlikely this architecture will scale effectively to hundreds of cores.
However, with distributed memory architectures, the hardware design scales relatively easily. Since memory is not shared, the programmer must explicitly describe inter-core communication, and interconnection network performance becomes important.
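As an illustration of that explicit communication, here is a minimal message-passing sketch using MPI, a common API for distributed memory systems; the article does not name a specific library, so MPI is an assumption here. Rank 0 sends a value that rank 1 receives, and no memory is shared between the two processes.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Sender: communication is explicit -- the data is copied
           from this process's private memory to the receiver's. */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Receiver blocks until the message arrives. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Run with at least two processes, e.g. `mpirun -np 2 ./a.out`. Because every transfer is spelled out, the cost of communication is visible in the code, which is why interconnect performance matters so much in this model.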
Driven by the advantages of matching multiple execution pipelines to shared memory, it's probable that a hybrid on-chip architecture (Figure 2 below) will emerge as the number of cores per chip increases. This architecture is already in use at the board level to connect clusters of shared memory chips.
*Figure 2: Hybrid Distributed Shared Memory Architecture*
It is likely that most programmers' initial multicore experience will involve some type of shared memory platform. Though the programming model appears straightforward, these systems are notoriously difficult to program correctly.
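A classic illustration of why is the data race. The following sketch (again illustrative, not from the article) is the earlier counter example with the mutex removed; the two threads' unsynchronized read-modify-write sequences can interleave, silently losing updates.

```c
#include <pthread.h>
#include <stdio.h>

static long counter = 0;  /* shared, but now unprotected */

/* Without synchronization, counter++ compiles to a load, an add,
   and a store; interleaved across threads, increments are lost. */
static void *racy_worker(void *arg)
{
    for (int i = 0; i < 100000; i++)
        counter++;        /* data race: undefined behavior in C11 */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, racy_worker, NULL);
    pthread_create(&t2, NULL, racy_worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Often prints less than 200000, and differs from run to run. */
    printf("counter = %ld\n", counter);
    return 0;
}
```

Bugs like this are timing-dependent and may never surface during testing, which is one reason the incremental development and test approach mentioned above matters.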



