Getting started with multicore programming: Part 1
Why So Hard?
For a fixed number of processors, Amdahl's law states that the maximum speedup is limited by the percentage of a program which cannot be parallelized:

Speedup = 1 / (Ts + (1 - Ts) / N)

where N is the number of processors and Ts represents the percentage of work which cannot be parallelized.
Consider a system with N = 4 and Ts = 20%. The maximum achievable speedup is 1 / (0.2 + 0.8/4) = 2.5x. Even if the number of processors goes to infinity, the maximum speedup is only 1 / 0.2 = 5x, limited entirely by the amount of work which must remain serialized.
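To make that ceiling concrete, here is a minimal sketch in C (the helper name is illustrative, not from any particular library) that evaluates Amdahl's law for the figures above:

```c
#include <stdio.h>

/* Amdahl's law: speedup = 1 / (Ts + (1 - Ts) / N), where Ts is the
 * serial fraction of the work and N is the number of processors. */
static double amdahl_speedup(double serial_fraction, int processors)
{
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors);
}

int main(void)
{
    printf("N = 4:    %.2fx\n", amdahl_speedup(0.20, 4));     /* 2.50x */
    printf("N = 1024: %.2fx\n", amdahl_speedup(0.20, 1024));  /* approaches the 5x ceiling */
    return 0;
}
```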
So there is strong pressure to maximize the proportion of activity running in parallel. This may involve significant rework of existing code into parallel form, or switching to alternative, less familiar algorithms which offer more inherent parallelism.
Expressing parallelism also requires new and unfamiliar APIs, so programmers tend to introduce bugs while mastering the interfaces. More significantly, parallelism means multiple execution flows.
In a shared memory model, any execution flow can access any memory location, so there are many possible orderings of memory accesses. It is left to the programmer to ensure that every ordering which can occur results in correct operation.
When multiple threads access the same memory location, and at least one thread is writing to that location, the resultant data will depend on the interleaved execution order of the threads.
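As an illustration (not from the original article), the following POSIX-threads sketch shows two threads incrementing a shared counter with no synchronization; the names are arbitrary:

```c
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                /* shared memory location */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;                      /* read-modify-write, not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 2000000, but interleaved read-modify-write sequences
     * usually lose updates, so the printed value varies from run to run. */
    printf("counter = %ld\n", counter);
    return 0;
}
```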
This race condition leads to unpredictable results which may be masked by external effects such as the thread scheduling policy. Proper operation requires forcing partial execution order by using mutual exclusion locks or critical sections.
These devices ensure that a thread has exclusive access to a memory location to guarantee the proper ordering of reads and writes. Unfortunately, every lock must be explicitly specified by the programmer.
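A minimal sketch of the fix, assuming the same shared counter as in the previous example: guarding the increment with a pthread mutex forces a single, well-defined order onto the conflicting accesses.

```c
#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&counter_lock);   /* exclusive access to counter */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);      /* now reliably 2000000 */
    return 0;
}
```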
Sequential code is often filled with many order dependencies, which are implicitly satisfied by the serialized execution of the program. Miss any dependency in the parallel scenario and a latent bug has been created.
The improper use of locks can also cause program execution to deadlock. If two threads attempt to lock the same two memory locations in opposite order, each thread can obtain one lock and then stall forever waiting for the other. A chain of data-dependent locking relationships can make this kind of bug difficult to catch.
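The classic form of that deadlock can be sketched as follows (a hypothetical example with arbitrary lock names): one thread takes lock A then B, the other takes B then A, and with unlucky timing each ends up holding one lock while waiting on the other forever.

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *thread1(void *arg)          /* locks A, then B */
{
    (void)arg;
    pthread_mutex_lock(&lock_a);
    /* window here in which thread2 can acquire lock_b */
    pthread_mutex_lock(&lock_b);         /* may block forever */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

static void *thread2(void *arg)          /* locks B, then A: opposite order */
{
    (void)arg;
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);         /* may block forever */
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);              /* may never return if each thread holds one lock */
    pthread_join(t2, NULL);
    puts("no deadlock on this run");
    return 0;
}
```

Consistently acquiring locks in a single global order removes the cycle, and with it the deadlock.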
To be more conservative, a programmer can restrict the order of execution through liberal use of locks. Besides increasing the chance of deadlock, heavy-handed locking raises the amount of serialization, which, combined with the locking overhead, can quickly reduce the performance of the parallel program below that of the original sequential version.
From a testing perspective, the quality of tests for sequential code can be measured by path coverage. There are a significant number of possible paths, and stimulating paths deep in the code can be difficult.
The number of composite paths for parallel threads increases exponentially, and explicitly controlling those paths may be impossible. Experience from the world of hardware verification suggests that concurrent execution requires extensive time and resources dedicated to testing.
Nevertheless, shared memory systems are being successfully deployed.

