Common multicore programming problems: Part 1 - Threads, data races, deadlocks and livelocks
Parallel programming has been around for decades, though before the advent of multicore processors it was an esoteric discipline. Numerous programmers have tripped over the common stumbling blocks by now; by recognizing these problems, you can avoid stumbling yourself. Furthermore, it is important to understand the common problems before designing a parallel program, because many of them arise from the overall decomposition of the program and cannot be easily patched later. This series of articles looks at some of these common problems, their symptoms, and ways to circumvent them.
Too Many Threads
It may seem that if a little threading is good, then a lot must be better. In fact, having too many threads can seriously degrade program performance. The impact comes in two ways. First, partitioning a fixed amount of work among too many threads gives each thread too little work, so that the overhead of starting and terminating threads swamps the useful work. Second, having too many concurrent software threads incurs overhead from having to share fixed hardware resources.
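To make the first effect concrete, here is a minimal C++ sketch (my illustration, not code from the article; the iteration count and thread counts are arbitrary) that splits a fixed amount of busy work across an increasing number of threads and times each run. On a typical machine the times stop improving around the hardware thread count and then climb as the cost of creating and joining threads swamps the shrinking per-thread work.

// oversubscribe.cpp -- times the same fixed workload split across more and
// more threads. Beyond the hardware thread count, per-thread work shrinks
// while thread creation and scheduling overhead grows.
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Busy work: sum `iterations` integers into a caller-provided slot.
// The volatile store keeps the compiler from optimizing the loop away.
static void work(long iterations, volatile long* sink) {
    long s = 0;
    for (long i = 0; i < iterations; ++i) s += i;
    *sink = s;
}

int main() {
    const long total = 100000000L;  // fixed total amount of work
    for (unsigned n : {1u, 2u, 4u, 8u, 64u, 256u, 1024u}) {
        std::vector<long> sinks(n);
        std::vector<std::thread> threads;
        auto t0 = std::chrono::steady_clock::now();
        for (unsigned i = 0; i < n; ++i)
            threads.emplace_back(work, total / n, &sinks[i]);
        for (auto& t : threads) t.join();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        std::printf("%4u threads: %lld ms\n", n, (long long)ms);
    }
}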
When there are more software threads than hardware threads, the operating system typically resorts to round-robin scheduling. The scheduler gives each software thread a short turn, called a time slice, to run on one of the hardware threads.
When a software thread's time slice runs out, the scheduler preemptively suspends the thread in order to run another software thread on the same hardware thread. The software thread freezes in time until it gets another time slice.
Time slicing ensures that all software threads make some progress. Otherwise, some software threads might hog all the hardware threads and starve other software threads. However, this equitable distribution of hardware threads incurs overhead. When there are too many software threads, the overhead can severely degrade performance. There are several kinds of overhead, and it helps to know the culprits so you can spot them when they appear.
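A common remedy, though not one the article has stated yet, is to create roughly one software thread per hardware thread, so the scheduler rarely needs to time-slice at all. A minimal C++ sketch, where worker_count is an illustrative helper name of my own:

#include <thread>

// Pick a worker-pool size matched to the hardware. hardware_concurrency()
// returns the number of hardware threads, or 0 when it cannot tell.
unsigned worker_count() {
    unsigned hw = std::thread::hardware_concurrency();
    return hw != 0 ? hw : 1;  // fall back to one worker when unknown
}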
Thread register state. The most obvious overhead is the process of saving and restoring a thread's register state. Suspending a software thread requires saving the register values of the hardware thread so that they can be restored later, when the software thread resumes on its next time slice. Schedulers typically allocate time slices large enough that the register save/restore cost is insignificant, so this most obvious overhead is in practice a minor concern.
Thread cache state. A more subtle overhead of time slicing is saving and restoring a thread's cache state. Modern processors rely heavily on cache memory, which can be about 10 to 100 times faster than main memory. Accesses that hit in cache are not only much faster; they also consume no bandwidth on the memory bus. Caches are fast, but finite.
When the cache is full, a processor must evict data from the cache to make room for new data. Typically, the data chosen for eviction is the least recently used data, which more often than not is data from an earlier time slice. Thus threads tend to evict each other's data, and the net effect is that too many threads hurt performance by fighting each other for cache.
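The sketch below (again my illustration, not the article's; the ~4 MB per-thread buffer is an assumed working-set size, chosen to be near a typical last-level cache) makes this fight visible. The total number of bytes scanned is held constant, but as the thread count grows the combined working sets overflow the cache, threads evict one another's data, and the same total work takes longer:

#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Each thread scans its own private buffer `passes` times; the volatile
// sum keeps the scan from being optimized away.
static void scan(const std::vector<char>* buf, int passes) {
    volatile long sum = 0;
    for (int p = 0; p < passes; ++p)
        for (char c : *buf) sum += c;
}

int main() {
    const size_t kBufBytes = 4u << 20;  // assumption: ~4 MB per-thread working set
    const int kTotalPasses = 64;        // total data scanned stays constant
    for (unsigned n : {1u, 16u, 64u}) {
        std::vector<std::vector<char>> bufs(n, std::vector<char>(kBufBytes, 1));
        std::vector<std::thread> threads;
        auto t0 = std::chrono::steady_clock::now();
        for (unsigned i = 0; i < n; ++i)
            threads.emplace_back(scan, &bufs[i], kTotalPasses / (int)n);
        for (auto& t : threads) t.join();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        std::printf("%3u threads x %2d passes: %lld ms\n",
                    n, kTotalPasses / (int)n, (long long)ms);
    }
}

With one thread, the buffer stays cached across passes; with many threads, nearly every scan reloads its data from main memory. The exact crossover depends on your cache sizes.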

