Common multicore programming problems: Part 3 - nonblocking algorithms, ping-ponging and memory reclamation
By Shameem Akhter and Jason Roberts, Intel Corp.
One way to solve the problems introduced by locks is to not use locks.
Algorithms designed to do this are called non-blocking.
The defining characteristic of a non-blocking algorithm is that
stopping a thread does not prevent the rest of the system from making
progress. There are different non-blocking guarantees:
1) Obstruction
freedom. A thread makes progress as long as there is no
contention, but live lock is possible. Exponential backoff can be used
to work around live lock.
2) Lock freedom.
The system as a whole makes progress.
3) Wait freedom.
Every thread makes progress, even when faced with contention. Very few
non-blocking algorithms achieve this.
Non-blocking algorithms are immune from lock contention, priority
inversion, and convoying. Non-blocking algorithms have a lot of
advantages, but with these come a new set of problems that need to be
understood.
Non-blocking algorithms are based on atomic operations. A few
non-blocking algorithms are simple. Most are complex, because the
algorithms must handle all possible interleaving of instruction streams
from contending processors.
A trivial non-blocking algorithm is counting via an interlocked
increment instead of a lock. The interlocked instruction avoids lock
overhead and pathologies.
However, simply using atomic operations is not enough to avoid race
conditions, because as discussed before, composing thread-safe
operations does not necessarily yield a thread-safe procedure. As an
example, the C code in Figure 7.10
below shows the wrong way and right way to decrement and test a
reference count p >ref_count.
In the wrong code, if the count was originally 2, two threads
executing the wrong code might both decrement the count, and then both
see it as zero at the same time. The correct code performs the
decrement and test as a single atomic operation.
 |
| Figure
7.10. Atomic Operations and Race Conditions |
Most non-blocking algorithms involve a loop that attempts to perform
an action using one or more compare-and-swap (CAS) operations, and
retries when one of the CAS operations fails. A simple and useful
example is implementing a thread-safe fetch-and-op.
A fetch-and-op reads a value from a location, computes a new value
from it, and stores the new value. Figure
7.11 below illustrates both a locked version and a non-blocking
version that operate on a location x.
The non-blocking version reads location x into a local temporary
x_old, and computes a new value x_new = op(x_old). The routine
InterlockedCompareExchange stores the new value, unless x is now
different than x_old. If the store fails, the code starts over until it
succeeds.
 |
| Figure
7.11. Comparison of Locked and Lockless Code for Fetch-and-op |
Fetch-and-op is useful as long as the order in which various threads
perform op does not matter. For example, op might be "multiply by 2."
The location x must have a type for which a compare-and-exchange
instruction is available.
ABA Problem
In Figure 7.11 above, there is
a time interval between when a thread executes "x_old = x" and when the thread
executes InterlockedCompareExchange.
During this interval, other processors might perform other fetch-and-op
operations.
For example, suppose the initial value read is A. An intervening
sequence of fetch-and-op operations by other processors might change x
to B and then back to A.
When the original thread executes InterlockedCompareExchange,
it will be as if the other processor's actions never happened. As long
as the order in which op is executed does not matter, there is no
problem. The net result is the same as if the fetch-and-op operations
were reordered such that the intervening sequence happens before the
first read.
 |
| Figure
7.12. Lockless Implementation of a Linked Stack that May Suffer from
ABA Problem |
But sometimes fetch-and-op has uses where changing x from A to B to
A does make a difference. The problem is indeed known as the ABA
problem. Consider the lockless implementation of a stack shown in Figure 7.12 above. It is written in
the fetch-and-op style, and thus has the advantage of not requiring any
locks.
But the "op" is no longer a pure function, because it deals with
another shared memory location: the field "next." Figure 7.13 below shows a sequence
where the function BrokenLockLessPop corrupts the linked stack.
When thread 1 starts out, it sees B as next on stack. But
intervening pushes and pops make C next on stack. But Thread 1's final
InterlockedCompareExchange does not catch this switch because it only
examines Top.
 |
| Figure
7.13. Sequence Illustrates ABA Problem for Code in Figure 7.12 |
The solution to the ABA problem is to never reuse A. In a
garbage-collected environment such as Java or .NET, this is simply a
matter of not recycling nodes. That is, once a node has been popped,
never push it again. Instead allocate a fresh node. The garbage
collector will do the hard work of checking that the memory for node A
is not recycled until all extant references to it are gone.
In languages with garbage collection, the problem is harder. An old
technique dating back to the IBM 370 changes ABA to ABA'. In other
words, make A slightly different each time.
This is typically done by appending a serial number to the pointer.
A special instruction that can do a double-wide compare-exchange is
required. On IA-32, the instruction is
cmpxchg8b, which does a compare-exchange on eight bytes.
On processors with Intel EM64T, it is cmpxchg16b. On Itanium processors,
there is cmp8xchg16, which is not quite the same, because it compares
only the first eight bytes, but exchanges all 16. However, as long as
the serial number is the first eight bytes, it works for turning ABA
into ABA'.
Another solution is to build a miniature garbage collector that
handles pointers involved in compare-exchange operations. These
pointers are called hazard pointers, because they present a hazard to
lockless algorithms.
Maged Michael's paper on hazard pointers (Michael
2004) explains how to implement hazard pointers. Hazard pointers
are a nontrivial exercise and make assumptions about the environment,
so tread with caution.