CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS

Common multicore programming problems: Part 3 - nonblocking algorithms, ping-ponging and memory reclamation



Embedded.com
One way to solve the problems introduced by locks is to not use locks. Algorithms designed to do this are called non-blocking.

The defining characteristic of a non-blocking algorithm is that stopping a thread does not prevent the rest of the system from making progress. There are different non-blocking guarantees:

1) Obstruction freedom. A thread makes progress as long as there is no contention, but live lock is possible. Exponential backoff can be used to work around live lock.

2) Lock freedom. The system as a whole makes progress.

3) Wait freedom. Every thread makes progress, even when faced with contention. Very few non-blocking algorithms achieve this.

Non-blocking algorithms are immune from lock contention, priority inversion, and convoying. Non-blocking algorithms have a lot of advantages, but with these come a new set of problems that need to be understood.

Non-blocking algorithms are based on atomic operations. A few non-blocking algorithms are simple. Most are complex, because the algorithms must handle all possible interleaving of instruction streams from contending processors.

A trivial non-blocking algorithm is counting via an interlocked increment instead of a lock. The interlocked instruction avoids lock overhead and pathologies.

However, simply using atomic operations is not enough to avoid race conditions, because as discussed before, composing thread-safe operations does not necessarily yield a thread-safe procedure. As an example, the C code in Figure 7.10 below shows the wrong way and right way to decrement and test a reference count p >ref_count.

In the wrong code, if the count was originally 2, two threads executing the wrong code might both decrement the count, and then both see it as zero at the same time. The correct code performs the decrement and test as a single atomic operation.

Figure 7.10. Atomic Operations and Race Conditions

Most non-blocking algorithms involve a loop that attempts to perform an action using one or more compare-and-swap (CAS) operations, and retries when one of the CAS operations fails. A simple and useful example is implementing a thread-safe fetch-and-op.

A fetch-and-op reads a value from a location, computes a new value from it, and stores the new value. Figure 7.11 below illustrates both a locked version and a non-blocking version that operate on a location x.

The non-blocking version reads location x into a local temporary x_old, and computes a new value x_new = op(x_old). The routine InterlockedCompareExchange stores the new value, unless x is now different than x_old. If the store fails, the code starts over until it succeeds.

Figure 7.11. Comparison of Locked and Lockless Code for Fetch-and-op

Fetch-and-op is useful as long as the order in which various threads perform op does not matter. For example, op might be "multiply by 2." The location x must have a type for which a compare-and-exchange instruction is available.

ABA Problem
In Figure 7.11 above, there is a time interval between when a thread executes "x_old = x" and when the thread executes InterlockedCompareExchange. During this interval, other processors might perform other fetch-and-op operations.

For example, suppose the initial value read is A. An intervening sequence of fetch-and-op operations by other processors might change x to B and then back to A.

When the original thread executes InterlockedCompareExchange, it will be as if the other processor's actions never happened. As long as the order in which op is executed does not matter, there is no problem. The net result is the same as if the fetch-and-op operations were reordered such that the intervening sequence happens before the first read.

Figure 7.12. Lockless Implementation of a Linked Stack that May Suffer from ABA Problem

But sometimes fetch-and-op has uses where changing x from A to B to A does make a difference. The problem is indeed known as the ABA problem. Consider the lockless implementation of a stack shown in Figure 7.12 above. It is written in the fetch-and-op style, and thus has the advantage of not requiring any locks.

But the "op" is no longer a pure function, because it deals with another shared memory location: the field "next." Figure 7.13 below shows a sequence where the function BrokenLockLessPop corrupts the linked stack.

When thread 1 starts out, it sees B as next on stack. But intervening pushes and pops make C next on stack. But Thread 1's final InterlockedCompareExchange does not catch this switch because it only examines Top.

Figure 7.13. Sequence Illustrates ABA Problem for Code in Figure 7.12

The solution to the ABA problem is to never reuse A. In a garbage-collected environment such as Java or .NET, this is simply a matter of not recycling nodes. That is, once a node has been popped, never push it again. Instead allocate a fresh node. The garbage collector will do the hard work of checking that the memory for node A is not recycled until all extant references to it are gone.

In languages with garbage collection, the problem is harder. An old technique dating back to the IBM 370 changes ABA to ABA'. In other words, make A slightly different each time.

This is typically done by appending a serial number to the pointer. A special instruction that can do a double-wide compare-exchange is required. On IA-32, the instruction is cmpxchg8b, which does a compare-exchange on eight bytes.

On processors with Intel EM64T, it is cmpxchg16b. On Itanium processors, there is cmp8xchg16, which is not quite the same, because it compares only the first eight bytes, but exchanges all 16. However, as long as the serial number is the first eight bytes, it works for turning ABA into ABA'.

Another solution is to build a miniature garbage collector that handles pointers involved in compare-exchange operations. These pointers are called hazard pointers, because they present a hazard to lockless algorithms.

Maged Michael's paper on hazard pointers (Michael 2004) explains how to implement hazard pointers. Hazard pointers are a nontrivial exercise and make assumptions about the environment, so tread with caution.

1 | 2

Rate this article: Low High
Current rating
  • .
Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :