Threading and Parallel Programming Constructs used in multicore systems development: Part 1

The theory and practice of the principal parallel programming constructs that focus on threading are important in multicore software development, and begin with the fundamental concepts of synchronization, critical sections, and deadlock.
In this series of three articles we will deal with implementation details and related issues. This first part introduces synchronization and critical sections. In Part 2, we will deal with synchronization primitives, including semaphores, locks, and condition variables. In Part 3, we discuss messages, flow-control-based concepts such as fence and barrier, and implementation-dependent threading features.
Synchronization is an enforcing mechanism used to impose constraints on the order of execution of threads. It controls the relative order of thread execution and resolves any conflict among threads that might otherwise produce unwanted behavior. In simple terms, synchronization is used to coordinate thread execution and manage shared data.
In an environment where messages are used for communicating between a sender and a receiver, synchronization is implicit, as a message must be sent before the message can be received.
On the other hand, for a shared-memory based environment, threads have no implicit interdependency unless some constraints are imposed. Two types of synchronization operations are widely used: mutual exclusion and condition synchronization.
In the case of mutual exclusion, one thread blocks a critical section (a section of code that contains shared data) and one or more threads wait for their turn to enter the section.
This helps when two or more threads share the same memory space and run simultaneously. Mutual exclusion is controlled by a scheduler and depends on the granularity of the scheduler.
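The mutual exclusion described above can be sketched in a few lines of Python using the standard `threading` module; the thread counts and iteration counts here are illustrative choices, not taken from the text:

```python
import threading

# A minimal sketch of mutual exclusion: two threads update a shared
# counter, and a lock ensures only one thread is inside the critical
# section at a time.
counter = 0
lock = threading.Lock()

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:           # entry point of the critical section
            counter += 1     # shared data updated under mutual exclusion
                             # exit point: lock released by the with-block

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000: every increment happened under the lock
```

Without the lock, the two read-modify-write sequences could interleave and lose updates; with it, the final count is deterministic.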
Condition synchronization, on the other hand, blocks a thread until the system state satisfies a specific condition, allowing the thread to wait until that condition is reached. Figure 4.1 below shows a generic representation of synchronization.
|Figure 4.1. Generic Representation of Synchronization Block inside Source Code|
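Condition synchronization as described above can be sketched with Python's `threading.Condition`; the producer/consumer roles and variable names here are illustrative assumptions:

```python
import threading

# A sketch of condition synchronization: a consumer thread blocks on a
# condition variable until a producer thread establishes the condition
# (here, an item being available).
items = []
consumed = []
cond = threading.Condition()

def consumer():
    with cond:
        while not items:       # wait until the condition holds
            cond.wait()        # releases the lock while blocked
        consumed.append(items.pop())

def producer():
    with cond:
        items.append("data")
        cond.notify()          # wake a waiting thread, if any

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()
print(consumed)  # ['data']
```

The `while` loop (rather than a plain `if`) re-checks the condition after waking, which is the standard guard against spurious wakeups.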
While a number of techniques are available for synchronization, only a few methods are used by developers on a regular basis. The techniques used are also to some extent determined by the programming environment.
The scope of synchronization is broad. Proper synchronization orders the updates to data and provides an expected outcome. As shown in Figure 4.2 below, shared data d can be accessed by threads Ti and Tj at times ti, tj, tk, tl, where ti < tj < tk < tl, and proper synchronization maintains the order of updates to d at these instants and considers the state of d as a synchronization function of time.
This synchronization function, s, represents the behavior of a synchronized construct with respect to the execution time of a thread.
|Figure 4.2. Shared Data Synchronization, Where Data d Is Protected by a Synchronization Operation|
Figure 4.3 below represents how synchronization operations are performed in an actual multi-threaded implementation in a generic form, and demonstrates the flow of threads.
When m >= 1, the creation timing for the initial threads T1...Tm might not be the same. After block Bi as well as Bj, the number of threads could be different, which means m is not necessarily equal to n and n is not necessarily equal to p. For all operational environments, the values of m, n, and p are at least 1.
|Figure 4.3. Operational Flow of Threads for an Application|
A critical section is a code block in which shared dependency variables reside, that is, variables on which multiple threads depend. Different synchronization primitives are used to keep critical sections safe.
With the use of proper synchronization techniques, only one thread is allowed access to a critical section at any one instant. The major challenge of threaded programming is to implement critical sections in such a way that multiple threads perform mutually exclusive operations on them and never execute them simultaneously.
Critical sections can also be referred to as synchronization blocks. Depending upon the way critical sections are being used, the size of a critical section is important. Minimize the size of critical sections when practical.
Larger critical sections should be split into multiple smaller code blocks. This is especially important in code that is likely to experience significant thread contention. Each critical section has an entry and an exit point. A critical section can be represented as shown in Figure 4.4 below.
|Figure 4.4. Implement Critical Section in Source Code|
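The advice above about minimizing critical-section size can be sketched as follows; the `expensive_computation` helper is a hypothetical stand-in for work that touches no shared data:

```python
import threading

lock = threading.Lock()
results = []

def expensive_computation(x):
    return x * x   # stands in for work that needs no shared data

def worker(x):
    value = expensive_computation(x)  # done OUTSIDE the critical section
    with lock:                        # entry point of the critical section
        results.append(value)         # only the shared update is protected
                                      # exit point: lock released here

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 4, 9]
```

Keeping the computation outside the lock shortens the time each thread holds it, which reduces contention when many threads reach the critical section at once.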
Deadlock occurs whenever a thread is blocked waiting for a resource held by another thread that will never become available. Depending on the circumstances, different deadlocks can occur: self-deadlock, recursive deadlock, and lock-ordering deadlock. In most instances, deadlock means lock-ordering deadlock.
Self-deadlock is the condition in which a thread, Ti, tries to acquire a lock that is already owned by thread Ti itself.
In Figure 4.5 (a) below, at time ta thread Ti owns lock li, where li is going to get released at tc. However, there is a call at tb from Ti, which requires li. The release time of li is td, where td can be either before or after tc. In this scenario, thread Ti is in self-deadlock condition at tb.
When the wakeup path of thread Ti resides in another thread, Tj, that condition is referred to as recursive deadlock, as shown in Figure 4.5 (b) below. Figure 4.5 (c) below illustrates a lock-ordering deadlock, where thread Ti locks resource rj and waits for resource ri, which is being locked by thread Tj.
Also, thread Tj locks resource ri and waits for resource rj, which is being locked by thread Ti. Here, both threads Ti and Tj are in deadlock at td, and w is the wait-function for a lock.
|Figure 4.5. Deadlock Scenarios|
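The self-deadlock of Figure 4.5 (a) can be observed safely by using a timed acquire instead of a blocking one; this is a sketch using Python's non-reentrant `Lock`, with `RLock` shown as one remedy (the timeout value is an arbitrary choice for illustration):

```python
import threading

lock = threading.Lock()   # non-reentrant lock
lock.acquire()            # thread Ti owns the lock...

# ...and now tries to acquire it again. With a blocking acquire this
# would self-deadlock; a timed acquire lets us observe the condition.
second = lock.acquire(timeout=0.2)
print(second)  # False: the second acquire cannot succeed

lock.release()

# A reentrant lock (RLock) is one remedy for this pattern: the owning
# thread may acquire it again, as long as each acquire is paired with
# a release.
rlock = threading.RLock()
rlock.acquire()
reentered = rlock.acquire(blocking=False)
print(reentered)  # True: same thread may re-enter an RLock
rlock.release()
rlock.release()
```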
To represent the transition model of deadlock of an environment, consider representing atomic states by si and each thread of the system by Ti. Each thread can transition from one state to another by requesting a resource, acquiring a resource, or freeing the current resource.
So, the transition can be represented as shown in Figure 4.6 below, where sd denotes the deadlock state.
|Figure 4.6. Deadlock Scenario in a State Transition for a Thread|
For any thread Ti, if the state transition of Ti becomes sd for all possible scenarios and remains blocked at sd, thread Ti has no way to transition from sd to any other state. That is why state sd is called the deadlock state for thread Ti.

Avoiding deadlock is one of the challenges of multi-threaded programming. There must not be any possibility of deadlock in an application. A lock-holding prevention mechanism or the creation of a lock hierarchy can remove a deadlock scenario. One recommendation is to use only the appropriate number of locks when implementing synchronization.
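The lock hierarchy mentioned above can be sketched as follows: every thread acquires locks in the same global order, which removes the circular wait of the lock-ordering deadlock in Figure 4.5 (c). The two-lock setup and thread names are illustrative:

```python
import threading

# A sketch of a lock hierarchy: all threads acquire locks in the same
# global order, so no thread can hold one lock while waiting for a
# lock of lower rank held by another thread.
lock_a = threading.Lock()   # rank 1: always acquired first
lock_b = threading.Lock()   # rank 2: always acquired second
log = []

def worker(name):
    with lock_a:              # both threads respect the same order...
        with lock_b:          # ...so the circular wait cannot form
            log.append(name)

t1 = threading.Thread(target=worker, args=("Ti",))
t2 = threading.Thread(target=worker, args=("Tj",))
t1.start()
t2.start()
t1.join()
t2.join()
print(sorted(log))  # ['Ti', 'Tj']: both threads completed, no deadlock
```

If one thread instead acquired `lock_b` before `lock_a`, the two threads could each hold one lock while waiting for the other, reproducing the deadlock at td in Figure 4.5 (c).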
Next in Part 2: Synchronization
This series of three articles was excerpted from Multi-Core Programming by Shameem Akhter and Jason Roberts, published by Intel Press. Copyright © 2006 Intel Corporation. All rights reserved. This book can be purchased online.
Shameem Akhter is a platform architect at Intel Corp., focusing on single-socket multicore architecture and performance analysis. Jason Roberts is a senior software engineer at Intel, working on a number of different multithreaded software products ranging from desktop to handheld and embedded DSP platforms.