By Shameem Akhter and Jason Roberts, Intel Corp.
In addition to the pragmas discussed earlier in this series that make
parallel programming a bit easier, OpenMP provides a set of function
calls and environment variables. So far, only the pragmas have been
described. The pragmas are the key to OpenMP because they provide the
highest degree of simplicity and portability, and they can be easily
switched off to generate a non-threaded version of the code.
In contrast, the OpenMP function calls require you to add conditional
compilation to your programs, as shown below, in case you want to
generate a serial version.
#include <omp.h>
#ifdef _OPENMP
omp_set_num_threads(4);
#endif
When in doubt, always try to use the pragmas and keep the function
calls for the times when they are absolutely necessary. To use the
function calls, include the <omp.h> header file. The compiler
automatically links to the correct libraries.
The four most heavily used OpenMP library functions are shown in
Table 6.5 below. They retrieve the total number of threads, set the
number of threads, return the current thread number, and return the
number of available cores, logical processors, or physical processors,
respectively. To view the complete list of OpenMP library functions,
please see the OpenMP Specification Version 2.5, which is available
from the OpenMP web site at www.openmp.org.
Table 6.5 The Most Heavily Used OpenMP Library Functions
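As a minimal sketch of the four operations described above, the following C code calls omp_get_num_threads, omp_set_num_threads, omp_get_thread_num, and omp_get_num_procs, which are the standard OpenMP routines for those tasks. The runtime_info struct and the query_runtime function are hypothetical names introduced only for this illustration, and the #ifdef guards are an assumption about how you might keep a serial build working.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical container for what the runtime reports (not part of OpenMP). */
typedef struct {
    int team_size;  /* omp_get_num_threads() seen inside the parallel region */
    int max_id;     /* largest omp_get_thread_num() observed in the team */
    int procs;      /* omp_get_num_procs() */
} runtime_info;

/* Hypothetical helper: request a team size, then record what was granted. */
runtime_info query_runtime(int requested_threads)
{
    runtime_info info = {1, 0, 1};   /* values for a serial build */
#ifdef _OPENMP
    omp_set_num_threads(requested_threads);   /* set the number of threads */
    info.procs = omp_get_num_procs();         /* available processors */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();        /* this thread's number */
        /* info is shared, so updates are serialized */
        #pragma omp critical
        {
            info.team_size = omp_get_num_threads();  /* size of the team */
            if (id > info.max_id)
                info.max_id = id;
        }
    }
#else
    (void)requested_threads;   /* unused when compiled without OpenMP */
#endif
    return info;
}
```

Calling query_runtime(4) in a build without OpenMP support simply returns the serial defaults. With OpenMP enabled, the runtime may grant fewer threads than requested, so the reported team size should be treated as an upper-bounded result rather than a guarantee.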
Figure 6.2 below uses these
functions to perform data processing for each element in array x. This
example illustrates a few important concepts when using the function
calls instead of pragmas. First, your code must be rewritten, and with
any rewrite comes extra documentation, debugging, testing, and
maintenance work. Second, it becomes difficult or impossible to compile
without OpenMP support.
Finally, because the thread count has been hard-coded, you lose the
ability to have loop scheduling adjusted for you, and this threaded
code is not scalable beyond four cores or processors, even if you have
more than four cores or processors in the system.
Figure 6.2 Loop that Uses OpenMP Functions and Illustrates the Drawbacks
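The drawbacks listed above can be illustrated with a sketch along the following lines. It is not the code from Figure 6.2; the process function, the NUM_THREADS constant, and both function names are hypothetical. The first version partitions the array by hand using the library functions, and the #ifdef branch shows the extra serial path you would have to maintain. The second version leaves the partitioning to a single pragma.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define NUM_THREADS 4   /* hard-coded team size, the practice the text warns against */

/* Hypothetical per-element work. */
static double process(double v) { return 2.0 * v + 1.0; }

/* Function-call style: every thread computes its own slice of x by hand. */
void process_with_functions(double *x, int n)
{
#ifdef _OPENMP
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int id = omp_get_thread_num();
        int chunk = (n + nthreads - 1) / nthreads;   /* ceiling division */
        int start = id * chunk;
        int end = (start + chunk < n) ? start + chunk : n;
        for (int i = start; i < end; i++)
            x[i] = process(x[i]);
    }
#else
    /* Separate serial path that must be written and maintained by hand. */
    for (int i = 0; i < n; i++)
        x[i] = process(x[i]);
#endif
}

/* Pragma style: the same loop, with scheduling left to the OpenMP runtime. */
void process_with_pragma(double *x, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        x[i] = process(x[i]);
}
```

Both functions produce the same results, but only the pragma version compiles unchanged as serial code and adapts to however many threads the runtime makes available.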
OpenMP Environment Variables
The OpenMP specification defines a few environment variables.
Occasionally the two shown in Table 6.6 may be useful during
development.
Additional compiler-specific environment variables are usually
available. Be sure to review your compiler's documentation to become
familiar with additional variables.
Table 6.6 Most Commonly Used Environment Variables for OpenMP
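As a hedged illustration of how environment variables can influence a program without recompiling, the sketch below assumes the two variables in question are OMP_NUM_THREADS and OMP_SCHEDULE, both of which are defined by the OpenMP specification. The function names sum_to and default_team_size are hypothetical. The schedule(runtime) clause defers the choice of loop schedule to OMP_SCHEDULE, and omp_get_max_threads reports the team size that the next parallel region may use, which follows OMP_NUM_THREADS unless the program overrides it.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sums 1..n. With OpenMP enabled, the loop schedule is read from the
   OMP_SCHEDULE environment variable when the program runs. */
long sum_to(int n)
{
    long total = 0;
    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Upper bound on the team size of the next parallel region; returns 1
   in a build without OpenMP support. */
int default_team_size(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Setting, for example, OMP_NUM_THREADS to 2 and OMP_SCHEDULE to dynamic,4 in the shell before launching the program would change how the loop is distributed, while the computed sum stays the same.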
Compilation
Using the OpenMP pragmas requires an OpenMP-compatible compiler and
thread-safe runtime libraries. The Intel C++ Compiler version 7.0 or
later and the Intel Fortran compiler both support OpenMP on Linux and
Windows. This discussion of compilation and debugging will focus on
these compilers.
Several other choices are available as well. For instance, Microsoft
supports OpenMP in Visual C++ 2005 for Windows and the Xbox 360
platform, and has also made OpenMP work with managed C++ code. In
addition, OpenMP compilers for C/C++ and Fortran on Linux and Windows
are available from the Portland Group.
The /Qopenmp command-line option given to the Intel C++ Compiler
instructs it to pay attention to the OpenMP pragmas and to create
multithreaded code. If you omit this switch from the command line, the
compiler will ignore the OpenMP pragmas.
This provides a very simple way to generate a single-threaded version
without changing any source code. Table 6.7 below provides a summary
of invocation options for using OpenMP. The thread-safe runtime
libraries are selected and linked automatically when the OpenMP-related
compilation switch is used.
Table 6.7 Summary of Invocation Options for Using OpenMP
The Intel compilers support the OpenMP Specification Version 2.5
except the workshare construct. Be sure to browse the release notes and
compatibility information supplied with the compiler for the latest
information. The complete OpenMP specification is available from the
OpenMP Web site.
Debugging
Debugging multithreaded applications has always been a challenge due to
the nondeterministic execution of multiple instruction streams caused
by runtime thread-scheduling and context switching.
Also, debuggers may change the runtime performance and thread
scheduling behaviors, which can mask race conditions and other forms of
thread interaction. Even print statements can mask issues because they
use synchronization and operating system functions to guarantee
thread-safety.
Debugging an OpenMP program adds some difficulty, because OpenMP
compilers must communicate all the necessary information about private
variables, shared variables, threadprivate variables, and all kinds of
constructs to debuggers after threaded code generation. The generated
program also contains additional code that is impossible to examine and
step through without a specialized OpenMP-aware debugger.
Therefore, the key is narrowing down the problem to a small code
section that causes the same problem. It would be even better if you
could come up with a very small test case that can reproduce the
problem. The following list provides guidelines for debugging OpenMP
programs:
1. Use the binary search method to identify the parallel construct
causing the failure by enabling and disabling the OpenMP pragmas in the
program.
2. Compile the routine causing the problem without the /Qopenmp switch
and with the /Qopenmp_stubs switch, then check whether the code fails
in a serial run. If it does, this is a serial-code debugging problem.
If not, go to Step 3.
3. Compile the routine causing the problem with the /Qopenmp switch and
set the environment variable OMP_NUM_THREADS=1, then check whether the
threaded code fails when run on a single thread. If it does, debug the
threaded code on a single thread. If not, go to Step 4.
4. Identify the failing scenario at the lowest compiler optimization
level by compiling it with /Qopenmp and one of the switches such as
/Od, /O1, /O2, /O3, and/or /Qipo.
5. Examine the code section causing the failure and look for problems
such as violation of data dependence after parallelization, race
conditions, deadlock, missing barriers, and uninitialized variables. If
you cannot spot any problem, go to Step 6.
6. Compile the code using /Qtcheck to perform the OpenMP code
instrumentation and run the instrumented code inside the Intel Thread
Checker.
Problems are often due to race conditions. Most race conditions are
caused by shared variables that really should have been declared
private, reduction, or threadprivate.
Sometimes, race conditions are also caused by missing synchronization,
such as critical or atomic protection around updates to shared
variables. Start by looking at the variables inside the parallel
regions and make sure that the variables are declared private when
necessary. Also, check functions called within parallel constructs.
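A short sketch may make these two classes of fixes concrete. The function names dot and count_even_odd are hypothetical. In the first function, tmp and sum are declared outside the loop and would be shared by default, so the private and reduction clauses are what prevent the race. In the second, an atomic directive protects the update of a shared counter array.

```c
/* Dot product. Without private(tmp) and reduction(+:sum), every thread
   would write the same tmp and sum, which is a race condition. */
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;
    double tmp;
    int i;
    #pragma omp parallel for private(tmp) reduction(+:sum)
    for (i = 0; i < n; i++) {
        tmp = a[i] * b[i];
        sum += tmp;
    }
    return sum;
}

/* Counts even values in counts[0] and odd values in counts[1]. The
   counts array is shared, so each increment is protected with atomic. */
void count_even_odd(const int *v, int n, int counts[2])
{
    counts[0] = 0;
    counts[1] = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        counts[v[i] & 1]++;
    }
}
```

For a counter like this, a reduction per bin would usually scale better than atomic updates; the atomic form is used here only to show the synchronization the text refers to.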
By default, variables declared on the stack inside a parallel region
are private, but the C/C++ keyword static places a variable in static
(global) storage instead, so such variables are shared by all threads
in OpenMP loops.
The default(none) clause, shown in the following code sample, can be
used to help find those hard-to-spot variables. If you specify
default(none), then every variable must be declared with a data-sharing
attribute clause.
#pragma omp parallel for default(none) private(x,y) shared(a,b)
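To show the clause in a complete loop, here is a minimal sketch; the function scale_add and its body are hypothetical. Note one difference from the one-line sample: because the loop bound n is also referenced inside the construct, default(none) requires it to be listed as well, and a compiler that supports OpenMP will reject the code if any referenced variable is left out. The loop index i needs no clause because OpenMP makes the index of a parallel for loop private automatically.

```c
/* Hypothetical routine: for each element, stores a[i] + 2*a[i] in b[i]. */
void scale_add(const double *a, double *b, int n)
{
    double x, y;   /* per-iteration scratch values, so they must be private */
    int i;
    #pragma omp parallel for default(none) private(x, y) shared(a, b, n)
    for (i = 0; i < n; i++) {
        x = a[i];
        y = 2.0 * x;
        b[i] = x + y;
    }
}
```

Removing x from the private clause in this sketch should produce a compile-time error rather than a silent race, which is the point of using default(none) during debugging.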