Using OpenMP for programming parallel threads in multicore applications: Part 4
In addition to the pragmas discussed earlier in this series that make parallel programming a bit easier, OpenMP provides a set of function calls and environment variables. So far, only the pragmas have been described. The pragmas are the key to OpenMP because they provide the highest degree of simplicity and portability, and they can easily be switched off to generate a non-threaded version of the code.
In contrast, if you want to be able to generate a serial version, the OpenMP function calls require you to add conditional compilation to your programs, typically by testing the _OPENMP macro that OpenMP-compliant compilers define.
When in doubt, always try to use the pragmas and keep the function calls for the times when they are absolutely necessary. To use the function calls, include the <omp.h> header file.
The four most heavily used OpenMP library functions are shown in Table 6.5 below. They retrieve the total number of threads, set the number of threads, return the current thread number, and return the number of available cores, logical processors or physical processors, respectively. For the complete list of OpenMP library functions, see the OpenMP Specification Version 2.5, available from the OpenMP web site at www.openmp.org.
|Table 6.5 The Most Heavily Used OpenMP Library Functions|
Figure 6.2 below uses these functions to perform data processing for each element in array x. This example illustrates a few important concepts when using the function calls instead of pragmas. First, your code must be rewritten, and with any rewrite comes extra documentation, debugging, testing, and maintenance work. Second, it becomes difficult or impossible to compile without OpenMP support.
Finally, because the thread count has been hard-coded, you lose the ability to have loop scheduling adjusted for you, and the threaded code does not scale beyond four cores or processors, even if the system has more.
|Figure 6.2 Loop that Uses OpenMP Functions and Illustrates the Drawbacks|
OpenMP Environment Variables
The OpenMP specification defines a few environment variables. Occasionally the two shown in Table 6.6 may be useful during development.
Additional compiler-specific environment variables are usually available. Be sure to review your compiler's documentation to become familiar with additional variables.
|Table 6.6 Most Commonly Used Environment Variables for OpenMP|
Compilation
Using the OpenMP pragmas requires an OpenMP-compatible compiler and thread-safe runtime libraries. The Intel C++ Compiler version 7.0 or later and the Intel Fortran compiler both support OpenMP on Linux and Windows. This discussion of compilation and debugging will focus on these compilers.
Several other choices are available as well. For instance, Microsoft supports OpenMP in Visual C++ 2005 for Windows and the Xbox 360 platform, and has also made OpenMP work with managed C++ code. In addition, OpenMP compilers for C/C++ and Fortran on Linux and Windows are available from the Portland Group.
The /Qopenmp command-line option given to the Intel C++ Compiler instructs it to pay attention to the OpenMP pragmas and to create multithreaded code. If you omit this switch from the command line, the compiler will ignore the OpenMP pragmas.
This action provides a very simple way to generate a single-threaded version without changing any source code. Table 6.7 below provides a summary of invocation options for using OpenMP. The thread-safe runtime libraries are selected and linked automatically when the OpenMP related compilation switch is used.
|Table 6.7 Summary of Invocation Options for Using OpenMP|
The Intel compilers support the OpenMP Specification Version 2.5 except the workshare construct. Be sure to browse the release notes and compatibility information supplied with the compiler for the latest information. The complete OpenMP specification is available from the OpenMP Web site.
Debugging multithreaded applications has always been a challenge due to the nondeterministic execution of multiple instruction streams caused by runtime thread-scheduling and context switching.
Also, debuggers may change the runtime performance and thread scheduling behaviors, which can mask race conditions and other forms of thread interaction. Even print statements can mask issues because they use synchronization and operating system functions to guarantee thread-safety.
Debugging an OpenMP program adds further difficulty: after generating the threaded code, the compiler must communicate to the debugger all the necessary information about private, shared, and threadprivate variables and about every kind of construct. The result is additional code that is impossible to examine and step through without a specialized OpenMP-aware debugger.
Therefore, the key is to narrow the problem down to a small code section that reproduces it. Better still is a very small test case that reproduces the problem. The following list provides guidelines for debugging OpenMP programs:
1. Use the binary search method to identify the parallel construct causing the failure by enabling and disabling the OpenMP pragmas in the program.
2. Compile the routine causing the problem without the /Qopenmp switch and with the /Qopenmp_stubs switch, then check whether the code fails in a serial run. If it does, the task is ordinary serial-code debugging. If not, go to Step 3.
3. Compile the routine causing the problem with the /Qopenmp switch and set the environment variable OMP_NUM_THREADS=1, then check whether the threaded code fails when run on a single thread. If it does, the task is single-thread debugging of threaded code. If not, go to Step 4.
4. Identify the failing scenario at the lowest compiler optimization level by compiling it with /Qopenmp and one of the switches such as /Od, /O1, /O2, /O3, and/or /Qipo.
5. Examine the code section causing the failure and look for problems such as violations of data dependence after parallelization, race conditions, deadlock, missing barriers, and uninitialized variables. If you cannot spot any problem, go to Step 6.
6. Compile the code using /Qtcheck to perform the OpenMP code instrumentation and run the instrumented code inside the Intel Thread Checker.
Problems are often due to race conditions. Most race conditions are caused by shared variables that really should have been declared private, reduction, or threadprivate.
Sometimes race conditions are also caused by missing synchronization, such as critical or atomic protection around updates to shared variables. Start by looking at the variables inside the parallel regions and make sure that they are declared private when necessary. Also check the functions called within parallel constructs.
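By default, automatic (stack) variables declared inside a parallel region, or inside functions called from it, are private to each thread. The C/C++ keyword static, however, places a variable in static storage rather than on the stack, so a static variable is shared by all threads, even when it is declared inside a function called from an OpenMP loop.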
The default(none) clause, shown in the following code sample, can be used to help find those hard-to-spot variables. If you specify default(none), then every variable must be declared with a data-sharing attribute clause.
#pragma omp parallel for \
        default(none) private(x,y) shared(a,b)