At the Supercomputing Conference in Denver, Colorado, designers of high-end computers and developers of high-end embedded digital signal processing systems got their first look at the OpenMP 4.0 API, the standard just released by the OpenMP Consortium.
In addition to several major enhancements, said Bronis R. de Supinski, Chair of the OpenMP Language Committee, this release provides a new mechanism to describe regions of code where data and/or computation should be moved to another computing device.
“The OpenMP 4.0 API is a major advance that adds two new forms of parallelism in the form of device constructs and SIMD constructs,” he said. “It also includes several significant extensions for the loop-based and task-based forms of parallelism already supported in the OpenMP 3.1 API.”
With this release, the OpenMP API, the de facto standard for parallel programming on shared-memory systems, continues to extend its reach beyond pure HPC to include DSPs, real-time systems, and accelerators. The OpenMP API aims to provide high-level parallel language support for a wide range of applications, from automotive and aeronautics to biotech, automation, robotics and financial analysis.
The new standard's support for accelerators is of particular interest to developers using standard DSP engines from the likes of TI and Analog Devices, as well as custom designs on FPGA architectures such as those from Altera and Xilinx.
For embedded developers using high-end DSP functions, de Supinski said, the accelerator support allows vendors to target a wide variety of compute devices. It does this by providing mechanisms to describe regions of code where data and/or computation should be moved to another computing device.
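As a rough illustration of what describing such a region looks like, the minimal sketch below uses the OpenMP 4.0 `target` construct with a `map` clause. The function name `scale_on_device` and its parameters are hypothetical and not taken from any vendor's code. When no device is present, or the compiler ignores the pragmas, the loop simply runs on the host.

```c
#include <stddef.h>

/* Hypothetical helper (not from the article): multiplies each element
   of x by a, asking the runtime to run the loop on an attached device. */
void scale_on_device(float *x, int n, float a)
{
    /* map(tofrom: x[0:n]) copies the n elements of x to the device
       before the region starts and copies them back when it ends. */
    #pragma omp target map(tofrom: x[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        x[i] *= a;
    }
}
```

The `map` clause is what expresses the data movement, while the `target` construct marks the computation to be moved. Which device actually runs the region is left to the implementation.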
Several prototypes for the accelerator proposal have already been implemented, said de Supinski.
“The accelerator model now available in OpenMP 4.0 is an important milestone for TI customers,” said Ramesh Kumar, DSP general manager, Texas Instruments. “With the release of TI’s 66AK2H multicore DSP + ARM SOC, customers can benefit from ARM and DSP in one chip. The model provides a seamless way to accelerate customer’s systems and achieve best in class power and performance.”
“The latest OpenMP 4.0 release will provide our HPC users with a single ‘language’ for offloading computational work to Xeon Phi coprocessors, NVIDIA GPUs, and ARM processors,” says Kent Milfeld, Manager, HPC Performance & Architecture Group of the Texas Advanced Computing Center. “Extending the base of OpenMP will encourage more researchers to take advantage of attached devices, and to develop applications that support multiple architectures.”
The OpenMP 4.0 API provides several extensions to its task-based parallelism support. Tasks can be grouped to support deep task synchronization, and task groups can be aborted to reflect completion of cooperative tasking activities such as search. Task-to-task synchronization is now supported through the specification of task dependency.
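The sketch below gives a rough idea of how task grouping and task dependencies can be written, using the OpenMP 4.0 `taskgroup` construct and `depend` clause. The function name `pipeline` and the values are purely illustrative, and task group cancellation is not shown.

```c
/* Hypothetical example (not from the article): two tasks where the
   second must observe the value written by the first. */
int pipeline(void)
{
    int a = 0, b = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* The taskgroup waits for both tasks before execution continues. */
        #pragma omp taskgroup
        {
            /* Producer: declares that it writes a. */
            #pragma omp task shared(a) depend(out: a)
            a = 21;

            /* Consumer: depend(in: a) orders it after the producer. */
            #pragma omp task shared(a, b) depend(in: a) depend(out: b)
            b = a * 2;
        }
    }
    return b;
}
```

Here the `depend` clauses, rather than an explicit barrier or `taskwait`, order the two tasks. Without them the runtime would be free to run the consumer first.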
The new OpenMP 4.0 includes SIMD constructs to vectorize both serial and parallelized loops. “With the advent of SIMD units in all major processor chips,” said de Supinski, “portable support for accessing them is essential. To that end, the OpenMP 4.0 API provides mechanisms to describe when multiple iterations of the loop can be executed concurrently using SIMD instructions and to describe how to create versions of functions that can be invoked across SIMD lanes.”
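The two mechanisms described here correspond to the `simd` and `declare simd` directives. The following is a minimal sketch under assumed names (`axpy_elem` and `saxpy` are hypothetical); whether and how the loop is actually vectorized depends on the compiler and target hardware.

```c
/* Hypothetical element-wise function. declare simd asks the compiler to
   also generate a version callable across SIMD lanes; uniform(a) states
   that a has the same value in every lane. */
#pragma omp declare simd uniform(a)
float axpy_elem(float a, float x, float y)
{
    return a * x + y;
}

/* Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(int n, float a, const float *x, float *y)
{
    /* simd tells the compiler that iterations may execute concurrently
       using SIMD instructions. */
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        y[i] = axpy_elem(a, x[i], y[i]);
    }
}
```

For a loop that should be both threaded and vectorized, OpenMP 4.0 also defines the combined `parallel for simd` construct, which could replace the `simd` directive above.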