In the next generation of supercomputers, namely exascale systems, a major concern of the High Performance Processing community is energy consumption. Exascale machines will have 100 times more processing power than the best current machines.
However, the energy required to maintain these systems corresponds to the output of a medium-sized nuclear power plant. Therefore, just as it is necessary to increase performance, it is also mandatory to reduce the energy consumption of these supercomputers.
Currently, High Performance Computing (HPC) systems are composed of General Purpose Processors, e.g., Intel Xeon. These processors have great processing power compared to low-power processors (e.g., Intel Atom), but also a high Thermal Design Power (TDP). While the processors present in HPC systems have a TDP of around 130 Watt, low-power processors have a much lower TDP, which in some cases corresponds to about 2% of this value.
For example, the Intel Atom N2600 has a maximum TDP of only 3.5 Watt and the ARM Cortex-A9 has a TDP of 2.5 Watt. Therefore, low-power processors are an alternative for HPC systems that may join the exascale era.
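The TDP comparison above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it uses the 130 Watt, 3.5 Watt, and 2.5 Watt figures quoted in the text, and the variable names are our own.

```python
# TDP figures quoted in the text, in watts.
HPC_PROCESSOR_TDP_W = 130.0
low_power_tdp_w = {
    "Intel Atom N2600": 3.5,
    "ARM Cortex-A9": 2.5,
}

# Share of the HPC-class TDP that each low-power processor represents.
tdp_share_percent = {
    name: 100.0 * tdp / HPC_PROCESSOR_TDP_W
    for name, tdp in low_power_tdp_w.items()
}

for name, share in tdp_share_percent.items():
    print(f"{name}: {share:.1f}% of {HPC_PROCESSOR_TDP_W:.0f} W")
```

With these figures, the Atom N2600 comes out at roughly 2.7% and the Cortex-A9 at roughly 1.9% of the 130 Watt reference, consistent with the "about 2%" stated above.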
This study aims to compare the use of multi-level parallelism on two low-power architectures: Intel Atom and ARM Cortex-A9. The comparison was performed in terms of performance and energy consumption. The Energy Delay Product (EDP) metric was used to study the trade-off between energy and performance. For this purpose, the NAS Parallel Benchmarks (NPB) Multi-Zone version was used.
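To make the EDP metric concrete, the following minimal sketch computes it under the usual definition (energy multiplied by execution time), assuming energy is estimated as average power times runtime. The platform names and measurements are hypothetical placeholders, not results from this study.

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy estimated as average power multiplied by execution time."""
    return avg_power_watts * runtime_seconds


def energy_delay_product(avg_power_watts: float, runtime_seconds: float) -> float:
    """EDP = energy * delay; lower values indicate a better trade-off."""
    return energy_joules(avg_power_watts, runtime_seconds) * runtime_seconds


# Hypothetical (average power in W, runtime in s) pairs, for illustration only.
runs = {
    "platform_a": (10.0, 100.0),
    "platform_b": (5.0, 250.0),
}

edp = {name: energy_delay_product(power, time) for name, (power, time) in runs.items()}
best = min(edp, key=edp.get)

for name, value in edp.items():
    print(f"{name}: EDP = {value:.0f} J*s")
print(f"Best trade-off: {best}")
```

In this made-up example, platform_b draws half the power but runs 2.5 times longer, so both its energy and its EDP end up higher. Because EDP weights execution time twice, a lower power draw alone does not guarantee a better score.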
The NAS Parallel Benchmarks are a set of programs designed to help evaluate the performance of parallel supercomputers and parallelization tools, originally developed by NASA [NAS 2013]. The NPB programs are derived from computational fluid dynamics codes and are implemented with different Parallel Programming Interfaces, such as MPI.
In this work, we use the Multi-Zone version of NPB (NPB-MZ). NPB-MZ is designed to exploit multiple levels of parallelism in applications and to test the effectiveness of multi-level and hybrid parallelization paradigms and tools. This version implements three (LU, BT, and SP) of the eight benchmarks available in the single-zone version of NPB. All benchmarks are implemented in Fortran and parallelized with Open Multi-Processing (OpenMP) and Message-Passing Interface (MPI).
The results show that the Atom achieved the best results. Although executions on ARM had lower power consumption in some cases, this advantage did not carry over to performance.
For all test cases, the Atom-based cluster proved to be the best option for multi-level parallelism on low-power processors.
As part of our future work, we plan to repeat the executions on general-purpose architectures, e.g., Intel Xeon, to determine the ratio between the energy efficiency of low-power and general-purpose systems.
To read this external content in full, download the complete paper from the authors' open online archives at the Federal University of Rio Grande do Sul.