Power efficiency is becoming an ever more important metric for both high-performance and high-throughput computing. Over the course of the next decade it is expected that flops/watt will be a major driver for the evolution of computer architecture.
Servers built from large numbers of ARM processors, which are already ubiquitous in mobile computing, are a promising alternative to traditional x86-64 computing. We present the results of our initial investigations into the use of ARM processors for scientific computing applications.
In particular, we report the results from our work with a current-generation ARMv7 development board to explore ARM-specific issues regarding the software development environment, operating system, performance benchmarks and issues for porting High Energy Physics software.
The computing requirements for high energy physics (HEP) projects like the Large Hadron Collider (LHC) at the European Laboratory for Particle Physics (CERN) in Geneva, Switzerland are larger than can be met with resources deployed in a single computing center. This has led to the construction of a global distributed computing system known as the Worldwide LHC Computing Grid (WLCG), which brings together resources from nearly 160 computer centers in 35 countries.
Computing at this scale has been used, for example, by the CMS and ATLAS experiments for the discovery of the Higgs boson. To achieve this and other results, the CMS experiment typically used a processing capacity of between 80,000 and 100,000 x86-64 cores from the WLCG during 2012.
The construction of the WLCG was greatly facilitated by the convergence around the year 2000 on commodity x86 hardware and the standardized use of Linux as the operating system for scientific computing clusters. Even if multiple generations of x86 hardware (and hardware from both Intel and AMD) are provided in the various computer centers, this was a far simpler situation than the typical mix of proprietary UNIX operating systems and processors.
A strong contender for this evolving low power (high performance/watt) server market is the ARM processor, due to its nearly complete dominance in the low power mobile market for smartphones and tablets, which has also seen dramatic growth since around 2005. The size of the mobile market, and its traditional focus on low power, has led to interest in using these processors in a server environment as well. As a result, ARM-based server products are starting to appear.
As the ARM processors are general purpose and run Linux, only a standard port of the CMS software is required, similar to what was done, for example, to port the CMS software from 32-bit (ia32) to 64-bit (x86-64). Such a port is reasonably straightforward relative to the changes required to use other high performance per watt solutions (e.g. GPGPUs, which require actual software rewrites), and thus the effort required for these initial investigations was also relatively modest.
We used a low-cost development board, the ODROID-U2. The processor on the board is an Exynos 4412 Prime, a System-on-Chip (SoC) produced by Samsung for use in mobile devices. It is a quad-core Cortex A9 ARMv7 processor operating at 1.7GHz with 2GB of LP-DDR2 memory. The processor also contains an ARM Mali-400 quad-core GPU accelerator, although that was not used for the work described in this paper. The board has eMMC and microSD slots, two USB 2.0 ports and 10/100Mbps Ethernet with an RJ-45 port. Power is provided by a 5V DC power adaptor.
For the Linux operating system on the board we used Fedora 18 ARM Remix, chosen for its similarities to Scientific Linux CERN (SLC), with kernel version 3.0.75 (provided by Hardkernel, the vendor for the ODROID-U2 board). It is fully hard float capable and uses the floating point unit on the SoC. The kernel was reconfigured to enable swap devices/files, which is required for CMSSW compilation. A 4GB swap file was used in our build environment.
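As an illustration of the swap setup described above, the following is a minimal sketch of how a build script might check that enough swap is configured before starting a long compilation. It parses text in the format of the Linux `/proc/meminfo` file. The function names (`parse_meminfo`, `swap_total_gib`, `has_enough_swap`, `read_meminfo`) and the 4 GiB default threshold are assumptions made for this example; the threshold simply mirrors the size of the swap file mentioned above and is not a stated requirement of CMSSW.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name to its integer value.

    Lines look like 'SwapTotal:       4194304 kB'; most values are reported in kB.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a 'name: value' shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_total_gib(text):
    """Return the configured swap size in GiB (0.0 if no SwapTotal field is present)."""
    swap_kib = parse_meminfo(text).get("SwapTotal", 0)
    return swap_kib / (1024 * 1024)


def has_enough_swap(text, required_gib=4.0):
    """Check the swap size against a threshold (hypothetical default of 4 GiB)."""
    return swap_total_gib(text) >= required_gib


def read_meminfo(path="/proc/meminfo"):
    """Read the kernel's memory summary; this file is only available on Linux."""
    with open(path) as handle:
        return handle.read()
```

On a Linux system, `has_enough_swap(read_meminfo())` would report whether the running machine meets the chosen threshold. Keeping the parsing separate from the file access means the check can also be run against captured text from another machine.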