As battery-powered embedded devices move towards multicore processors, multicore energy efficiency is becoming critically important. To this end, ARM recently announced its big.LITTLE architecture, featuring a multicore chip with both high-performance processors and high-efficiency processors running a single instance of the Linux operating system.
Prior work has shown the benefits of running performance-critical code on the high-performance processors, while confining other processing to the high-efficiency processors.
This paper builds on this work by showing that, for some important mobile workloads, pre-existing Linux-kernel tuning parameters originally designed for real-time computing, high-performance computing, and SMP energy efficiency can further reduce the amount of non-performance-critical code running on the high-performance processors, resulting in energy-efficiency gains in excess of 10%.
These battery-powered devices run workloads with very low average CPU utilization, but often experience short bursts of high-utilization work required for good user experience.
One approach for these workloads is asymmetric multiprocessing, such as the big.LITTLE architecture developed by ARM.
Here, the high-performance processors are Cortex-A15s and the high-efficiency processors are Cortex-A7s. The Cortex-A15s run about twice as fast as the Cortex-A7s, which in turn run instructions about three times as efficiently as do the Cortex-A15s. As noted earlier, this configuration suggests running non-performance-critical code on the Cortex-A7s.
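The trade-off implied by these two ratios can be sketched with a little arithmetic. The following is a rough model rather than anything taken from the paper: the function name `relative_cost` is hypothetical, the work is assumed to run serially, and idle power, migration cost, and cache effects are ignored.

```python
# Ratios quoted in the text above; both are approximate.
A15_SPEEDUP = 2.0      # a Cortex-A15 runs about twice as fast as a Cortex-A7
A7_EFFICIENCY = 3.0    # a Cortex-A7 uses about one third the energy per instruction


def relative_cost(fraction_on_a7):
    """Return (energy, elapsed) relative to running all work on a Cortex-A15.

    Assumes the work runs serially and ignores idle power, migration cost,
    and cache effects, all of which matter on real hardware.
    """
    if not 0.0 <= fraction_on_a7 <= 1.0:
        raise ValueError("fraction_on_a7 must be between 0 and 1")
    on_a15 = 1.0 - fraction_on_a7
    energy = on_a15 + fraction_on_a7 / A7_EFFICIENCY
    elapsed = on_a15 + fraction_on_a7 * A15_SPEEDUP
    return energy, elapsed


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        energy, elapsed = relative_cost(fraction)
        print(f"{fraction:.0%} on A7: energy x{energy:.2f}, time x{elapsed:.2f}")
```

Under these assumptions, moving a piece of work entirely to a Cortex-A7 cuts its energy to about one third while doubling its run time, which is an acceptable trade only when the work is not performance-critical.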
Unfortunately, performance-critical code might invoke non-performance-critical deferred cleanup operations. One example of such an operation is deferred work from read-copy update (RCU), a synchronization mechanism that is often used as a replacement for reader-writer locking for read-mostly data structures in the Linux kernel.
We have shown that enforced idle is effective at reducing energy consumption, but that it might also reduce performance by increasing RCU’s grace-period interval. We have also shown that RCU processing offload is effective at reducing energy consumption, with no perceptible performance reduction.
Preliminary results show that combining enforced idle and RCU processing offload does not result in further improvements in energy efficiency. Both enforced idle and RCU processing offload are generally useful improvements that also happen to benefit energy efficiency on asymmetric multiprocessors.
We hypothesize that energy efficiency would be further improved if other deferred-work mechanisms (including timers, work queues, inter-processor interrupts, and softirq handlers) were to offload their work from the high-performance Cortex-A15 processors onto high-efficiency Cortex-A7 processors. We further hypothesize that the same is true of equivalent mechanisms in user-level applications and utilities.
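For the user-level case, one way such offloading might look is a worker thread that runs deferred callbacks and is pinned to the high-efficiency processors. This is a minimal sketch, not a mechanism from the paper: `start_deferred_worker` and `LITTLE_CPUS` are hypothetical names, the CPU numbers are an assumption that must be checked on the target board, and `os.sched_setaffinity` is only available on Linux.

```python
import os
import queue
import threading

# Hypothetical CPU ids of the high-efficiency cores. The real numbering is
# board-specific and must be checked on the target system.
LITTLE_CPUS = {0, 1}


def start_deferred_worker(little_cpus=LITTLE_CPUS):
    """Start a thread that runs deferred callbacks, pinned to little_cpus if possible.

    Returns (work_queue, thread). Put callables on the queue to defer them;
    put None to ask the worker to exit.
    """
    work = queue.Queue()

    def run():
        # On Linux, pid 0 means the calling thread, so only this worker is pinned.
        if hasattr(os, "sched_setaffinity"):
            allowed = os.sched_getaffinity(0) & set(little_cpus)
            if allowed:  # pin only when a requested CPU is actually usable
                os.sched_setaffinity(0, allowed)
        while True:
            callback = work.get()
            if callback is None:
                break
            callback()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return work, thread


if __name__ == "__main__":
    work, thread = start_deferred_worker()
    work.put(lambda: print("deferred cleanup ran"))
    work.put(None)
    thread.join()
```

The performance-critical thread only pays for a queue insertion, while the cleanup itself runs wherever the worker is allowed to run. Whether this actually saves energy is exactly the open hypothesis stated above.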
Regardless of whether or not these hypotheses hold true, we have demonstrated that significant energy-efficiency improvements are possible for asymmetric computer systems given modest modifications to the software running on them. Such improvements will become increasingly important as battery-powered devices move to multicore configurations.
To read this external content in full, download the complete paper from the author archives at RDROP.com.