Over the last two years, mobile processors have not only increased in clock speed but also moved to multicore designs. With the introduction of dual-core processors for mobile devices in 2010 and quad-core processors in 2011, the available raw compute power increased significantly.
Whenever hardware pushes the limits of compute power, software emerges that takes advantage of it. Today, for example, high-end mobile devices offer a gaming experience near or even beyond console quality.
The increase in processing power was accompanied by an increase in the complexity of mobile hardware: the aforementioned multicore processors, but also dynamic frequency and voltage scaling, power gating, and more. Whereas some of this complexity is transparent to software, other parts require software to be rewritten to take full advantage of them, which in turn adds complexity to the software.
This is especially true for multicore processors, which can only unleash their processing power if applications are written to execute in multiple threads. While the default programming languages on the major mobile platforms (Android: Java, iOS: Objective-C, Windows Phone: C#) support multithreading out of the box, it is still up to the developer to use it. Moreover, an app that really needs performance is likely to use either a lower-level language, such as C, or a specialized high-performance language such as Google's RenderScript.
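To illustrate the kind of developer effort involved, the following is a minimal sketch (not taken from the paper) of how a per-pixel imaging operation can be parallelized in plain Java by assigning each worker thread a band of image rows; the class and method names are hypothetical:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: splitting a per-pixel image operation (here, pixel inversion)
// across worker threads, one contiguous band of rows per thread.
public class ParallelInvert {
    public static void invert(int[][] img, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        int rows = img.length;
        int band = (rows + nThreads - 1) / nThreads; // rows per thread, rounded up
        for (int t = 0; t < nThreads; t++) {
            final int start = t * band;
            final int end = Math.min(rows, start + band);
            pool.execute(() -> {
                for (int y = start; y < end; y++)
                    for (int x = 0; x < img[y].length; x++)
                        img[y][x] = 255 - img[y][x]; // invert 8-bit pixel value
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[][] img = {{0, 128}, {255, 64}};
        invert(img, 4);
        System.out.println(img[0][0] + " " + img[1][0]); // prints "255 0"
    }
}
```

Because the threads write to disjoint row bands, no locking is needed; this row-band decomposition is the same basic pattern that frameworks like RenderScript apply automatically across cores.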
We take an imaging algorithm and compare a reference implementation based on OpenCV with a multithreaded RenderScript implementation and an implementation that offloads the computation via Remote CUDA.
Experiments show that, on a modern Tegra 3 quad-core device, a multithreaded implementation can achieve a speedup factor of 2.2 at the same energy cost, whereas computation offloading yields neither speedups nor energy savings.