The steady demand for new applications, augmented reality, and higher display resolutions drives the development of embedded platforms and, at the same time, the need for faster and more powerful processors. As a consequence, today’s mobile platforms found in smartphones and tablets host multi-core Central Processing Units (CPUs) and even programmable embedded Graphics Processing Units (GPUs) to deliver the demanded performance.
This raises the question of how to harness the processing power of these parallel, heterogeneous platforms. As a remedy, Google proposed two new parallel programming concepts for Android that can target CPUs as well as GPUs: Renderscript and Filterscript. These programming models were designed with portability in mind for the predominant application domain of image processing.
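To give a flavor of this programming model: Renderscript and Filterscript express image operators as per-pixel kernels that the runtime invokes once for each output pixel, distributing the invocations across the available compute resources. The following plain C++ sketch mimics that structure for a simple brightness filter; it is an illustrative analogue only, not actual Renderscript syntax, and all names are made up for this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Analogue of a Renderscript per-pixel kernel: the body is written for a
// single pixel, with no loop over the image. (Illustrative sketch only.)
static uint8_t brighten(uint8_t in, int offset) {
    return static_cast<uint8_t>(std::min(255, in + offset));
}

// Stand-in for the runtime's "forEach" launch, which maps the kernel over
// every pixel and is free to parallelize the invocations.
static void forEachPixel(const std::vector<uint8_t>& in,
                         std::vector<uint8_t>& out, int offset) {
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = brighten(in[i], offset);
}
```

The key design point this mirrors is that the programmer states only the point operation; scheduling and parallelization remain under the control of the runtime, which is what makes the model portable across CPU and GPU back ends.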
While Android applications are written in Java, Renderscript and Filterscript are based on C99. Hence, Java programmers have to write low-level C code in order to benefit from the high performance provided by these new programming models.
As a remedy, this paper proposes code generators that automatically emit Renderscript and Filterscript code. The generated target code is derived from a Domain-Specific Language (DSL) for image processing algorithms. The proposed code generators for Renderscript and Filterscript build on an existing compiler infrastructure that provides back ends for CUDA and OpenCL on discrete, standalone GPU accelerators. The focus of this work is on code generation that targets the different components of today’s heterogeneous embedded platforms:
We present the first code generator for Renderscript and Filterscript on Android platforms that starts from an abstract high-level representation. The generated implementations are even faster than the target-specific implementations in the Open Source Computer Vision (OpenCV) framework. At the same time, the algorithm description is compact and requires only a fraction of the code of available, highly optimized implementations.
We generate target code for embedded Heterogeneous System Architecture (HSA) platforms. With HSA, the CPU and GPU share the same physical memory. This allows us to avoid extensive memory transfers and enables the use of heterogeneous compute resources in scenarios where the same data has to be accessed frequently by different compute units.
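The practical consequence of shared physical memory can be sketched as follows. In this plain C++ illustration (the two processing stages are stand-ins for a GPU kernel and a CPU post-processing step; no real GPU API is used), the discrete-memory style must copy the data into a separate device buffer and back for every hand-off, while the HSA style lets both stages operate on the same allocation:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Discrete-GPU style: each hand-off between compute resources requires
// copying the image into a separate device buffer and back again.
static void discreteStyle(std::vector<uint8_t>& host) {
    std::vector<uint8_t> device(host.size());
    std::memcpy(device.data(), host.data(), host.size());   // upload
    for (auto& p : device) p = static_cast<uint8_t>(p / 2); // "GPU" stage
    std::memcpy(host.data(), device.data(), host.size());   // download
    for (auto& p : host) p = static_cast<uint8_t>(p + 1);   // "CPU" stage
}

// HSA style: CPU and GPU share the same physical memory, so both stages
// process the same buffer in place and the transfers disappear.
static void sharedStyle(std::vector<uint8_t>& buf) {
    for (auto& p : buf) p = static_cast<uint8_t>(p / 2);    // "GPU" stage
    for (auto& p : buf) p = static_cast<uint8_t>(p + 1);    // "CPU" stage
}
```

Both variants compute the same result; the difference is that the shared-memory version performs no copies, which is exactly the saving that matters when the same data is accessed frequently from different compute units.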