Developing OpenCV computer vision apps for the Android platform

Eric Gregori, BDTI

January 31, 2013

Eric Gregori, BDTIJanuary 31, 2013

You now can hold in the palm of your hand computing power that required a desktop PC form factor just a decade ago. And with its contributions to the development of open-source OpenCV4Android, NVIDIA has brought the power of the OpenCV computer vision library to the smartphone and tablet.

The tools described in this article provide a unique implementation opportunity to do robust development on low-cost and full-featured hardware and software, for computer vision experimenters, academics, and professionals alike.

The present and future of embedded vision
Beginning in 2011, the number of smartphones shipped exceeded shipments of client PCs (netbooks, nettops, notebooks and desktops). [1] And nearly three quarters of all smartphones sold worldwide during the third quarter of 2012 were based on Google’s Android operating system. [2] Analysts forecast that more than 1 billion smartphones and tablets will be purchased in 2013. [3]

Statistics such as these suggest that smartphones and tablets, particularly those based on Android, are becoming the dominant computing platforms. This trend is occurring largely because the performance of these devices is approaching that of laptops and desktops. [4] Reflective of smartphone and tablet ascendance, Google announced that as of September 2012 more than 25 billion apps had been downloaded from Google's online application store, Google Play (Figure 1). [5]

Figure 1: Google's Play online application store has experienced dramatic usage growth

The term “embedded vision” refers to the use of computer vision technology in systems other than conventional computers. Stated another way, “embedded vision” refers to non-computer systems that extract meaning from visual inputs. Vision processing algorithms were originally only capable of being implemented on costly, bulky, and power-hungry high-end computers. As a result, computer vision has historically been primarily confined to a scant few application areas such as factory automation and military equipment.

However, mainstream systems such as smartphones and tablets now include SoCs with dual- and quad-core GHz+ processors, along with integrated GPUs containing abundant processing capabilities and dedicated image processing function blocks. Some SoCs even embed general-purpose DSP cores. The raw computing power in the modern-day mobile smartphone and tablet is now at the point where the implementation of embedded vision is not just possible but practical. [6] NVIDIA recognized the power of mobile embedded vision in 2010 and began contributing to the OpenCV computer vision library, to port code originally intended for desktop PCs to the Android operating system.

The OpenCV (Open Source Computer Vision) Library was created to provide a common resource for diverse computer vision applications and to accelerate the use of computer vision in everyday products. [7] OpenCV is distributed under the liberal BSD open-source license and is free for commercial or research use.

Top universities and Fortune 100 companies, among other sources, develop and maintain the 2,500 algorithms contained in the library. OpenCV is written in C and C++, but the Applications Interface also includes "wrappers" for Java, MATLAB and Python. Community-supported ports currently exist for Windows, Linux, Mac OS X, Android, and iOS platforms. [8]

The OpenCV library has been downloaded more than five million times and is popular with both academics and design engineers. Quoting from the Wiki for the Google Summer of Code 2010 OpenCV project, “OpenCV is used extensively by companies, research groups, and governmental bodies. Some of the companies that use OpenCV are Google, Yahoo, Microsoft, Intel, IBM, Sony, Honda, and Toyota. Many startups such as Applied Minds, VideoSurf, and Zeitera make extensive use of OpenCV. OpenCV's deployed uses span the range from stitching Streetview images together, detecting intrusions in surveillance video in Israel, monitoring mine equipment in China (more controversially, OpenCV is used in China's "Green Dam" internet filter), helping robots navigate and pick up objects at Willow Garage, detection of swimming pool drowning events in Europe, running interactive art in Spain and New York, checking runways for debris, inspecting labels on products in factories around the world on to rapid face detection in Japan.”

With the creation of the OpenCV Foundation in 2012, OpenCV has a new face and a new infrastructure [9, 10]. It now encompasses more than 40 different "builders", which test OpenCV in various configurations on different operating systems, both mobile and desktop (a “builder” is an environment used to build the library under a specific configuration. The goal is to verify that the library builds correctly under the specified configuration). A "binary compatibility builder" also exists, which evaluates binary compatibility of the current snapshot against the latest OpenCV stable release, along with a documentation builder that creates reference manuals and uploads them to the OpenCV website.

OpenCV4Android is the official name of the Android port of the OpenCV library. [11] OpenCV began supporting Android in a limited "alpha" fashion in early 2010 with OpenCV version 2.2. [12] NVIDIA subsequently joined the project and accelerated the development of OpenCV for Android by releasing OpenCV version 2.3.1 with beta Android support. [13] This initial beta version included an OpenCV Java API and native camera support. The first official non-beta release of OpenCV for Android was in April 2012, with OpenCV 2.4. At the time of this article's publication, OpenCV 2.4.3 has just been released, with even more Android support improvements and features, including a Tegra 3 SoC-accelerated version of OpenCV.

OpenCV4Android supports two languages for coding OpenCV applications to run on Android-based devices. [14] The easiest way to develop your code is to use the OpenCV Java API. OpenCV exposes most (but not all) of its functions to Java. This incomplete implementation can pose problems if you need an OpenCV function that has not yet received Java support. Before choosing to use Java for an OpenCV project, then, you should review the OpenCV Java API for functions your project may require.

OpenCV Java API
With the Java API, each ported OpenCV function is "wrapped" with a Java interface. The OpenCV function itself, on the other hand, is written in C++ and compiled. In other words, all of the actual computations are performed at a native level. However, the Java wrapper results in a performance penalty in the form of JNI (Java Native Interface) overhead, which occurs twice: once at the start of each call to the native OpenCV function and again after each OpenCV call (i.e. during the return). This performance penalty is incurred for every OpenCV function called; the more OpenCV functions called per frame, the bigger the cumulative performance penalty.

Although applications written using the OpenCV Java API run under the Android Dalvik virtual machine, for many applications the performance decrease is negligible. Figure 2 shows an OpenCV for Android application written using the Java API. This application calls three OpenCV functions per video frame. The ellipses highlight each cluster of two JNI penalties (entry and exit); this particular application will incur six total JNI call penalties per frame.

Figure 2: When using the Java API, each OpenCV function call incurs JNI overhead, potentially decreasing performance

A slightly more difficult but more performance optimized development method uses the Android NDK (Native Development Kit). In this approach, the OpenCV vision pipeline code is written entirely in C++, with direct calls to OpenCV. You simply encapsulate all of the OpenCV calls in a single C++ class, calling it once per frame. With this method, only two JNI call penalties are incurred per frame, so the per-frame JNI performance penalty is significantly reduced. Java is still used for non-vision portions of the application, including the GUI.

Using this method, you first develop and test the OpenCV implementation of your algorithm on a host platform. Once your code works the way you want it to, you simply copy the C++ implementation into an Android project and rebuild it using the Android tools. You can also easily port the C++ implementation to another platform such as iOS by rebuilding it with the correct tools in the correct environment.

Figure 3 shows the same OpenCV for Android application as in Figure 2, but this time written using the native C++ API. It calls three OpenCV functions per video frame. The ellipse highlights the resulting JNI penalties. The OpenCV portion of the application is written entirely in C++, thereby incurring no JNI penalties between OpenCV calls. Using the native API for this application reduces the per-video frame JNI penalties from six to two.

Figure 3: Using native C++ to write the OpenCV portion of your application reduces JNI calls, optimizing performance

Figure 4 shows the OpenCV4Android face detection demo running on a NVIDIA Tegra 3-based Nexus 7 tablet. The demo has two modes: Java mode uses the Java API, while native mode uses the C++ API. Note the frame rates in the two screen snapshots: in this case, the OpenCV face detection Java API is performing at the same frame rate as the C++ API.

Java mode

Native mode

Figure 4: A face detection algorithm implemented using OpenCV4Android delivers identical performance in Java mode (a) and native mode (b): 7.55 fps

How is this possible, considering the previously discussed performance discrepancies between Java and C++ API development? Unlike the previous application, this example implements only one function call (detectMultiScale()) per frame, for face detection purposes. Calling only a single OpenCV function, regardless of whether you're using the native C++ or Java API, incurs the same two JNI call penalties.

In this case, the slight difference in performance is most likely the result of the number of parameters that have to pass through the JNI interface. The native C++ face detector call only has two parameters; the remainder are passed during the initialization phase. The Java API detectMultiScale() call, on the other hand, passes seven parameters through the JNI interface.

< Previous
Page 1 of 3
Next >

Loading comments...

Most Commented