Open-source software meets broad needs of robot-vision developers
Robot vision applications can bring a complex set of requirements, but open-source libraries offer solutions for nearly every need. Developers can find open-source packages for tasks ranging from basic image processing and object recognition to motion planning and collision avoidance, far more than a brief article can mention, much less give their full due. Nevertheless, here are some key open-source image-processing packages that can help developers implement sophisticated robot systems. (Note: This report focuses on libraries for more fundamental image-based algorithms and specifically excludes open-source software for AI-based robot vision.)
No article on robot vision software can fail to highlight the Open Source Computer Vision Library (OpenCV) [source]. Among available open-source software packages, OpenCV is perhaps the most widely used and functionally rich. Implementing over 2,500 algorithms, the OpenCV distribution addresses image processing requirements in a series of modules, including the following:
core, which defines basic data structures and functions used by all other modules;
imgproc, which provides image processing functions including linear and non-linear image filtering, geometrical image transformations, color space conversion, histograms, and more;
video, which supports motion estimation, background subtraction, and object tracking algorithms;
calib3d, which provides basic geometry algorithms, camera calibration, object pose estimation, and more;
features2d, which provides feature detectors, descriptors, and descriptor matchers;
objdetect, which provides detection of objects and instances of predefined classes.
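To make the distinction between linear and non-linear filtering in the imgproc description concrete, here is a minimal sketch in plain Python. It does not use OpenCV; the helper names filter3x3 and mean are hypothetical and exist only for this illustration. It compares a 3x3 mean (box) filter, which is linear, with a 3x3 median filter, which is non-linear, on a tiny synthetic image that contains a single noise pixel.

```python
from statistics import median


def filter3x3(image, reduce_fn):
    """Apply reduce_fn to every 3x3 neighbourhood of a grayscale image.

    `image` is a list of rows of numbers. Border pixels are copied
    unchanged to keep the sketch short (an assumption of this example).
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1)
                      for dx in (-1, 0, 1)]
            out[y][x] = reduce_fn(window)
    return out


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# 5x5 flat gray image (value 10) with one bright noise pixel in the centre.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255

box = filter3x3(noisy, mean)    # linear: the spike is averaged into its neighbours
med = filter3x3(noisy, median)  # non-linear: the spike is replaced by the local median

print("mean filter, centre pixel:  ", round(box[2][2], 2))
print("median filter, centre pixel:", med[2][2])
```

The mean filter spreads the noise pixel across its neighbourhood, while the median filter discards it. A library implementation would normally be preferred in a real system for both speed and border handling.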
Written in C++, OpenCV is available with interfaces for C++, Python, Java, and MATLAB and supports Windows, Linux, Android, and Mac OS. Along with its support for single instruction, multiple data (SIMD) instruction sets, OpenCV provides CUDA-based GPU acceleration for many functions through its gpu module and OpenCL acceleration through its ocl module (the module names used in the OpenCV 2.4 series; later releases reorganize this functionality). The recently released OpenCV 4.0 brings a number of performance improvements and new capabilities, including an implementation of the popular Kinect Fusion algorithm.
For all its functionality, OpenCV can demand a learning curve that exceeds the patience of developers looking to move quickly with robot vision. For these developers, Python-based SimpleCV [source] might be the answer. Built on OpenCV, SimpleCV provides the functionality required by advanced robot-vision developers within an accessible framework that helps less experienced developers explore basic machine vision functions with simple Python function calls. For example, developers can quickly implement commonly used functions such as image thresholding using a simple built-in method in the SimpleCV Image class (img.binarize() in the listing below) and then display the results, as shown in Figure 1.
from SimpleCV import Image, Color, Display

# Make a function that does a half and half image.
def halfsies(left, right):
    result = left
    # crop the right image to be just the right side.
    crop = right.crop(right.width/2.0, 0, right.width/2.0, right.height)
    # now paste the crop on the left image.
    result = result.blit(crop, (left.width/2, 0))
    # return the results.
    return result

# Load an image from imgur.
img = Image('http://i.imgur.com/lfAeZ4n.png')
# binarize the image using a threshold of 90
# and invert the results.
output = img.binarize(90).invert()
# create the side by side image.
result = halfsies(img, output)
# show the resulting image.
result.show()
# save the results to a file.
result.save('juniperbinary.png')
Figure 1. Results of Python code listed above (Source: SimpleCV)
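For readers who want to see what a threshold-and-invert step does at the pixel level, the following is a rough sketch in plain Python. It is not SimpleCV's implementation: the functions binarize and invert here are hypothetical stand-ins named after the methods in the listing, and the polarity chosen (pixels brighter than the threshold become white) is an assumption that may differ from SimpleCV's own convention.

```python
def binarize(gray, threshold, max_value=255):
    """Binary threshold on a grayscale image stored as a list of rows.

    Pixels strictly brighter than `threshold` become `max_value`;
    all others become 0. (Polarity is an assumption of this sketch.)
    """
    return [[max_value if px > threshold else 0 for px in row]
            for row in gray]


def invert(gray, max_value=255):
    """Flip dark and light by subtracting each pixel from `max_value`."""
    return [[max_value - px for px in row] for row in gray]


# A tiny 3x3 grayscale "image" with values in the range 0..255.
gray = [
    [12, 80, 200],
    [95, 90, 30],
    [255, 60, 140],
]

mask = binarize(gray, 90)   # same threshold value as the listing
inverted = invert(mask)

for row in inverted:
    print(row)
```

Whatever the polarity, the output contains only two values, which is what makes thresholding a common first step before contour or blob analysis.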
Along with their basic image processing functions, OpenCV and SimpleCV implement a number of high-level image processing algorithms that robotic systems need to work with objects or operate safely within their physical environment. One of the fundamental data structures used in many of these computations is the point cloud – a collection of multi-dimensional data points that represent an object (Figure 2). Acquired from cameras, the point cloud of an object is used for fundamental robotic operations such as object identification, alignment, and fitting. For working with point clouds, the Point Cloud Library (PCL) [source] implements algorithms for filtering, fitting, keypoint extraction, segmentation, and much more.
Figure 2. Point cloud data set for a basic torus. (Source: Wikimedia Commons/Kieff).
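As a small illustration of the kind of filtering applied to point clouds, the sketch below downsamples a cloud by grouping points into cubic voxels and replacing each group with its centroid. PCL offers a voxel-grid filter of this general kind, but the code here is plain Python and is not PCL's API; the function name voxel_downsample, the tuple-based point representation, and the sample coordinates are all assumptions made for the example.

```python
from collections import defaultdict
from math import floor


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic voxel with their centroid.

    `points` is an iterable of (x, y, z) tuples; `voxel_size` is the edge
    length of each voxel in the same units as the coordinates.
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size),
               floor(y / voxel_size),
               floor(z / voxel_size))
        buckets[key].append((x, y, z))

    centroids = []
    for pts in buckets.values():
        n = len(pts)
        # zip(*pts) regroups the points into x, y, and z coordinate sequences.
        centroids.append(tuple(sum(axis) / n for axis in zip(*pts)))
    return centroids


# Five sample points; two pairs share a voxel when voxel_size is 1.0.
cloud = [
    (0.10, 0.10, 0.10),
    (0.20, 0.30, 0.10),  # same voxel as the first point
    (1.50, 0.20, 0.40),
    (1.70, 0.40, 0.20),  # same voxel as the third point
    (0.20, 2.60, 0.30),
]

reduced = voxel_downsample(cloud, voxel_size=1.0)
print(len(cloud), "points reduced to", len(reduced))
```

Downsampling of this sort trades spatial detail for speed, which matters when later steps such as segmentation or fitting must run on every frame.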