Open-source software meets broad needs of robot-vision developers


This article is part of an AspenCore Special Project on vision-guided robots.

Robot vision applications bring a complex set of requirements, but open-source libraries stand ready to meet nearly every need. Developers can find open-source packages ranging from basic image processing and object recognition to motion planning and collision avoidance, far more than can be mentioned, much less given their full due, in a brief article. Nevertheless, here are some key open-source image-processing packages that can help developers implement sophisticated robot systems. (Note: this report focuses on libraries for more fundamental image-based algorithms and specifically excludes open-source software for AI-based robot vision.)

No article on robot vision software can fail to highlight the Open Source Computer Vision Library (OpenCV) [source]. Among available open-source software packages, OpenCV is perhaps the most widely used and functionally rich. Implementing over 2,500 algorithms, the OpenCV distribution addresses image-processing requirements in a series of modules, which include the following, among others (a brief Python example follows the list):

  • core, which defines basic data structures and functions used by all other modules;

  • imgproc, which provides image processing functions including linear and non-linear image filtering, geometrical image transformations, color space conversion, histograms, and more;

  • video, which supports motion estimation, background subtraction, and object tracking algorithms;

  • calib3d, which provides basic geometry algorithms, camera calibration, object pose estimation, and more;

  • features2d, which provides feature detectors, descriptors, and descriptor matchers;

  • objdetect, which provides detection of objects and instances of predefined classes such as faces, eyes, and people.
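
As a minimal sketch of how these modules fit together in OpenCV's Python binding (cv2), the following fragment applies imgproc functions for smoothing and edge detection, then a features2d detector for keypoints; the file name scene.png is just a placeholder.

import cv2

# Load an image in grayscale (scene.png is a placeholder path).
img = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)

# imgproc: Gaussian smoothing followed by Canny edge detection.
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)
edges = cv2.Canny(blurred, 50, 150)

# features2d: detect ORB keypoints and compute their descriptors.
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(img, None)

cv2.imwrite('edges.png', edges)
print(len(keypoints), 'ORB keypoints detected')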

Written in C++, OpenCV is available with interfaces for C++, Python, Java, and MATLAB and supports Windows, Linux, Android, and macOS. Along with its support for single instruction, multiple data (SIMD) instruction sets, OpenCV provides CUDA-based GPU acceleration for many functions through its CUDA modules and OpenCL acceleration through its transparent API (T-API). Recently released, OpenCV 4.0 brings a number of performance improvements and new capabilities, including an implementation of the popular Kinect Fusion algorithm.
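
As a rough illustration of the transparent API, wrapping an image in a cv2.UMat is enough to route supported calls through OpenCL when a capable device is present; the same functions are called either way (scene.png is again a placeholder).

import cv2

# Wrap the image in a UMat so supported functions can run via OpenCL.
img = cv2.imread('scene.png')
u = cv2.UMat(img)

gray = cv2.cvtColor(u, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 2.0)

# Copy the result back from the device into an ordinary numpy array.
result = blurred.get()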

For all its functionality, OpenCV can present a learning curve that exceeds the patience of developers looking to move quickly with robot vision. For them, Python-based SimpleCV [source] might be the answer. Built on OpenCV, SimpleCV offers the functionality required by advanced robot-vision developers within an accessible framework that helps less experienced developers explore basic machine vision functions through simple Python calls. For example, the listing below thresholds an image using a single built-in method of the SimpleCV Image class (img.binarize()) and displays the results shown in Figure 1.

from SimpleCV import Image, Color, Display

# Make a function that does a half and half image.
def halfsies(left, right):
    result = left
    # crop the right image to be just the right side.
    crop = right.crop(right.width / 2.0, 0, right.width / 2.0, right.height)
    # now paste the crop on the left image.
    result = result.blit(crop, (left.width / 2, 0))
    # return the results.
    return result

# Load an image from imgur.
img = Image('http://i.imgur.com/lfAeZ4n.png')

# binarize the image using a threshold of 90 and invert the results.
output = img.binarize(90).invert()

# create the side by side image.
result = halfsies(img, output)

# show the resulting image.
result.show()

# save the results to a file.
result.save('juniperbinary.png')


Figure 1. Results of Python code listed above (Source: SimpleCV)

Along with their basic image processing functions, OpenCV and SimpleCV implement a number of high-level image processing algorithms that robotic systems need to work with objects or operate safely within their physical environment. One of the fundamental data structures used in many of these computations is the point cloud – a collection of multi-dimensional data points that represent an object (Figure 2). Acquired from depth cameras and other 3D sensors, the point cloud of an object supports fundamental robotic operations such as object identification, alignment, and fitting. For working with point clouds, the Point Cloud Library (PCL) [source] implements algorithms for filtering, fitting, keypoint extraction, segmentation, and much more.


Figure 2. Point cloud data set for a basic torus. (Source: Wikimedia Commons/Kieff).
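
To make the data structure concrete, here is a minimal numpy sketch, not PCL itself, that represents a point cloud as an N x 3 array and applies the kind of pass-through filter found in PCL's filtering module.

import numpy as np

# A point cloud is simply an N x 3 array of (x, y, z) samples;
# this one is synthetic so the example is self-contained.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(1000, 3))

# Pass-through filter: keep only points whose z coordinate lies in a
# band of interest, a common first step before fitting or segmentation.
z = cloud[:, 2]
filtered = cloud[(z > -0.5) & (z < 0.5)]

print(filtered.shape[0], 'of', cloud.shape[0], 'points kept')
print('centroid of filtered cloud:', filtered.mean(axis=0))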

For working with the physical environment, developers can turn to individual techniques such as visual odometry implemented in the C++ library Libviso2 [source], which can determine position and orientation in milliseconds from stereo cameras, or the Visual Servoing Platform (ViSP) [source], which builds on multiple modules to implement vision-based methods for real-time robot-motion control (Figure 3).


Figure 3. Modular architecture of Visual Servoing Platform (ViSP). (Source: ViSP)
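
At the heart of image-based visual servoing of the kind ViSP implements is a simple control law: given the error e between current and desired image features and the interaction (image Jacobian) matrix L, the commanded camera velocity is v = -lambda * pinv(L) * e. The numpy sketch below applies that textbook law to a single point feature; it illustrates the math, not ViSP's own API.

import numpy as np

def interaction_matrix(x, y, Z):
    # Standard interaction matrix of a normalized image point (x, y)
    # observed at depth Z, in the classic visual-servoing tutorial form.
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

# Current and desired normalized image coordinates of one point feature.
s = np.array([0.10, 0.05])
s_star = np.array([0.0, 0.0])
e = s - s_star

L = interaction_matrix(s[0], s[1], Z=1.0)
lam = 0.5  # control gain

# IBVS control law: a 6-DoF camera velocity (vx, vy, vz, wx, wy, wz)
# that drives the feature error toward zero.
v = -lam * np.linalg.pinv(L) @ e
print('commanded camera velocity:', v)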

Other methods combine multiple images, motion sensors, and other information in a process called simultaneous localization and mapping (SLAM). Systems such as Cartographer [source] implement the complex series of steps involved in SLAM processing, transforming data from laser range finders and inertial measurement units into position maps (Figure 4). Using these results, a robot can navigate a complex environment, building a map of its surroundings as it proceeds (Figure 5).


Figure 4. Cartographer draws on multiple modules to implement real-time simultaneous localization and mapping (SLAM). (Source: Cartographer)


Figure 5. Cartographer enables robots to map their surroundings. (Source: Cartographer)
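
When the robot's pose is assumed known, the mapping half of SLAM reduces to integrating range measurements into an occupancy grid; the hard part that systems such as Cartographer solve is estimating the pose and the map jointly and in real time. The following numpy sketch shows only the simplified mapping step.

import numpy as np

GRID = 100   # 100 x 100 cells
RES = 0.1    # 0.1 m per cell, so the map covers 10 m x 10 m

def integrate_scan(grid, pose, bearings, ranges):
    # Mark the endpoint of each range measurement as an occupied cell.
    x, y, theta = pose
    for b, r in zip(bearings, ranges):
        wx = x + r * np.cos(theta + b)   # beam endpoint in world frame
        wy = y + r * np.sin(theta + b)
        i, j = int(wy / RES), int(wx / RES)
        if 0 <= i < GRID and 0 <= j < GRID:
            grid[i, j] = 1

grid = np.zeros((GRID, GRID), dtype=np.uint8)

# One simulated scan: a wall 3 m ahead of a robot at (5 m, 5 m), facing +x.
bearings = np.linspace(-np.pi / 2, np.pi / 2, 181)
ranges = np.full_like(bearings, 3.0)
integrate_scan(grid, (5.0, 5.0, 0.0), bearings, ranges)
print(grid.sum(), 'cells marked occupied')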

Developers can find open-source libraries that combine these specialized methods with a broad set of capabilities needed to address a wide range of robot vision requirements. The Mobile Robot Programming Toolkit (MRPT) [source] provides C++ libraries for SLAM, multidimensional geometry calculations, obstacle avoidance, camera calibration, and more. Another library, the NASA Vision Workbench [source], lets developers take advantage of the image-processing expertise of engineers and scientists in the Autonomous Systems and Robotics area at NASA Ames Research Center.

Developers looking for more fundamental numerical libraries can find multiple open-source offerings. At the most fundamental level, the widely used C++ Boost library includes the Generic Image Library (GIL) [source] module for image processing. At a somewhat higher level, the VXL suite [source] comprises C++ core libraries including vil (image manipulation), vgl (geometry calculations), vnl (numeric processing of matrices and vectors), and others.

Although C++ is the predominant language used in these various open-source libraries, developers working in other languages can find bindings or wrappers for their favorite languages and even native libraries. For example, BoofCV [source] is an open-source Java library that provides functionality ranging from low-level image processing and camera calibration to feature tracking and object recognition. Rust developers can draw on the nalgebra library [source] for linear algebra algorithms, as well as specialized Rust libraries such as ncollide [source] for collision detection, pathfinding [source] for navigation, and more. Besides SimpleCV, mentioned earlier, Python developers can find additional libraries. Implemented in C++ for speed, the Mahotas Python library [source] leverages numpy array processing to provide algorithms for thresholding, convolution, edge detection, segmentation, and more. The PythonRobotics library [source] implements algorithms for higher-level vision-based functionality such as localization, mapping, collision avoidance, path planning, and path following.
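
As a quick taste of Mahotas, the following sketch builds a synthetic test image, smooths it, thresholds it with Otsu's method, and labels the connected regions.

import numpy as np
import mahotas as mh

# Synthetic 8-bit test image: a bright square on a dark background,
# so the example runs without an external image file.
img = np.zeros((128, 128), dtype=np.uint8)
img[32:96, 32:96] = 200

# Smooth, threshold with Otsu's method, then label connected regions.
smoothed = mh.gaussian_filter(img, 2.0).astype(np.uint8)
t = mh.otsu(smoothed)
binary = smoothed > t
labeled, n_objects = mh.label(binary)
print('Otsu threshold:', t, ' objects found:', n_objects)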

Finally, just as exposure to OpenCV is vital for any developer working on robot vision, familiarity with another open-source package is essential for any developer working on robot system software. The Robot Operating System (ROS) is a framework designed to simplify integration of the open-source libraries and tools needed to create specific robot software applications, with support for programming languages including C++, Python, Java, Rust, and others. Although ROS is not an operating system (much less an RTOS), it serves as communications and management middleware on Linux platforms including Ubuntu, Fedora, Gentoo, and others.
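
As an illustration of that middleware role, a minimal ROS 1 node written with the rospy client library needs only a few lines to join the system and publish on a topic; the node and topic names here are arbitrary.

import rospy
from std_msgs.msg import String

# Register the node with the ROS master and create a publisher.
rospy.init_node('vision_status')
pub = rospy.Publisher('status', String, queue_size=10)

rate = rospy.Rate(1)  # publish at 1 Hz
while not rospy.is_shutdown():
    pub.publish(String(data='vision pipeline alive'))
    rate.sleep()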

The ROS framework has nurtured a rich ecosystem comprising over 2,000 packages related to robot system design. In fact, developers can find ROS packages for many of the software packages mentioned earlier, including OpenCV, PCL, Cartographer, and MRPT.
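
For example, the cv_bridge package converts between ROS image messages and OpenCV arrays, letting a node feed camera frames straight into an OpenCV pipeline. A typical subscriber callback might look like the following sketch; the camera topic name is an assumption.

import rospy
import cv2
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

bridge = CvBridge()

def on_image(msg):
    # Convert the ROS image message to an OpenCV BGR array, then
    # hand it to any OpenCV routine (edge detection here).
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
    edges = cv2.Canny(frame, 50, 150)
    rospy.loginfo('%d edge pixels in this frame', int((edges > 0).sum()))

rospy.init_node('edge_viewer')
rospy.Subscriber('/camera/image_raw', Image, on_image)  # topic name assumed
rospy.spin()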

Developers can find many worthy open-source libraries beyond the few mentioned here. Besides exploring web search results, developers can find hundreds of open-source robot vision offerings on GitHub and other open-source repositories. For advanced users, robot vision continues to be an area of active research, and many research papers on arXiv.org include links to related open-source software. In addition, research groups in academia, industry, and government often provide access to archival software that may suit specialized needs. Whatever the source, open-source software offers a hugely rich set of libraries for every aspect of robot vision.

Check out the other articles in this Vision-Guided Robotics Special Project.
