Applying machine learning in embedded systems
Neural network development
Creating accurate inference models is, of course, the payoff of a supervised learning process that can draw on a wide range of underlying model types and architectures. Among these model types, neural networks have rapidly gained popularity for their success in image recognition, natural language processing, and other application areas. In fact, after advanced neural networks dramatically outperformed earlier algorithms in image recognition, neural network architectures became the de facto solution for these classes of problems. With the availability of hardware, including GPUs, able to perform the underlying calculations quickly, these techniques rapidly became broadly accessible to algorithm developers and users. In turn, the availability of effective hardware platforms and the widespread acceptance of neural networks have motivated development of a wide range of developer-friendly frameworks including Facebook's Caffe2, H2O, Intel's neon, MATLAB, Microsoft Cognitive Toolkit, Apache MXNet, Samsung Veles, TensorFlow, Theano, and PyTorch. As a result, developers can easily find a suitable environment for evaluating machine learning in general and neural networks in particular.
Development of neural networks starts with deployment of a framework using any number of available installation options. Although dependencies are typically minimal, all of the popular frameworks are able to take advantage of GPU-accelerated libraries. Consequently, developers can dramatically speed calculations by installing the NVIDIA CUDA toolkit and one or more libraries from the NVIDIA Deep Learning SDK, such as NCCL (NVIDIA Collective Communications Library) for multi-node/multi-GPU platforms or the NVIDIA cuDNN (CUDA Deep Neural Network) library. Operating in their GPU-accelerated mode, machine-learning frameworks take advantage of cuDNN's optimized implementations for standard neural-network routines including convolutions, pooling, normalization, and activation layers.
Whether using GPUs or not, the installation of a framework is simple enough, typically requiring a pip install for these Python-based packages. Installing TensorFlow, for example, uses the same Python install method as with any Python module:
pip3 install --upgrade tensorflow
(or just pip for Python 2.7 environments)
In addition, developers may want to add other Python modules to speed different aspects of development. For example, the Python pandas module provides a powerful tool for creating needed data formats, performing different data transformations, or just handling the various data wrangling operations often required in machine-learning model development.
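As a minimal sketch of the kind of data wrangling pandas handles, the snippet below fills in missing values and converts text labels to the integer classes most frameworks expect. The column names and readings are hypothetical, not from any data set discussed here:

```python
import pandas as pd

# Hypothetical raw sensor readings with a missing value and text labels.
raw = pd.DataFrame({
    "temperature": [21.5, 22.1, None, 23.8],
    "vibration":   [0.02, 0.05, 0.04, 0.40],
    "label":       ["normal", "normal", "normal", "fault"],
})

# Fill the missing numeric reading with the column mean.
raw["temperature"] = raw["temperature"].fillna(raw["temperature"].mean())

# Map text labels to the integer classes expected by most frameworks.
raw["label"] = raw["label"].map({"normal": 0, "fault": 1})

print(raw)
```

The same few lines scale to much larger frames, which is why pandas has become a staple of model-development pipelines.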
Experienced Python developers will typically create a virtual environment for Python development, and the popular frameworks are each available through Anaconda, for example. Developers using container technology to simplify DevOps can also find suitable containers built with their framework of choice. For example, TensorFlow is available in Docker containers on Docker Hub in CPU-only and GPU-supported versions. Some frameworks are also available as Python wheel archives. For example, Microsoft provides Linux CNTK wheel files in both CPU and GPU versions, and developers can find wheel files for installing TensorFlow on a Raspberry Pi 3.
While setting up a machine-learning framework has become simple, the real work begins with selection and preparation of the data. As described earlier, data plays a central role in model training – and thus in the effectiveness of an inference model. Not mentioned earlier is the fact that training sets typically comprise hundreds of thousands, if not millions, of feature vectors and labels to achieve sufficient accuracy levels. The massive size of these data sets makes casual inspection of input data either impossible or largely ineffective. Yet poor training data translates directly to reduced model quality. Incorrectly labeled feature vectors, missing data, and, paradoxically, data sets that are "too" clean can result in inference models unable to deliver accurate predictions or generalize well. Perhaps worse for the overall application, selection of a statistically non-representative training set implicitly biases the model away from the missing feature vectors and the entities they represent. Because of the critical role of training data and the difficulty in creating it, the industry has evolved a large number of labeled data sets available from sources such as the UCI Machine Learning Repository, among others. For developers simply exploring different machine-learning algorithms, Kaggle datasets often provide a useful starting point.
For a development organization working on its own machine-learning application, of course, model development requires a unique data set. Even with a sufficiently large pool of available data, the need to label the data can introduce difficulties. In practice, the process of labeling data is by definition a human-centric activity. As a result, creating a system for accurately labeling data is a process in itself, requiring a combination of psychological understanding of how humans interpret instructions (such as how and what to label) and technological support to speed data presentation, labeling, and validation. Companies such as Edgecase, Figure Eight, and Gengo combine expertise in the broad requirements of data labeling, providing services designed to turn data into useful training sets for supervised learning. With a qualified set of labeled data in hand, developers then need to split the data into a training set and a test set – typically using a 90:10 split or so – taking care that the test set is representative but distinct from the data in the training set.
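The 90:10 split itself is straightforward; the sketch below shows one way to do it with NumPy over a synthetic data set (the sizes and random features are purely illustrative). Shuffling before splitting helps keep the test set representative of the whole:

```python
import numpy as np

# Hypothetical labeled data set: 1000 feature vectors with 3 features each.
rng = np.random.default_rng(seed=42)
features = rng.normal(size=(1000, 3))
labels = rng.integers(0, 2, size=1000)

# Shuffle once so the split is random but reproducible.
order = rng.permutation(len(features))
features, labels = features[order], labels[order]

# 90:10 split between training and test sets.
split = int(0.9 * len(features))
train_x, test_x = features[:split], features[split:]
train_y, test_y = labels[:split], labels[split:]

print(train_x.shape, test_x.shape)
```

For imbalanced data, a stratified split (preserving the label proportions in both sets) is usually preferable to a plain shuffle.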
In many ways, creating suitable training and test data can be more difficult than creating the actual model itself. With TensorFlow, for example, developers can build a model using built-in model types in TensorFlow's Estimator class. For example, a single call such as:
classifier = tf.estimator.DNNClassifier(
    feature_columns=this_feature_column,
    hidden_units=[4,],
    n_classes=2)
uses the built-in DNNClassifier class to automatically create a basic fully connected neural network model (Figure 1) comprising an input layer with three neurons (the number of supported features), one hidden layer with four neurons, and an output layer with two neurons (the number of supported labels). Within each neuron, a relatively simple activation function performs some transformation on its combination of inputs to generate its output.
Figure 1. Although the simplest neural network comprises an input layer, hidden layer, and output layer, useful inference relies on deep neural network models comprising large numbers of hidden layers each comprising large numbers of neurons. (Source: Wikipedia)
To train the model, the developer would simply call the train method on the instantiated estimator object – classifier.train(input_fn=this_input_function) in this example – using the TensorFlow Dataset API to provide properly formed data through the input function (this_input_function in this example). Such preprocessing, or "shaping," is needed to convert input data streams to matrices with the dimensions (shapes) expected by the input layers, but this preprocessing step can also include data scaling, normalization, and any number of transformations required for a particular model.
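The shaping step can be illustrated outside TensorFlow with plain NumPy. The snippet below – with hypothetical stream values and a three-feature layout matching the example model – reshapes a flat data stream into the (samples, features) matrix an input layer expects and then scales each feature to the range [0, 1]:

```python
import numpy as np

# Hypothetical flat input stream: 12 readings, 3 features per sample.
stream = np.arange(12, dtype=np.float32)

# Shape the flat stream into the (samples, features) matrix the
# input layer expects: here 4 samples of 3 features each.
batch = stream.reshape(-1, 3)

# Scale each feature column to the range [0, 1] (min-max scaling).
col_min = batch.min(axis=0)
col_max = batch.max(axis=0)
scaled = (batch - col_min) / (col_max - col_min)

print(batch.shape, scaled.min(), scaled.max())
```

In a real TensorFlow input function, the same reshaping and scaling would be expressed as Dataset API transformations so they run as part of the input pipeline.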
Neural networks lie at the heart of many advanced recognition systems, but practical applications are based on neural networks with significantly more complex architectures than this example. These "deep neural network" architectures feature many hidden layers, each with large numbers of neurons. Although developers can simply use the built-in Estimator classes to add more layers with more neurons, successful model architectures tend to mix different types of layers and capabilities.
For example, AlexNet, the convolutional neural network (CNN), or ConvNet, that ignited use of CNNs in the ImageNet competition (and in many image recognition applications since then) had eight layers (Figure 2). Each layer comprised a very large number of neurons, starting with 253,440 in the first layer and continuing with 186,624, 64,896, 64,896, 43,264, 4096, 4096, and 1000. Rather than work with a feature vector of observed data, ConvNets scan an image through a window (n x n pixel filter), moving the window a few pixels (the stride) and repeating the process until the image has been fully scanned. Each filter result passes through the various layers of the ConvNet to complete the image-recognition model.
Figure 2. AlexNet demonstrated the use of deep convolutional neural network architectures in reducing error rates in image recognition. (Source: ImageNet Large Scale Visual Recognition Competition)
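The window-and-stride scan described above determines how many filter positions a convolution layer produces. A small helper makes the arithmetic concrete; the function name is this sketch's own, and the second call uses a well-known AlexNet first-layer configuration (227x227-pixel input, 11x11 filters, stride 4):

```python
# Number of window positions along one image dimension for a given
# filter size and stride, with no padding (a simplifying assumption).
def conv_positions(image_px, filter_px, stride):
    return (image_px - filter_px) // stride + 1

# A small hypothetical case: 32-pixel image, 5-pixel filter, stride 1.
print(conv_positions(32, 5, 1))    # 28 positions per dimension

# AlexNet's first layer: 227-pixel input, 11-pixel filter, stride 4.
print(conv_positions(227, 11, 4))  # 55 positions per dimension
```

Multiplying the per-dimension positions together, and by the number of filters, gives the neuron count of a convolution layer – which is why early ConvNet layers are so large.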
Even with that "simple" configuration, use of a CNN provided a dramatic decrease in top-5 error in the ImageNet Large Scale Visual Recognition Competition (ILSVRC) compared to the leading solution just the year before. (Top-5 error is a common metric that indicates the percentage of inferences that did not include the correct label among the model's top five predictions for possible labels for that input data.) In subsequent years, leading entries featured a dramatic increase in the number of layers and equally dramatic reduction in top-5 error (Figure 3).
Figure 3. Since AlexNet dramatically reduced ImageNet top-5 error rates in 2012, the top performers in the ILSVRC featured significantly deeper model architectures. (Source: The Computer Vision Foundation)
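The top-5 error metric described above is simple to compute from a model's raw class scores. The sketch below uses hypothetical scores over ten classes; the function name is this example's own:

```python
# Top-5 error: fraction of samples whose true label is NOT among the
# model's five highest-scoring predicted classes.
def top5_error(scores, true_labels):
    errors = 0
    for row, truth in zip(scores, true_labels):
        # Indices of the five highest scores for this sample.
        top5 = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:5]
        if truth not in top5:
            errors += 1
    return errors / len(true_labels)

# Two hypothetical samples over ten classes, both with true label 9:
# the first ranks class 9 highest, the second ranks it lowest.
scores = [
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
]
print(top5_error(scores, [9, 9]))  # 0.5
```

In practice, frameworks report this via built-in metrics over large validation sets, but the definition is exactly this count.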
Developers can use any of the popular frameworks to create ConvNets and other complex, custom models. With TensorFlow, the developer builds a ConvNet model layer by layer, using method calls to build a convolution layer, aggregate results with a pooling layer, and normalize the result – typically repeating that combination to create as many convolution layers as needed. In fact, in a TensorFlow demo of a ConvNet designed to complete the CIFAR-10 classification set, those first three layers are built using three key methods: tf.nn.conv2d, tf.nn.max_pool, and tf.nn.lrn:
# conv1
with tf.variable_scope('conv1') as scope:
  kernel = _variable_with_weight_decay('weights',
                                       shape=[5, 5, 3, 64],
                                       stddev=5e-2,
                                       wd=None)
  conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
  biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
  pre_activation = tf.nn.bias_add(conv, biases)
  conv1 = tf.nn.relu(pre_activation, name=scope.name)
  _activation_summary(conv1)

# pool1
pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                       padding='SAME', name='pool1')

# norm1
norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                  name='norm1')