Applying machine learning in embedded systems
Machine learning has evolved rapidly from an interesting research topic into an effective solution for a wide range of applications. Its apparent effectiveness has rapidly accelerated interest from a growing developer base well outside the community of AI theoreticians. In some respects, machine-learning development capabilities are evolving to the level of broad availability seen with other technologies that build on strong theoretical foundations. Developing a useful, high-accuracy machine-learning application is by no means simple. Still, a growing machine-learning ecosystem has dramatically reduced the need for a deep understanding of the underlying algorithms and made machine-learning development increasingly accessible to embedded systems developers more interested in solutions than theory. This article attempts to highlight just some of the key concepts and methods used in neural network model development – itself an incredibly diverse field and just one of the practical machine-learning methods becoming available to embedded developers.
As with machine learning, any method based on deep theory follows a familiar pattern of migration from research to engineering. Not too long ago, developers looking to achieve precise control of a three-phase AC induction motor needed to work through their own solutions to the associated series of differential equations. Today, developers can rapidly implement advanced motion-control systems using libraries that package complete motor-control solutions built on advanced techniques such as field-oriented control, space vector modulation, trapezoidal control, and more. Unless they face special requirements, developers can deploy sophisticated motor-control solutions without a deep understanding of the underlying algorithms or their specific math methods. Motion-control researchers continue to evolve this discipline with new theoretical techniques, but developers can develop useful applications today, relying on libraries to abstract the underlying methods.
In some respects, machine learning has reached a similar stage. While machine-learning algorithms and machine-learning-specific hardware continue to advance dramatically, applying these algorithms has become a practical engineering method if approached with a suitable understanding of its associated requirements and current limitations. In this context, machine learning can deliver useful results to developers who need less expertise in advanced linear algebra than an appreciable understanding of the target application data – and a willingness to accept a more experimental approach to development than they have experienced in conventional software development. Engineers interested in the foundations of machine learning will find their appetite for details fully satisfied. Yet, those with little time or interest in exploring theory will find a growing machine-learning ecosystem that promises to simplify development of useful machine-learning applications.
Engineers can find optimized libraries able to support broad classes of machine learning including unsupervised learning, reinforcement learning, and supervised learning. Unsupervised learning can reveal patterns in large amounts of data, but this method cannot specifically label those patterns as belonging to a particular class of data. Although this article does not address unsupervised learning, these techniques will likely prove important in applications such as the IoT to reveal outliers in data sets or indicate the existence of departures from data trends. In an industrial application, for example, a statistically significant departure from the norm in sensor readings from a group of machines might serve as an indicator of potential failure of machines in that group. Similarly, a significant departure from a number of measured performance parameters in a large-scale distributed application such as an IoT application might reveal hacked devices that seem to be otherwise operating satisfactorily in a network of hundreds or thousands of devices.
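As a minimal illustration of the outlier idea above, the following Python sketch flags sensor readings that deviate sharply from the group mean. The readings, threshold, and function name are hypothetical stand-ins; a production system would use more robust statistics than a simple z-score.

```python
from statistics import mean, stdev

def zscore_outliers(readings, threshold=2.5):
    """Flag readings more than `threshold` standard deviations from the mean."""
    mu = mean(readings)
    sigma = stdev(readings)
    return [r for r in readings if abs(r - mu) / sigma > threshold]

# Hypothetical temperature readings from a group of machines; 95.0 is anomalous.
temps = [41.2, 40.8, 41.5, 40.9, 41.1, 95.0, 41.3, 40.7, 41.0, 41.4]
print(zscore_outliers(temps))  # flags only the 95.0 reading
```

Note that a single extreme reading inflates the mean and standard deviation themselves, which is why the threshold here is set below the textbook value of 3; robust estimators (such as the median absolute deviation) avoid that problem.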
Reinforcement learning provides a method for an application to effectively learn by experiment, using positive feedback (reward) to learn successful responses to events. For example, a reinforcement learning system that detects anomalous sensor readings from a group of machines might try to return those readings to normal by taking different actions such as increasing coolant flow, reducing room temperature, reducing machine load, and the like. Having learned which action resulted in success, the system could more quickly perform that same action the next time the system sees those same anomalous readings. Although this article does not address this method, reinforcement learning will likely find growing use in large-scale complex applications (such as the IoT) where all realized operating states cannot be cost-effectively anticipated.
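The trial-and-reward loop described above can be sketched as a simple action-value learner. The action names, the simulated reward function, and all parameters below are purely illustrative stand-ins for a real environment, not a production reinforcement-learning implementation.

```python
import random

# Hypothetical corrective actions the system can try when readings go anomalous.
ACTIONS = ["increase_coolant_flow", "reduce_room_temperature", "reduce_machine_load"]

def simulated_reward(action):
    """Stand-in for the real environment: pretend only one action fixes the readings."""
    return 1.0 if action == "reduce_machine_load" else 0.0

def learn_best_action(episodes=300, epsilon=0.3, alpha=0.2, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # estimated value of each action
    for _ in range(episodes):
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(value, key=value.get)
        reward = simulated_reward(action)
        value[action] += alpha * (reward - value[action])  # incremental update
    return max(value, key=value.get)

print(learn_best_action())  # converges on the rewarding action
```

After enough episodes, the rewarded action accumulates the highest estimated value, so the system responds to the same anomalous readings immediately rather than by renewed experiment.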
Supervised learning methods eliminate the guesswork associated with identifying which set of inputs corresponds to which specific state (or object). In this approach, developers explicitly identify combinations of input values, or features, that correspond to a particular object, state, or condition. In the hypothetical machine example, engineers would represent the problem of interest through a set of n features, x – for example, different sensor inputs, machine running time, last service date, machine age, and other measurable values. Based on their expertise, the engineers then create a training data set – multiple instances of these feature vectors (x1 x2 … xn), each containing n observations and associated with a known output state, or label, y:
(x11, x12, … x1n) ⇒ y1
(x21, x22, … x2n) ⇒ y2
(x31, x32, … x3n) ⇒ y3
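In code, such a training set might be represented as parallel lists of feature vectors and labels. The feature names and values below are hypothetical stand-ins for the machine example above.

```python
# Each feature vector holds n = 4 hypothetical observations:
# (vibration level, running hours, days since service, machine age in years)
training_features = [
    (0.2, 1200.0,  30.0, 2.0),   # x1 -> y1
    (0.9, 8000.0, 400.0, 9.0),   # x2 -> y2
    (0.3, 2500.0,  90.0, 4.0),   # x3 -> y3
]
training_labels = ["healthy", "failing", "healthy"]  # y1, y2, y3

for features, label in zip(training_features, training_labels):
    print(features, "=>", label)
```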
Given this training set with a known relationship between measured feature values and corresponding labels, developers train a model (a system of equations) able to produce the expected label yk for each feature vector (xk1, xk2, … xkn) in the training set. During this training process, the training algorithm uses an iterative approach to minimize the difference between predicted labels and actual labels by adjusting the parameters of the system of equations that make up the model. Each pass through the training set, called an epoch, produces a new set of parameters, a new set of predicted labels associated with those parameters, and the associated difference, or loss.
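A common choice for the loss is the mean squared difference between predicted and actual values. The sketch below shows the calculation, with illustrative prediction values standing in for the output of successive epochs.

```python
def mean_squared_loss(predicted, actual):
    """Average squared difference between predicted and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# As training adjusts the model's parameters, predictions move closer to the
# actual labels and the loss falls (values here are illustrative).
actual = [1.0, 0.0, 1.0]
print(mean_squared_loss([0.5, 0.5, 0.5], actual))  # early epoch: 0.25
print(mean_squared_loss([0.8, 0.2, 0.9], actual))  # later epoch: ~0.03
```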
Plotted against the loss, the set of parameter values produced at each iteration forms a multidimensional surface with some minimum. This minimum corresponds to the closest agreement between the actual labels provided in the training set and the predicted labels inferred by the model. Thus, the objective in training is to adjust a model's internal parameters to reach that minimum loss value, using methods designed to seek the fastest "downhill" path toward this minimum. On a multidimensional surface, the direction of that best downhill path can be determined by calculating the slope at each parameter with respect to the other parameters – that is, each parameter's partial derivative. Using matrix methods, training algorithms typically use this approach, called gradient descent, to adjust model parameter values after running all the training data, or subsets of it, through the model at each epoch. To control the magnitude of each adjustment, training algorithms scale the step size by a value called the learning rate, which helps the training process converge. Without a controlled learning rate, gradient descent could overshoot the minimum due to an excessively large adjustment of the model parameters. After the model achieves (or acceptably converges toward) the minimum loss, engineers then test the model's ability to predict labels for data sets specifically held out of the training set for testing purposes.
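The interaction of the partial derivative and the learning rate can be shown with the simplest possible model: a single parameter w fitting y = w·x by gradient descent on the mean squared loss. The data and hyperparameters below are illustrative.

```python
def train(xs, ys, learning_rate=0.01, epochs=100):
    """Fit y = w * x by gradient descent on the mean squared loss."""
    w = 0.0
    for _ in range(epochs):
        # Partial derivative of the loss with respect to w, averaged over the data.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad  # step downhill, scaled by the learning rate
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x, so training should find w close to 2
print(train(xs, ys))
```

With this data, a learning rate of 0.01 converges toward w ≈ 2; a much larger rate (say, 0.25) makes each step overshoot the minimum so that the loss grows rather than shrinks, illustrating why a controlled learning rate matters.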
Once trained and evaluated, a suitable model can be deployed in production as an inference model to predict labels for actual application data. Note that inference generates a probability for each label used in training. Thus, a model trained with feature vectors labeled "y1," "y2," or "y3" might generate inference results such as "y1: 0.80; y2: 0.19; y3: 0.01" when presented with a feature vector associated with y1. Additional software logic would monitor the output layer to select the label with the highest probability and pass that selected label to the application. In this way, an application can use a machine-learning model to recognize an individual or other data pattern and take appropriate action.
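The selection logic described above amounts to taking the label with the maximum probability. A minimal sketch, assuming the inference output arrives as a dictionary of label-probability pairs:

```python
def select_label(probabilities):
    """Pick the label with the highest inference probability."""
    return max(probabilities, key=probabilities.get)

# Illustrative inference output for one feature vector.
inference_output = {"y1": 0.80, "y2": 0.19, "y3": 0.01}
print(select_label(inference_output))  # prints "y1"
```

Applications often also compare the winning probability against a minimum confidence threshold, rejecting the prediction outright when no label scores high enough.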