Bringing machine learning to the edge: A Q&A with Neurala's Anatoli Gorshechnikov

July 10, 2018

By Stephen Evanczuk

Applying machine learning (ML) in embedded systems depends on algorithms and hardware able to perform inference, and even training, at acceptable rates. With its "Lifelong-Deep Neural Networks" (L-DNN), Neurala has sharply reduced training time, training-set size, and memory requirements with relatively minimal impact on inference time, mean average precision (mAP), or mean image average precision (mIAP) metrics (Figure 1).


(Source: Neurala)

Building on years of development, including successful SBIR phase 1 and phase 2 contracts from NASA earlier this decade, the company's technology today finds use in a range of applications, such as AI-powered inspection drones and other systems built on resource-constrained platforms. Leading Neurala's technology development, Neurala co-founder and CTO Anatoli Gorshechnikov brings to bear more than 15 years of experience in the development of massively parallel software for neural computation. Embedded had an opportunity to ask Gorshechnikov about the status and prospects for machine learning in embedded systems.

Embedded developers typically work with resource-constrained, power-limited devices that might be running ARM Cortex-M-type processors, for example. Running ARM Cortex-A-type processors in IoT edge devices offers more performance, but these are still general-purpose CPUs. Broadly speaking, what would you consider the top barriers to realization of machine learning at the edge or even in IoT terminal devices?

First of all, my experience lies primarily with neural network-based ML, so I would like to limit the discussion to this subdomain of ML. I understand the itch to put ML on terminal devices, and we hear this desire from our customers on a regular basis, but when you are talking neural networks you need to forget about M-type processors. Even modern neural architectures designed to operate on low-profile devices include a few million weights and require millions of FLOPS. You are not really looking for a security camera that reports an intruder in your backyard two hours after the event. A-types are more permissive in this respect, especially as manufacturers develop and make available special-purpose neural network acceleration libraries. There is only so much we can do on the software side, so we are looking with hope to the manufacturers to provide dedicated neural processors, accelerated neural libraries, and other support today, just as we looked to GPU manufacturers 10 years ago.
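To put rough numbers on the scale Gorshechnikov describes, here is a hedged back-of-the-envelope sketch in Python. The four convolutional layers below are hypothetical and much smaller than production networks, yet they still reach hundreds of millions of multiply-accumulates per frame:

```python
# Rough parameter and multiply-accumulate (MAC) count for a small
# convolutional classifier; the layer shapes are illustrative only.

def conv_params(c_in, c_out, k):
    """Weights in a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def conv_macs(c_in, c_out, k, h, w):
    """MACs to apply that convolution over an h x w output map."""
    return conv_params(c_in, c_out, k) * h * w

# A toy 4-layer network on a 224x224 RGB input.
layers = [
    # (c_in, c_out, kernel, out_h, out_w)
    (3,   32,  3, 112, 112),
    (32,  64,  3, 56,  56),
    (64,  128, 3, 28,  28),
    (128, 256, 3, 14,  14),
]

total_params = sum(conv_params(ci, co, k) for ci, co, k, _, _ in layers)
total_macs = sum(conv_macs(*layer) for layer in layers)

print(f"weights: {total_params:,}")        # 387,936 weights
print(f"MACs per frame: {total_macs:,}")   # 184,246,272 MACs
```

Even this toy network needs over 5 billion multiply-accumulates per second at 30 frames/s, and edge-oriented architectures such as MobileNet carry on the order of a few million weights, matching the figures quoted above.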

What does an organization looking to deploy applications with ML at the edge need to consider?

From the end customer point of view:

The pros for doing inference at the edge are:

  • Predictions are much lower bandwidth than the raw sensor data (e.g. video)

  • It allows for local adaptation in the AI logic (L-DNN)

  • It achieves lower latency between observed event and action resulting from AI logic

The cons include:

  • The need to disperse computational power to the edge, i.e. more cost and complexity in the widest part of the hierarchy

  • Updates to the AI logic are more cumbersome to distribute

The rapid cost/compute power advances driven by the smartphone explosion have shifted the balance towards favoring complex AI at the edge.
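The bandwidth advantage in the first bullet above is easy to quantify. The frame size, frame rate, and prediction payload below are illustrative assumptions, not figures from Neurala:

```python
# Compare raw video bandwidth against the bandwidth of sending only
# detection results upstream. All numbers are illustrative assumptions.

# Raw video: 1920x1080 RGB at 30 frames/s, uncompressed.
raw_bps = 1920 * 1080 * 3 * 8 * 30            # bits per second

# Predictions: say 10 detections/s at ~200 bytes of JSON each.
pred_bps = 10 * 200 * 8

print(f"raw video:   {raw_bps / 1e6:8.1f} Mbit/s")   # ~1493.0 Mbit/s
print(f"predictions: {pred_bps / 1e3:8.1f} kbit/s")  # 16.0 kbit/s
print(f"ratio:       ~{raw_bps // pred_bps:,}x")
```

Even allowing for video compression to shrink the raw stream by two orders of magnitude, sending predictions instead of pixels still saves roughly a thousandfold in uplink bandwidth under these assumptions.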

For the solution provider, a cost-benefit analysis always helps. How long will it take to optimize the solution for a particular device? How many of those devices will actually make it into deployment? How long will they stay deployed? How useful will optimizations for this device be for other devices? These types of questions need to be answered before engagement.

What does an engineer looking to implement ML at the edge need to consider?

There are many more instances of the edge than there are of the core (several orders of magnitude). So just fundamentally, any fault is going to have a multiplied effect (and repair cost). Also, even small savings in compute time, and hence power consumption, will have a great aggregate effect. So the most important question is: what is the least accuracy and computational complexity we can get away with while still delivering the business value?

On a more practical side, there are many questions to consider. How close are we with the manufacturer? Will we get pre-release samples to work on? How stable is the architecture? How good is the documentation? In some cases, even: what language is the documentation written in, and can we read it? Is there an ML-specific API? How stable is it? How close is this device to devices we worked on in the past? Basically, all the questions that will help determine how long it will take to implement the solution.

Finally, can the AI tasks be distributed between core and edge rather than putting all tasks on one side of the connection?
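One way to picture such a split, with the caveat that the function names, threshold, and division of labor below are purely illustrative and not Neurala's architecture: the edge device extracts compact features and classifies locally, forwarding only low-confidence cases upstream rather than raw frames.

```python
# Illustrative core/edge split: the edge runs cheap feature extraction
# and a lightweight classifier, forwarding only low-confidence samples
# upstream. All names, values, and thresholds are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def edge_extract_features(frame):
    """Stand-in for an on-device extractor (e.g. a frozen small CNN)."""
    return frame.mean(axis=(0, 1))            # toy: per-channel means

def edge_classify(features, prototypes):
    """Lightweight head: nearest prototype, plus a confidence margin."""
    dists = np.linalg.norm(prototypes - features, axis=1)
    order = np.sort(dists)
    return int(np.argmin(dists)), order[1] - order[0]

CONFIDENCE_MARGIN = 0.05      # below this margin, defer to the core
core_queue = []               # features shipped upstream, not raw frames

prototypes = np.array([[0.2, 0.2, 0.2],       # class 0 prototype
                       [0.8, 0.8, 0.8]])      # class 1 prototype

frame = rng.random((8, 8, 3)) * 0.1 + 0.15    # synthetic dim frame
features = edge_extract_features(frame)
label, margin = edge_classify(features, prototypes)

if margin < CONFIDENCE_MARGIN:
    core_queue.append(features)               # escalate ambiguous case
else:
    print(f"edge decision: class {label}, margin {margin:.2f}")
```

The design choice here is that only compact feature vectors, never raw frames, ever cross the connection, so the core's heavier models and retraining jobs see a trickle of hard cases instead of a flood of video.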

What are the top mistakes likely to be made in implementing ML at the edge?

Continuing with the top-down approach of 'let's make it perform the task first and then squeeze it onto the device' instead of switching to the bottom-up 'let's make it run on the device and satisfy all hardware constraints first, and then tune it for the task at hand.'

What are the primary limitations of current approaches for implementing ML at the edge?

Again, within traditional neural network-based ML, there is no ML on the edge. There was machine inference on the edge, but no learning; it was simply too computationally expensive to train convolutional networks on edge hardware. As a result, edge-based "intelligence" couldn't adapt to changes in the environment. Can we even call that intelligence? This was the motivation that led us to develop our L-DNN technology.

Developers compensate for changes in data trends due to aging, drift, and the like, by building in appropriate compensation and using self-calibration methods in their designs. In fact, many sensors build in both compensation and self-calibration to address long-term drift. So why is on-device retraining so important in IoT devices?

Can a sensor in a security camera calibrate itself so that it recognizes your child as a legitimate person on the premises while your child grows from toddler into school age into teenager? Do you need to buy a new camera when you get married just to add your spouse to the list of residents in your house? Can you fix the world around us so that it does not change beyond the sensor drift? The world around us is constantly changing, so we need our technology to be able to evolve with us.

Once a model is deployed, if an organization identifies a new need with different feature selection, won't an entirely new model be required? So does having the capability to perform on-device training help in this case?

If you use the word 'feature' the same way that I do, then think about it this way: features are the properties of the input. For all inputs of the same modality, the features are the same. For images you have edges, colors, textures, and so forth at the low level, and combinations of these at the high level. Now, as you said, an organization needs a different feature selection. They use the same features, since the input modality has not changed; different combinations have simply become important. Basically, if your system is modular and you can separate feature extraction from the learning module, all you need to satisfy this change is to retrain your learning module. So yes, adding the ability to do this on device means your customer can adapt the system to their new requirements without re-engaging you. Plus, and this is quite beneficial in many cases, there is no need to move data from the edge to the server, or even to follow identical training sequences on different edge devices: devices for warehouses in Alaska can be trained differently from those in Texas.
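The modular split Gorshechnikov describes, a fixed feature extractor feeding a small retrainable learning module, can be sketched as follows. Everything here is a toy stand-in: a fixed random projection plays the role of the frozen extractor, and a nearest-class-mean classifier plays the role of the learning module (this is not L-DNN itself):

```python
# Toy illustration of retraining only the learning module while the
# feature extractor stays frozen. A fixed random projection stands in
# for a pretrained network; the head is a nearest-class-mean classifier.

import numpy as np

rng = np.random.default_rng(42)

# "Frozen" feature extractor: a fixed linear projection, never retrained.
W = rng.standard_normal((64, 16))

def extract(x):
    return x @ W          # identical features for every deployment

class NearestMeanHead:
    """Retrainable learning module: one mean feature vector per class."""
    def __init__(self):
        self.means = {}

    def train(self, feats, labels):
        self.means = {c: feats[labels == c].mean(axis=0)
                      for c in np.unique(labels)}

    def predict(self, feat):
        return min(self.means,
                   key=lambda c: np.linalg.norm(self.means[c] - feat))

# A deployment retrains just the head on its own local data.
x_local = rng.standard_normal((20, 64))
y_local = np.array([0] * 10 + [1] * 10)
x_local[y_local == 1] += 10.0     # make the local classes well separated

head = NearestMeanHead()
head.train(extract(x_local), y_local)

acc = np.mean([head.predict(f) == y
               for f, y in zip(extract(x_local), y_local)])
print(f"local accuracy after head-only retraining: {acc:.2f}")
```

Retraining touches only the head's per-class means; the projection `W`, standing in for the expensive pretrained extractor, never changes. That is what makes per-site retraining cheap: an Alaska warehouse and a Texas warehouse can each call `train()` on their own local data.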



