Bringing machine learning to the edge: A Q&A with Neurala’s Anatoli Gorshechnikov

Applying machine learning (ML) in embedded systems depends on algorithms and hardware able to perform inference — and even training at acceptable rates. With its “Lifelong-Deep Neural Networks” (L-DNN), Neurala has sharply reduced training time, training-set size, and memory requirements with relatively minimal impact on inference time, mean average precision (mAP), or mean image average precision (mIAP) metrics (Figure 1).

(Source: Neurala)

Building on years of development, including successful SBIR Phase 1 and demanding Phase 2 contracts from NASA earlier this decade, the company's technology today finds use in a range of applications such as AI-powered inspection drones and other products built on resource-constrained platforms. Leading Neurala's technology development, co-founder and CTO Anatoli Gorshechnikov brings to bear more than 15 years of experience in the development of massively parallel software for neural computation. Embedded had an opportunity to ask Gorshechnikov about the status and prospects of machine learning in embedded systems.

Embedded:
Embedded developers typically work with resource-constrained, power-limited devices that might be running ARM Cortex-M-type processors, for example. Running ARM Cortex-A-type processors in IoT edge devices offers more performance, but these are still general-purpose CPUs. Broadly speaking, what would you consider the top barriers to the realization of machine learning at the edge or even in IoT terminal devices?

Gorshechnikov:
First of all, my experience lies primarily with neural network-based ML, so I would like to limit the discussion to this subdomain of ML. I understand the itch to put ML on terminal devices, and we hear this desire from our customers on a regular basis, but when you are talking neural networks you need to forget about M-type processors. Even modern neural architectures designed to operate on low-profile devices include a few million weights and require millions of FLOPs. You are not really looking for a security camera that reports to you that there was an intruder in your backyard two hours after the event. A-types are more permissive in this respect, especially as manufacturers develop and make available special-purpose neural network acceleration libraries. There is only so much we can do on the software side, so we look with hope to the manufacturers to provide dedicated neural processors, accelerated neural libraries, and other support today, just as we looked to GPU manufacturers 10 years ago.

Embedded:
What does an organization looking to deploy applications with ML at the edge need to consider?

Gorshechnikov:
From the end customer point of view:

The pros for doing inference at the edge are:

  • Predictions are much lower bandwidth than the raw sensor data (e.g. video)

  • It allows for local adaptation in the AI logic (L-DNN)

  • It achieves lower latency between observed event and action resulting from AI logic

The cons include:

  • The need to disperse computational power to the edge, i.e. more cost and complexity in the widest part of the hierarchy

  • Updates to the AI logic are more cumbersome to distribute

The rapid cost/compute-power advances driven by the smartphone explosion have shifted the balance toward favoring complex AI at the edge.

For the solution provider, a cost-benefit analysis always helps. How long will it take to optimize the solution for a particular device? How many of those devices will actually make it into deployment? How long will they stay deployed? How useful will optimizations for this device be for other devices? These types of questions need to be answered before engagement.

Embedded:
What does an engineer looking to implement ML at the edge need to consider?

Gorshechnikov:
There are many more instances of the edge than there are of the core (several orders of magnitude). So, just fundamentally, any fault is going to have an exponential effect (and repair cost). Also, even small savings in compute time, and hence power consumption, will have a great aggregate effect. So the most important question is: what is the least accuracy and computational complexity we can get away with while still delivering the business value?

On the more practical side, there are many questions to consider. How close are we with the manufacturer? Will we get pre-release samples to work on? How stable is the architecture? How good is the documentation? In some cases, even: what language is the documentation written in, and can we read it? Is there an ML-specific API? How stable is it? How close is this device to devices we worked on in the past? Basically, all the questions that help determine how long it will take to implement the solution.

Finally, can the AI tasks be distributed between core and edge rather than putting all tasks on one side of the connection?

Embedded:
What are the top mistakes likely to be made in implementing ML at the edge?

Gorshechnikov:
Continuing with the top-down approach of 'let's make it perform the task first and then squeeze it onto the device' instead of switching to the bottom-up approach of 'let's make it run on the device and satisfy all the hardware constraints first, and then tune it for the task at hand'.

Embedded:
What are the primary limitations of current approaches for implementing ML at the edge?

Gorshechnikov:
Again, within traditional neural network-based ML, there is no ML on the edge. There was machine inference on the edge, but no learning; it was just too computationally expensive to train convolutional networks on edge hardware. As a result, edge-based “intelligence” couldn't adapt to changes in the environment. Can we even call that intelligence? This was the motivation that led us to develop our L-DNN technology.

Embedded:
Developers compensate for changes in data trends due to aging, drift, and the like, by building in appropriate compensation and using self-calibration methods in their designs. In fact, many sensors build in both compensation and self-calibration to address long-term drift. So why is on-device retraining so important in IoT devices?

Gorshechnikov:
Can a sensor in a security camera calibrate itself so that it recognizes your child as a legitimate person on the premises while your child grows from toddler into school age into teenager? Do you need to buy a new camera when you get married just to add your spouse to the list of residents in your house? Can you fix the world around us so that it does not change beyond the sensor drift? The world around us is constantly changing, so we need our technology to be able to evolve with us.

Embedded:
Once a model is deployed, if an organization identifies a new need with different feature selection, won't an entirely new model be required? So does having the capability to perform on-device training help in this case?

Gorshechnikov:
If you use the word 'feature' the same way that I do, then think about it this way: features are properties of the input. For all inputs of the same modality, the features are the same. For images you have edges, colors, textures, and so forth at the low level, and combinations of these at the high level. Now, as you have said, an organization needs a different feature selection. They use the same features, since the input modality has not changed; just different combinations of them have become important. Basically, if your system is modular and you can separate feature extraction from the learning module, all you need to satisfy this change is to retrain your learning module. So yes, adding the ability to do this on-device means your customer can adapt the system to their new requirements without re-engaging you. Plus, and this is quite beneficial in many cases, there is no need to move data from the edge to the server, or even to follow identical training sequences for different edge devices: devices for warehouses in Alaska can be trained differently from those in Texas.
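The modular split Gorshechnikov describes can be pictured with a short sketch: a frozen, factory-pretrained feature extractor feeds a small fast-learning module, and only the latter is retrained when requirements change. The classes below are illustrative stand-ins, not Neurala's SDK; the random-projection "backbone" exists only to keep the example self-contained and runnable.

```python
import numpy as np

class FrozenFeatureExtractor:
    """Stand-in for a factory-pretrained backbone (in practice a CNN
    trained with Caffe or TensorFlow). A fixed random projection is used
    here only to keep the sketch runnable."""
    def __init__(self, in_dim=3072, feat_dim=256, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((in_dim, feat_dim)) / np.sqrt(in_dim)

    def extract(self, x):
        # Weights are never updated on the device
        return np.maximum(x @ self.W, 0.0)

class FastLearningHead:
    """Hypothetical fast-learning module: stores each object class as a
    running-mean prototype of its feature vectors. A change in 'feature
    selection' only requires resetting and retraining this module."""
    def __init__(self):
        self.prototypes = {}   # label -> (mean feature vector, sample count)

    def learn(self, feats, label):
        proto, n = self.prototypes.get(label, (np.zeros_like(feats), 0))
        self.prototypes[label] = ((proto * n + feats) / (n + 1), n + 1)

    def predict(self, feats):
        # Cosine similarity against every stored prototype
        scores = {c: feats @ p / (np.linalg.norm(feats) * np.linalg.norm(p) + 1e-9)
                  for c, (p, _) in self.prototypes.items()}
        return max(scores, key=scores.get)

backbone = FrozenFeatureExtractor()
head = FastLearningHead()
head.learn(backbone.extract(np.random.rand(3072)), "forklift")
print(head.predict(backbone.extract(np.random.rand(3072))))
```

Under this pattern, retraining for a new requirement amounts to instantiating a fresh fast-learning head and feeding it a handful of labeled feature vectors, with the backbone left untouched.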

Embedded:
What's different about Neurala's approach?

Gorshechnikov:
Our system is exactly as I described above at its core: modular, with a feature extraction part that is factory-pretrained for a particular input modality (and often hardware), and a fast learning part that can learn combinations of these features as definitions of objects or events after deployment.

Embedded:
Neurala describes its approach in reference to actual neurophysiological cortical and subcortical structures and acetylcholine (ACh). What's the motivation behind that and what are the corresponding elements in your architecture?

Gorshechnikov:
According to popular opinion, the human brain is the only intelligent system on this planet. Naturally, it serves as an inspiration for all our work. Consider the development and functionality of the brain: sensory processing in cortical structures is formed early in life. This is our feature extraction module. Although there are individual differences between us at this stage, they are minor. We all have cells that respond to edges, colors, motion, and so forth. Thus, this part can be factory-pretrained without loss of functionality. The hippocampus is only fully formed in humans by the age of three. Its main function is to create episodic memories by linking already-known features together. That is what makes each human unique: different life experiences. And this is our fast learning module; the system only needs it after features are formed, and most of the learning in it happens after deployment. The different levels of ACh signal whether we are awake (and have new sensory input) or asleep (sensory input is shut off). Naturally, you can implement this as a simple binary flag in the ML system. Having it allows our system to adjust the feature extraction module in the slow-learning regime based on the information stored in the fast learning module, although this is better done on the server rather than on the edge. You can take the brain paradigm further and realize that our system, as it is, is missing a prefrontal cortex, the intelligent decision maker that is fully formed in humans by the time we reach our early twenties.
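The "ACh as a binary flag" idea can be sketched in a few lines. This is an assumption-laden illustration rather than Neurala's implementation: in the awake state only the fast-learning module (such as the prototype head sketched earlier) is updated, while slow-learning consolidation of the feature extractor is deferred to the server.

```python
from enum import Enum

class AChLevel(Enum):
    AWAKE = 1    # new sensory input arriving on the device
    ASLEEP = 0   # sensory input shut off; time to consolidate

def training_step(sample, label, backbone, head, ach=AChLevel.AWAKE):
    """Illustrative dispatch on the ACh flag (hypothetical, not Neurala's code)."""
    feats = backbone.extract(sample)
    if ach is AChLevel.AWAKE:
        # Fast-learning regime on the edge: the backbone stays frozen
        head.learn(feats, label)
    else:
        # Slow-learning regime: fine-tune the feature extractor from what
        # the fast module has stored; omitted here because, as noted above,
        # this step is better done on the server than on the edge.
        pass
```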

Embedded:
In your keynote talk at ESC in Boston earlier this year, you used these cortical structures to describe additional processing Neurala adds behind conventional YOLO (you only look once) layers in your fast learning module (see image below). After the YOLO layers, how is your approach completing the process?

(Source: Neurala)

Gorshechnikov:
YOLO answers two questions: where in the image the object is, and what it is. For now our system only takes over the answer to the second question. How does YOLO answer this question? By analyzing the features of the particular region of the image where the object is and comparing them to the features of pretrained objects. What does our system add? The ability to learn new combinations of features on the fly as new objects. There is minimal effect on the data flow and transformations.
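In rough pseudocode terms, the hand-off looks like the sketch below: the detector still decides "where", while a fast-learning head like the one sketched earlier decides "what" and can absorb new labels on the fly. The `yolo.regions()` call is a placeholder for whatever detector API is actually in use, not a real library function.

```python
def detect_and_classify(frame, yolo, head, learn_label=None):
    """Keep the detector's "where" step, replace only its "what" step with a
    fast-learning classifier. Purely illustrative data flow."""
    results = []
    for box, region_feats in yolo.regions(frame):        # boxes + per-region features
        if learn_label is not None:
            head.learn(region_feats, learn_label)         # on-the-fly learning of a new object
        results.append((box, head.predict(region_feats))) # fast-learning "what"
    return results
```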

Embedded:
What’s the sweet spot in terms of applications for this approach?

Gorshechnikov:
I believe that any time you prefer a human-like ability to adapt (with some more mistakes, naturally) over a more precise but fixed system, that is our sweet spot. For example, we have a customer whose environment changes every three weeks, but a server-side update of the system (without on-the-edge learning) takes two weeks. Our solution fits right in.

Embedded:
For what sort of application is this approach less suited?

Gorshechnikov:
Factory quality control is the first that comes to mind. The error rates required there are usually lower than neural networks can provide in general, and the environment only changes when new parts are introduced to the process, which is rare and can be accommodated with a conventional server update.

Embedded:
From the developer’s point of view, what’s different about the process for implementing this approach, specifically in the model development flow, deployment, and retraining?

Gorshechnikov:
Not all DNNs are equal in extracting high quality features, which are crucial for L-DNN to succeed. During development we pay special attention to ensure that features are consistent and separable for different kinds of objects. Unfortunately, there is almost always a trade-off between the quality of features and the model size and speed, so we have to balance on the edge to be deployed on the edge. As for retraining, the usual neural network criteria apply: if your feature extractor was trained in bright light, there will be little success in retraining our system in the dark and so forth.
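One way to make the "consistent and separable features" criterion concrete is a quick numeric check on a candidate feature extractor: compare how tightly each class clusters against how far apart the class centroids sit. The function below is a generic sanity check of that kind, not part of Neurala's toolchain.

```python
import numpy as np

def separability_ratio(feats_by_class):
    """feats_by_class: dict mapping class label -> (N, D) array of feature
    vectors extracted from images of that class. Returns the mean between-class
    centroid distance divided by the mean within-class spread; higher suggests
    features a fast-learning module can separate more easily."""
    centroids = {c: f.mean(axis=0) for c, f in feats_by_class.items()}
    within = np.mean([np.linalg.norm(f - centroids[c], axis=1).mean()
                      for c, f in feats_by_class.items()])
    labels = list(centroids)
    between = np.mean([np.linalg.norm(centroids[a] - centroids[b])
                       for i, a in enumerate(labels) for b in labels[i + 1:]])
    return between / (within + 1e-9)
```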

Embedded:
Is there a path to your system from other frameworks or libraries? To what extent can a developer leverage what they’ve developed with another framework or library in using your system?

Gorshechnikov:
Indeed there is. Our SDK allows using neural networks trained with Caffe and TensorFlow, so whatever network you prefer for feature extraction can serve as the basis for our L-DNN system. The fast learning module, though, you would need to add using our SDK directly; this network is not supported by the widely available frameworks.
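As a hedged example of the first half of that path, a TensorFlow-trained backbone can be stripped of its classifier and used purely as a feature extractor. MobileNetV2 is used here only as a stand-in; the fast-learning module that would consume these features is the part supplied through Neurala's SDK, not anything shown below.

```python
import tensorflow as tf

# Example only: any Caffe- or TensorFlow-trained network could play this role.
backbone = tf.keras.applications.MobileNetV2(include_top=False,
                                             pooling="avg",
                                             weights="imagenet")

def extract(image_batch):
    """image_batch: float32 array of shape (N, 224, 224, 3) with values in [0, 255].
    Returns (N, 1280) feature vectors for a downstream fast-learning module."""
    x = tf.keras.applications.mobilenet_v2.preprocess_input(image_batch)
    return backbone(x, training=False).numpy()
```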

Embedded:
Regarding the significantly smaller training sets required with this approach, to what extent does the engineer have to do more work in refining or doing feature engineering on an optimal training set than they might with another technique?

Gorshechnikov:
We add a few extra parameters to tweak, but given that the training cycle is very fast, you can do a whole parameter sweep in less than a day with a script.
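Because each training cycle is fast, the sweep can be as unsophisticated as a nested grid search. The parameter names below are hypothetical placeholders for whatever knobs the fast-learning module actually exposes, and `run_trial` is a stub to be replaced with real training and evaluation.

```python
import itertools

grid = {
    "similarity_threshold": [0.6, 0.7, 0.8, 0.9],   # hypothetical knob
    "prototypes_per_class": [1, 3, 5],              # hypothetical knob
    "feature_dim":          [128, 256],             # hypothetical knob
}

def run_trial(params):
    """Train the fast-learning module with `params` and return a score such as mAP."""
    return 0.0  # stub: replace with real training + evaluation

best_score, best_params = float("-inf"), None
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = run_trial(params)
    if score > best_score:
        best_score, best_params = score, params
print(best_params, best_score)
```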

Embedded:
Also regarding the smaller training set, to what extent does that limit the model’s ability to generalize, or is this approach best suited to highly specialized problem sets?

Gorshechnikov:
The ability to generalize for L-DNN is mostly inherited from the underlying feature extractor module. If the features for all instances of a certain object are nicely clustered together into one region of the feature space, then a single image of that object is enough to generalize to all other images.

Embedded:
What sort of on-device training and inference performance can one expect for a reasonable range of models across various hardware platforms?

Gorshechnikov:
The inference and training times for L-DNN tend to be nearly identical to each other, regardless of the underlying hardware. The execution time and power consumption are the sum of the performance of the feature extractor network and the performance of the fast learning module. As previously mentioned, the former can run on a variety of frameworks, and some of these frameworks have dedicated hardware acceleration available, whereas we have CPU and GPU implementations of the fast learning module. On an i7 CPU the learning and inference times are around 95 ms; on an Nvidia GTX 1060, 7 ms; on an Nvidia TX1 GPU, 28 ms; and on a Samsung Galaxy 8+ using the GPU, inference is 130 ms and training 145 ms. All of these numbers were collected using a modified greynet as the feature extractor. On a specialized neural processor used for feature extraction (modified squeezenet), combined with a CPU for the fast learning module, the inference time is 20 ms.
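For readers who want to measure comparable per-frame figures on their own hardware, a minimal timing harness like the one below is usually enough; it is built around the illustrative backbone and head from the earlier sketches rather than Neurala's SDK, and absolute numbers will of course differ with the device and feature extractor.

```python
import time
import numpy as np

def ms_per_frame(backbone, head, frames, label="object", repeats=100):
    """Average wall-clock milliseconds for one learn + one predict per frame."""
    start = time.perf_counter()
    for _ in range(repeats):
        for frame in frames:
            feats = backbone.extract(frame)
            head.learn(feats, label)
            head.predict(feats)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(frames))

frames = [np.random.rand(3072) for _ in range(10)]   # dummy inputs for the sketch
print(f"{ms_per_frame(backbone, head, frames):.1f} ms per frame")
```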

Embedded:
What’s next with this approach in terms of both remaining issues and opportunities? 

Gorshechnikov:
More flexibility and precision in answering the “where” question. That includes both bounding boxes and instance segmentation. And in the long run, remember, we are still missing the “prefrontal cortex” in our system, so it is a teenager at best – so, no driving without adult supervision, no voting rights, and no jury duty until we develop it further.
