
How to deliver ultra-low power ML for more effective embedded vision

New advances in ML modeling and processing hold the key to widespread adoption of smart cameras.

Machine learning algorithms have opened up a realm of possibilities for embedding vision in products that make our homes, workspaces, and the places in between safer and more efficient. To truly realize the potential of smart vision in more use cases, developers need more power-efficient and more flexible embedded solutions that can operate on batteries, are easy to install and maintain, and still deliver the vision performance required for effective, intelligent sensing of the things we want to detect and monitor.

Affordable remote visual monitoring used to mean an infrared motion detector: inexpensive, autonomous, but not necessarily effective. A friend of mine recently protected his back yard with a set of internet-connected video cameras. The cameras used infrared motion detection to wake up, and then would send video to an app.

The trouble was, motion detection in his back yard meant detecting everything from neighborhood squirrels to wind chimes blowing in the breeze. He was so overloaded with video clips that he understandably took to ignoring them—including, one day, the one of burglars breaking into his house through the back yard.

Until recently, the alternative would have been to feed the video to a control room where, you could hope, a human observer would notice on the screen if something important happened. That approach offers much more protection, but at much greater expense and energy consumption. We really need something in between: inexpensive, battery-powered, but more discriminating than simple motion detection. Ideally, the system would have enough embedded intelligence to first qualify an event as a “true event” before waking up the more powerful camera that records and streams high-resolution video after sending a notification to an app.

Today, huge advances in machine learning technology have greatly refined unattended video analysis. Now a high-definition camera with a high-performance deep-learning inference accelerator chip—or a broadband connection to a cloud data center—can significantly augment and improve traditional security and monitoring methods. In fact, such deep-learning systems have demonstrated remarkable abilities: facial recognition, gesture interpretation—for example, to detect shoplifting—or even mood estimation—to detect the temptation to shoplift. Maybe these abilities have become a little too remarkable.

But these systems are still expensive. They require external power and a broadband connection. And because they are so capable, they raise issues of security and privacy that may limit their deployment or raise regulatory hurdles.

What about the other end of the scale, back towards that humble infrared motion sensor? There are still many applications where just detecting the presence of a person—without identifying them or estimating their psychological profile—is sufficient. Many of these applications need autonomy from external power sources and can offer only limited connectivity back into the network. And many require very low cost. What about them?

Significant recent advances in ultra-low-power machine learning acceleration can now answer that question.

Types of applications that could benefit

To understand this breakthrough in context, let’s look closer at some use cases. In many safety and security applications, for example, it is important to know if there is a person present in the area you are monitoring (figure 1). This could be to detect intruders, to ensure no one has wandered too close to dangerous equipment, or simply to turn on some lights to avoid someone tripping in a dark room. You don’t really care who the person is, but neither are you interested in false positives from squirrels and tubular bells like those that set off alerts in my friend’s back yard.

Figure 1: A low power vision solution enables occupancy management in conference rooms. (Image: Synaptics)

This turns out to be a good application for machine learning—in fact, for a quite simple machine-learning model. In this context, a model is the set of data and instructions, built up through running lots and lots of data through a process called training, that a machine learning system uses to generate inferences—such as the inference that yes, there is a person in the image, or no, the figure in the image is the boss’s golden retriever.
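To make this concrete, here is a minimal sketch of what such a compact person-presence model might look like in TensorFlow, the framework mentioned later for model development. The input resolution, layer sizes, and training data are illustrative assumptions, not a description of any shipped model.

```python
# Minimal sketch of a compact person-presence classifier (illustrative only).
# Input resolution, layer widths, and training data are assumptions, not a
# description of any vendor's production model.
import tensorflow as tf

def build_person_detector(input_shape=(96, 96, 1)):
    """Tiny binary classifier: 'person present' vs 'no person'."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(8, 3, strides=2, padding="same", activation="relu"),
        tf.keras.layers.DepthwiseConv2D(3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(16, 1, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability a person is in frame
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_person_detector()
model.summary()  # a network this small has only tens of thousands of parameters
# Training would run many labeled frames through model.fit(train_images, train_labels, ...)
```

A network this small answers only the single yes/no question the application actually asks, which is exactly what keeps the model compact.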

The pandemic has created another category of applications that, unfortunately, threatens to be with us for a while: social-distance monitoring. It can be vital to control access to enclosed spaces to ensure people don’t exceed capacity limits (figure 2). The best way to do this, short of a human minder at the door, is a system that can count people as they enter and leave. Yes, this is just another use of person detection. An added feature of such a system would be to detect if the person in question is wearing a mask. That, too, is a relatively simple task for a trained machine-learning model.

Figure 2: People counting without imposing personal recognition or identification features can be used for queue management at stadiums and event venues while protecting privacy. (Image: Synaptics)
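One common way to turn per-frame person detections into an occupancy count is a simple line-crossing tally at the doorway. The sketch below assumes a hypothetical tracker that reports each detected person's position; the door-line coordinate and capacity limit are arbitrary examples, not part of any particular product.

```python
# Sketch of occupancy counting from per-frame person detections (illustrative).
# `detections` is assumed to be a list of (track_id, y_position) pairs from a
# hypothetical tracker; the door line and capacity limit are arbitrary examples.
DOOR_LINE_Y = 0.5      # normalized image coordinate of the doorway threshold
CAPACITY_LIMIT = 25

class OccupancyCounter:
    def __init__(self):
        self.count = 0
        self.last_side = {}    # track_id -> "inside" or "outside"

    def update(self, detections):
        for track_id, y in detections:
            side = "inside" if y > DOOR_LINE_Y else "outside"
            prev = self.last_side.get(track_id)
            if prev == "outside" and side == "inside":
                self.count += 1                      # person crossed the line going in
            elif prev == "inside" and side == "outside":
                self.count = max(0, self.count - 1)  # person left
            self.last_side[track_id] = side
        return self.count

counter = OccupancyCounter()
over_capacity = counter.update([(1, 0.7), (2, 0.3)]) > CAPACITY_LIMIT
```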

There is a quite different application area that turns out to be closely related. Organizations have intensified their focus on just how much they spend on office space and are deciding how best to optimize space based on how it is used. This is especially true as office managers consider use of dramatically smaller (and cheaper), often shared workspaces. But you can’t optimize what you can’t measure. Suddenly, there is a host of new questions. Does anyone use this hallway? When is the coffee room busy? How many hot desks are available? How often are all three conference rooms occupied? This sort of data is essential to minimizing office expense without minimizing productivity. And again, you don’t need to identify the people or understand what they are doing. You just want to be able to detect their presence.

Let’s look at a real-world situation typical of what many companies are considering these days: a company has an office in a high-rise in an expensive downtown area such as Manhattan or downtown San Francisco. It has forty cubicles and five conference rooms. The cubicles are occupied for at least part of the working week, and the people-detection and counting inputs from the overhead cameras show that three of the conference rooms get used far more than the other two. Now, when the company expands and needs to hire twenty more people, instead of leasing another floor in the same building at a high rent, it can analyze actual use of the cubicles and meeting rooms. The solution might be to convert one of the under-used conference rooms into additional cubicles, or to adopt a flexible hybrid model that gives people workspace when they need it and maximizes use of the existing cubicles. That would yield substantial opex savings and could be adjusted as capacity and workforce habits change.

Detecting specific attributes

That brings up one other category of applications: compliance checking. Machine-learning systems can be trained to detect visible attributes of persons. Does the person have an ID badge visible? How about a hard hat, or a respirator? Is the person about to bring a lit cigarette into a room subject to explosive gas leaks?

Experience has shown that machine-learning models can perform these sorts of detection tasks better than older styles of vision-processing software algorithms can. Machine learning models can also be more accurate and reliable than human monitors, especially if long periods of sustained attention are necessary. And when the task is detection—not identifying individuals, interpreting gestures, or other such tasks that require subtle inferences based on large numbers of fine details—the models can be quite compact.

If the model is compact, and if the video data comes in at a modest rate instead of pouring in as 60 Hz progressive-scan UHD, for example, then the processing power needed can also be modest: more than a typical microcontroller chip could provide, but far less than an inference accelerator designed for high-performance computing or a power-guzzling GPU would offer.
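A rough back-of-envelope comparison makes the point. The figures below are illustrative assumptions, not measurements of any particular model or chip:

```python
# Back-of-envelope compute comparison (all numbers are illustrative assumptions).
compact_model_macs = 5e6   # ~5 million multiply-accumulates per inference
low_rate_fps = 5           # a few inferences per second is enough for presence detection
modest_load = compact_model_macs * low_rate_fps   # ~25 MMAC/s

big_model_macs = 5e9       # a large detection network
uhd_fps = 60               # 60 Hz progressive-scan UHD pipeline
hpc_load = big_model_macs * uhd_fps               # ~300 GMAC/s

print(f"Compact pipeline:  {modest_load / 1e6:.0f} MMAC/s")
print(f"High-end pipeline: {hpc_load / 1e9:.0f} GMAC/s "
      f"({hpc_load / modest_load:,.0f}x more)")
```

Even with generous assumptions, the compact pipeline demands several orders of magnitude less compute per second.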

This would be an ideal place to apply the technology that has been developed in recent years for ultra-low-power computing: memories, controllers, and signal processors. These technologies could make possible a machine-learning inference accelerator fast enough for visual detection tasks, but low enough in power consumption for unattended battery-powered operation.

And with that just-right amount of speed would come an added benefit. The limited speed and memory capacity of such a device would make it virtually impossible to use the chip for unauthorized tasks, such as facial recognition. This fact could greatly ease regulatory burdens for deploying systems in areas sensitive to privacy regulation.

An example that can deliver this

In fact, such an ultra-low-power machine learning accelerator already exists: the Katana KA10000 SoC from Synaptics. The chip integrates a set of processors, including an Arm CPU, several DSP cores, and a custom neural-network accelerator, to provide a complete inferencing acceleration platform for a range of modest-sized machine-learning models.

So far, this description could apply just as well to any of a number of AI acceleration chips for high-performance computing. But when you are aiming for months of battery life instead of dozens of giga-operations per second, you have to do things differently from the outset.

This means starting with a semiconductor process technology optimized for low power rather than for highest speed. It means designing circuits that consume only enough power for the task at hand, and that shut down when they aren’t needed. And it means choosing processor architectures, such as the Arm Cortex-M33 CPU, DSP cores, and a proprietary neural processing unit, that can cooperate to complete a given inference with the least possible battery drain, rather than the least possible time delay. It also means providing on-chip, low-power memories and peripheral interfaces for cameras and microphones.

For an SoC that will be used in the field, handling sensitive personal data, security is also a primary concern. Secure storage of keys, secure boot and code updating, and hardware-assisted encryption are all issues that must be resolved at the hardware level.

What results can be expected in practice?

So how successful is the focus on ultra-low power? Synaptics claims that the KA10000 can process incoming video and produce ten inferences per second while operating for nearly three years on one battery.
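To see what a multi-year figure implies, a quick calculation is useful. The battery capacity below is a hypothetical assumption, since the claim does not specify one:

```python
# Rough average-current budget implied by multi-year battery operation.
# The battery capacity is a hypothetical assumption; the vendor claim does not specify one.
battery_capacity_mah = 3000          # e.g., a pair of AA-class cells
lifetime_hours = 3 * 365 * 24        # ~three years of continuous operation

average_current_ma = battery_capacity_mah / lifetime_hours
print(f"Average current budget: {average_current_ma:.3f} mA")   # ~0.11 mA
```

An average budget on the order of a tenth of a milliamp illustrates why every block on such a chip has to be designed to shut down whenever it is not needed.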

The combination of image-detection performance and ultra-low power opens a world of new applications for inexpensive, unattended and untethered smart cameras. But historically, video-inference systems have been discouragingly complex to program—so much so that a new sub-profession of AI experts has emerged in the industry. Yet few of these applications would ever be served if the first step in using a neural accelerator SoC were hiring a team of AI experts and data scientists. A capable SoC in this field demands a capable development environment.

Accordingly, Synaptics has a collaboration with Eta Compute to provide the TENSAI Flow development platform for the KA10000. The platform includes a compiler that optimally implements models on the KA10000 computing system; pre-designed, pre-trained demonstration machine-learning models for tasks such as person detection and industrial safety; and the middleware and device drivers to complete the system.

Users who wish to develop their own models may use TensorFlow within the TENSAI platform. But model development demands a set of complex tasks: data collection, filtering the data to generate the most relevant dataset, using that dataset to train a neural-network model, optimizing the model to fit within the memory constraints of an ultra-low-power SoC, and then programming the model into the executable firmware binary.
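The memory-fitting step in particular is usually handled by the vendor toolchain, but in plain TensorFlow terms it looks roughly like the post-training int8 quantization sketch below. The file names and calibration data are placeholders, and the Katana toolchain may use a different flow entirely.

```python
# Sketch of post-training int8 quantization with the TensorFlow Lite converter,
# one common way to shrink a trained model to fit tight on-chip memory.
# File names and the calibration dataset are placeholders.
import numpy as np
import tensorflow as tf

def representative_data():
    # In practice, yield a few hundred real input frames; random data here for illustration.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("person_detector_savedmodel")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("person_detector_int8.tflite", "wb") as f:
    f.write(tflite_model)   # quantized flatbuffer, roughly 4x smaller than float32
```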

All of these tasks can be daunting for a non-specialist software developer. The process can take six to nine months—or more, if anything goes wrong—and hence deployment of edge AI devices threatens to be a time-consuming effort, putting schedules, budgets, and even market windows at risk.

Success requires a collaborative ecosystem of hardware, software and IP providers. For example, Synaptics helps expedite this process through partnerships with MLOps companies such as Edge Impulse. Using the Edge Impulse environment with Synaptics’ Katana platform, a customer can prototype a model in a few days and build a production model in a few months. That means low risk and rapid deployment of differentiating, ultra-low power edge AI devices.

For the first time, the chip industry is responding to the need for effective, use-case-specific machine learning in inexpensive autonomous cameras. Solutions are now available that combine comprehensive yet approachable development environments with complete neural-network-enhanced SoCs at a compelling cost, power, and performance point. This is opening the frontier of low-power person detection and other visual detection capabilities that will improve our lives in a variety of ways.


Ananda Roy - Synaptics

Ananda Roy is a senior product manager at Synaptics leading the low-power AI product line. Prior to Synaptics, he worked at Broadcom for eleven years in applications engineering for the Wi-Fi connectivity business unit, leading design wins with top internet service providers such as Comcast, Verizon, and AT&T; retail router vendors such as Netgear, Linksys, and Belkin; and their OEMs such as Arris, Foxconn, and Arcadyan. Before Broadcom, he worked at Cypress Semiconductor as an applications engineer on its proprietary 2.4 GHz wireless technology, winning major HID peripheral designs at HP, Logitech, and Apple. Ananda has an MSEE from the University of Southern California and an MBA from the University of California, Davis.

