I recently attended MWC Shanghai. Robots were big – very big. I saw dozens of companies looking for customers who would brand their robots and offer them in any one of many applications. Take just one example application: Tug, the robot nurse. This doesn’t look much like our sci-fi image of robots, unless you’re thinking of some of the more utilitarian examples in Star Wars. It’s a box on wheels, but it has many of the features we expect in mobile robots, including navigation and obstacle avoidance. It can navigate around a hospital, stop if someone steps in front of it and steer around an errant IV stand; it can even call an elevator to get to another floor.
The point of Tug is to deliver medications and food to patients, and it is already deployed in 37 VA hospitals in the US. Imagine the workload that takes off the shoulders of busy candy stripers. There are many other assistant applications, in elder care, teaching support, restaurants and hotels. Think of this as the next big thing in personal assistants after smart speakers (Amazon already has 100k+ robots working in its warehouses, so it’s a safe bet the company is working on home robots as a sequel to the Echo). This isn’t science fiction; home assistant robots are shipping today.
Robot health assistant (Source: CEVA/Shutterstock)
There are obvious technical challenges in producing this kind of robot, not dissimilar to autonomous driving problems, though there are some clear differences. Navigation and obstacle avoidance are common to both, but concepts such as clear driving lanes and traffic management don’t apply to these robots; it’s all about navigation and obstacle avoidance within a building (with remapping to get around temporary obstacles the robot can’t move aside). And while a natural language interface may be a nice-to-have in a car, for robot assistants it may be essential. Who wants to learn which buttons to push when the pharmacy has sent the wrong medication or the restaurant has messed up your order?
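To make the remapping idea concrete, here is a minimal sketch that models a floor as a 2D occupancy grid and uses breadth-first search as a stand-in planner. When a temporary obstacle (say, an IV stand) appears on the planned route, the robot marks that cell as blocked and replans. This is a generic illustration under those assumptions, not Tug’s actual software; real robots typically use richer maps and planners such as A*, and the function name and corridor map are hypothetical.

```python
from collections import deque


def plan_path(grid, start, goal):
    """Breadth-first search on a 4-connected occupancy grid.

    grid[r][c] == 1 marks a blocked cell (wall or obstacle).
    Returns a list of (row, col) cells from start to goal, or None
    if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through predecessors to rebuild the route.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# Hypothetical corridor map: 0 = free floor, 1 = wall.
corridor = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
start, goal = (2, 0), (2, 4)
route = plan_path(corridor, start, goal)      # straight along the bottom row

# A temporary obstacle appears on the route: mark it and replan around it.
blocked_r, blocked_c = route[2]
corridor[blocked_r][blocked_c] = 1
detour = plan_path(corridor, start, goal)     # now loops via the top row
```

Breadth-first search is used here only because it is short and finds a shortest route on an unweighted grid; clearing the cell again once the obstacle moves restores the original route on the next replan.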
Gartner recently published a top-10 list of AI and sensing requirements for robots, including:
Computer vision – scene analysis, object recognition, etc.
Biometric recognition and authentication – who is talking to me, and are they allowed to give those commands?
A conversational interface – speech recognition and natural language processing
Acoustic scene analysis – recognizing distinctive noises such as a dog barking or glass breaking
Location sensing – where am I and what/who is near me
Autonomous movement – ability to move to a target elsewhere in the building without colliding with objects or people
AI functions in the robot – not depending solely on the cloud
Today, the default approach to building systems with these capabilities is to build the robot’s AI around a multicore GPU platform. This is understandable: product builders can prototype a solution using an off-the-shelf platform without needing to worry about ASIC details, in much the same way they would use a CPU development board for more traditional applications. But as product volume ramps, or as you push it to ramp, cost, customer satisfaction and differentiation become increasingly important. Off-the-shelf solutions are expensive, they’re power-hungry and it’s difficult to differentiate when you’re using the same platform as everyone else. That’s why high-volume solutions inevitably turn to ASIC platforms. You don’t need to abandon all the investment you put into your prototype; a lower-cost GPU platform might remain part of the solution, but a significant share of the AI functionality can be offloaded to a much more cost-effective and more highly integrated platform.
The performance-per-watt advantages of DSPs over GPUs in machine-learning (ML) applications are well known, deriving in part from the use of fixed-point rather than floating-point operations and, on some platforms, from flexibility in quantization. The price advantages of custom solutions at volume are equally familiar. This is why, in volume- and price-sensitive ML applications at the edge, you’re more likely to see an embedded DSP than an off-the-shelf GPU.
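To illustrate what trading floating point for fixed point looks like, here is a minimal sketch of symmetric 8-bit quantization. Floats are mapped to small integers with a single scale factor per tensor, the dot product (the core operation of a neural-network layer) runs entirely as integer multiply-accumulates, and one floating-point rescale at the end recovers an approximate result. This is a generic textbook scheme rather than any particular DSP’s quantization method; the function names, weights and activations are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the int8 range [-127, 127].

    Returns (int_values, scale) such that value ~= int_value * scale.
    """
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def int8_dot(qa, qb):
    """Integer multiply-accumulate, as a fixed-point MAC unit would do it.

    A real implementation would accumulate in a wide (e.g. 32-bit)
    register; Python integers never overflow, so that is implicit here.
    """
    return sum(a * b for a, b in zip(qa, qb))


weights = [0.12, -0.50, 0.33, 0.90]
activations = [1.00, 0.25, -0.75, 0.40]

qw, sw = quantize_int8(weights)
qx, sx = quantize_int8(activations)

exact = sum(w * x for w, x in zip(weights, activations))   # float reference
approx = int8_dot(qw, qx) * sw * sx   # integer math, one rescale at the end
```

Here the quantized result (about 0.106) lands within roughly 0.002 of the floating-point reference (0.1075). How many bits to use, and whether to use per-tensor or per-channel scales, is exactly the kind of quantization flexibility that varies between platforms.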
But can you do everything you could do on the GPU? It turns out that you can do quite a lot. Take computer vision, for example: positioning, tracking, object recognition and gesture recognition. This level of vision processing is already available today in some embedded DSP-based platforms. Or take autonomous movement supporting local retraining (without having to go to the cloud). Again, the core recognition capabilities that support this intelligence, the same ones you would find on a GPU, are available on a DSP.
Voice recognition/authentication and acoustic scene analysis can also be offloaded. These (along with the other examples here) neatly highlight why offloading makes so much sense. Each of these intelligent operations breaks down into multiple steps; for voice, say, from pickup and direction resolution, perhaps through basic word recognition, and ultimately to natural language processing (NLP). The last step is challenging and may require going to the cloud, but the steps before it can be handled very comfortably in an embedded solution. In some applications, where only a limited vocabulary needs to be recognized or where you want to detect non-verbal cues such as a window breaking, you may not need the cloud (or a local GPU) at all. There are already hints that even limited NLP may be supported at the edge in the near future.
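Direction resolution is a good example of a front-end step that fits comfortably on an embedded processor. Here is a minimal sketch, assuming a two-microphone array and a distant (far-field) sound source: the delay between the microphones is taken at the peak of their cross-correlation and converted into an arrival angle using tau = d·sin(theta)/c. Production voice front ends typically use more microphones and more robust estimators (GCC-PHAT, for instance); the function names, microphone spacing, sample rate and synthetic signal here are all illustrative assumptions.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature


def estimate_delay(mic_a, mic_b, max_lag):
    """Return the lag (in samples) by which mic_b trails mic_a,
    taken as the peak of their cross-correlation over +/- max_lag."""
    n = len(mic_a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only sum over indices where both shifted signals overlap.
        score = sum(mic_a[i] * mic_b[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(lag, sample_rate, mic_spacing):
    """Convert a delay in samples to an arrival angle in degrees from
    broadside, using tau = d * sin(theta) / c (far-field assumption).
    Positive angles mean the sound reached mic_a first."""
    ratio = SPEED_OF_SOUND * lag / (sample_rate * mic_spacing)
    ratio = max(-1.0, min(1.0, ratio))  # clamp overshoot from whole-sample lags
    return math.degrees(math.asin(ratio))


# Synthetic check: a noise burst that reaches mic_b 3 samples after mic_a.
sample_rate, mic_spacing = 16000, 0.1   # 16 kHz, microphones 10 cm apart
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(1003)]
mic_a, mic_b = source[3:], source[:1000]

# The largest physically possible delay, plus one sample of margin.
max_lag = int(mic_spacing * sample_rate / SPEED_OF_SOUND) + 1
lag = estimate_delay(mic_a, mic_b, max_lag)
angle = direction_of_arrival(lag, sample_rate, mic_spacing)   # about 40 degrees
```

Because the search only spans a handful of lags, the cost is a few thousand multiply-accumulates per frame, which is why this stage sits naturally on a DSP rather than in the cloud.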
An extensive array of solutions has emerged to support these front-end functions, spanning AI at the edge, front-end voice processing and deep learning in the IoT. With these solutions, developers can more easily address the emerging challenges of making robot personal assistants ubiquitous.
Moshe Sheier is Director of Strategic Marketing, CEVA, where he oversees corporate development and strategic partnerships for CEVA’s core target markets and future growth areas. Moshe is engaged with leading SW and IP companies to bring innovative DSP-based solutions to the market. In his spare time, Moshe rides mountain bikes and practices Aikido.