Renesas Electronics Corporation and Syntiant Corp., a provider of low-power intelligent voice and sensor processing in edge devices, have jointly developed a voice-controlled multimodal artificial intelligence (AI) solution to enable low-power contactless operation for image processing in embedded vision AI-based internet of things (IoT) and edge systems.
The new solution combines the Renesas RZ/V series vision AI microprocessor unit (MPU) and the low-power, multimodal Syntiant NDP120 neural decision processor to deliver advanced voice and image processing capabilities in applications such as self-checkout machines, security cameras, video conference systems, and smart appliances such as robotic cleaning devices.
The joint solution features always-on functionality with quick voice-triggered activation from standby mode to perform object recognition, facial recognition, and other vision-based tasks that are critical functions in security cameras and other systems. For example, while user-defined voice cues drive activation and system operation, vision AI recognition tracks operator behavior and controls operation or issues a warning when suspicious actions are detected.
The multimodal architecture makes it easier to create contactless user experiences for vision AI-based systems. Using a dedicated, power-efficient chip for voice recognition reduces standby power consumption while speeding up system development because it is possible to develop software independently of the vision AI functionality.
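The division of labor described above, with an always-on voice chip that wakes the vision processor on a spoken cue, can be pictured as a small state machine. The following Python sketch is purely illustrative and is not Renesas or Syntiant code: the class name `MultimodalController`, the wake and sleep phrases, and the `"suspicious"` label are all hypothetical stand-ins for whatever the voice and vision models would actually report.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class State(Enum):
    STANDBY = auto()  # only the voice path is listening
    ACTIVE = auto()   # vision path is awake and classifying frames


@dataclass
class MultimodalController:
    """Hypothetical controller: voice cues gate the vision pipeline."""
    wake_words: frozenset = frozenset({"start camera"})  # assumed phrase
    sleep_words: frozenset = frozenset({"stop camera"})  # assumed phrase
    state: State = State.STANDBY
    log: list = field(default_factory=list)

    def on_voice(self, phrase: str) -> None:
        """Handle a phrase recognized by the voice path."""
        phrase = phrase.strip().lower()
        if self.state is State.STANDBY and phrase in self.wake_words:
            self.state = State.ACTIVE
            self.log.append("vision on")
        elif self.state is State.ACTIVE and phrase in self.sleep_words:
            self.state = State.STANDBY
            self.log.append("vision off")

    def on_frame(self, label: str) -> Optional[str]:
        """Handle a label produced by the vision path for one frame."""
        if self.state is not State.ACTIVE:
            return None  # frames are ignored while in standby
        if label == "suspicious":
            self.log.append("warning")
            return "warning"
        return "ok"
```

Because the voice handler and the frame handler share only the `state` field, each can be developed and tested separately, which mirrors the independent software development the architecture is said to allow.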
The Renesas RZ/V series MPU incorporates Renesas’ DRP-AI (dynamically reconfigurable processor-AI) accelerator and combines high-precision AI inference with a power efficiency that Renesas claims is among the best in the industry. This power efficiency eliminates the need for heat dissipation measures such as heat sinks or cooling fans, which reduces the bill of materials (BOM) cost and makes it possible to integrate vision AI into a wide range of embedded applications. DRP-AI consists of an AI-MAC (multiply-accumulate) unit and a DRP (dynamically reconfigurable processor). The AI-MAC efficiently processes operations in convolutional and fully connected layers by optimizing data flow with internal switches. The DRP handles complex tasks such as image pre-processing and AI model pooling layers flexibly and quickly by dynamically changing its hardware configuration. The provided DRP-AI translator automatically allocates each process of the AI model to the AI-MAC or the DRP, so users can take advantage of DRP-AI without needing to understand the hardware. Multiple executables output by the DRP-AI translator can be placed in external memory, which makes it possible for a system to switch dynamically between multiple AI models. In addition, the DRP-AI translator can be continuously updated to support newly developed AI models without hardware changes.
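To make the allocation step concrete, the sketch below shows one way a translator-like tool could map model layers to the two compute units. It is a toy illustration and not the actual DRP-AI translator: the function `allocate`, the operation names, and the rule that the AI-MAC takes convolutional and fully connected layers while the DRP takes pre-processing and pooling are assumptions drawn from the description above.

```python
# Hypothetical operation categories, based only on the article's description.
AI_MAC_OPS = {"conv", "fully_connected"}
DRP_OPS = {"preprocess", "pooling"}


def allocate(layers):
    """Assign each (name, op) layer to "AI-MAC" or "DRP".

    Raises ValueError for an operation this toy mapping does not cover,
    rather than guessing where it would run.
    """
    plan = []
    for name, op in layers:
        if op in AI_MAC_OPS:
            unit = "AI-MAC"
        elif op in DRP_OPS:
            unit = "DRP"
        else:
            raise ValueError(f"no mapping for operation {op!r}")
        plan.append((name, unit))
    return plan


# Example model description (hypothetical layer names).
model = [
    ("resize", "preprocess"),
    ("conv1", "conv"),
    ("pool1", "pooling"),
    ("fc", "fully_connected"),
]
plan = allocate(model)
```

The point of the sketch is that the user supplies only a model description and the per-layer placement is derived automatically, which is the convenience the article attributes to the translator.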
The Syntiant NDP120 chip incorporates sophisticated AI capabilities that can be used to implement many high-precision, hands-free voice functions, including speaker recognition, keyword detection, multiple wake words, and local command recognition. Packaged with the Syntiant Core 2 neural network inference engine, the NDP120 can also run multiple applications simultaneously while keeping battery power consumption to 1 mW.
Speaking for Renesas, Hiroto Nitta said, “We anticipate that demand for multimodal systems that use multiple streams of input information – both image and voice – will increase moving forward as a way to improve both ease of use and safety.” Nitta, a senior vice president and head of SoC business in the IoT and infrastructure business unit, added, “Through the collaboration between Renesas, a leader in low-power image AI technology, and Syntiant, a leader in voice AI technology, we will accelerate the adoption of low-power, ultra-small smart voice AI technology in embedded systems and deliver new combined solutions to customers globally.”
Syntiant CEO Kurt Busch said, “Voice-based user interfaces will make it possible for customers to deliver new user experiences that bring the next generation of innovative ideas from concept to reality. We’ve already shipped more than 15 million of our deep learning NDPs globally to enable always-on voice in a wide variety of consumer and industrial IoT applications. Our collaboration with Renesas delivers a powerful, low-power voice and image solution that is certain to accelerate traction among a global customer base in a variety of devices and use cases.”
The new voice-controlled multimodal AI solution uses multiple mutually compatible devices from the broader Renesas portfolio to give customers an elevated prototyping platform for faster time to market and reduced risk. The new solution is part of Renesas’ “Winning Combinations,” which feature compelling analog, power, and embedded processing product combinations that help customers accelerate their designs and get to market faster.
The reference design for the new multimodal AI solution is available now, including circuit diagrams and BOM lists.