As a result of breakthroughs in both hardware and software, voice user interface (VUI)-enabled products of every size and at every price point can now be brought to market. Rather than the "speak into the microphone and wait" model that has defined the voice control experience until today, new systems can be embedded into products in ways that are discreet or even invisible to the end user, with immediate response times unburdened by cloud latency.
In evaluating voice-controlled alternatives to smart speakers and the cloud, product developers face new opportunities and challenges. They must consider size and placement as highly miniaturized devices are embedded into home furnishings and appliances. Merging voice applications with machine learning at the edge is necessary if products are to get smarter over time. At the same time, devices increasingly rely on battery power, requiring developers to engineer for optimal energy management. Finally, developers must consider what users expect the device to do: to a consumer, well-designed voice control feels omnipresent, able to hear around corners and through walls. All of this brings together several design elements that must be weighed when building VUI-enabled products.
One of the more liberating aspects of voice control 2.0 will be the freedom to speak commands without a smart speaker nearby. Voice integrated throughout smart home devices can turn the entire home into a listening zone, ready to respond whenever a wake word or other definable sound is recognized. Specialized hardware and software work together to produce accurate far-field audio capture.
To effectively capture sound in a far-field context, several design techniques come into play:

Port orientation: The acoustic port is the opening through which audio reaches the microphone without physical obstruction. Whether the port sits on the top or the bottom of the device is determined by its form factor. To simplify design, the acoustic port is typically located near the microphone. However, the port should be far enough from speakers and other acoustic noise sources (such as motors and amplifiers) to minimize unwanted signals at the microphone input.
Microphone arrays and beamforming: An arrangement of multiple microphones is called an "array." At any given moment, a microphone array hears sound from all directions at once: spoken commands, but also other voices and movement around the home. Through a technique called beamforming, the array's channels can be combined to selectively capture sound from one direction while rejecting sound from others. The result of beamforming algorithms is a selective tuning-out of signals arriving from anywhere but the desired direction. Beamforming is the first step in the digital signal processing chain.
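The simplest beamformer is delay-and-sum: each channel is delayed so that a wavefront arriving from the chosen direction lines up across all microphones, and the aligned channels are then averaged, reinforcing the target while smearing out off-axis sound. A minimal sketch (assuming a far-field plane wave and NumPy; the function name and argument layout are illustrative, not from any particular product):

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Delay-and-sum beamformer sketch (far-field plane-wave assumption).

    signals:       (n_mics, n_samples) array of time-aligned recordings
    mic_positions: (n_mics, 3) microphone coordinates in meters
    direction:     unit vector pointing toward the desired source
    fs:            sample rate in Hz; c: speed of sound in m/s
    """
    n_mics, n_samples = signals.shape
    # A wave from `direction` reaches mics closer to the source earlier,
    # by (position . direction) / c seconds relative to the origin.
    delays = mic_positions @ direction / c
    delays -= delays.min()                 # make all delays non-negative
    out = np.zeros(n_samples)
    for sig, d in zip(signals, delays):
        shift = int(round(d * fs))         # delay in whole samples
        out[shift:] += sig[:n_samples - shift]
    return out / n_mics                    # average the aligned channels
```

Real array processors refine this with fractional delays, per-channel weighting, and adaptive variants, but the alignment-then-average idea is the same.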
Digital signal processing (DSP) algorithms: DSP is a cornerstone of any voice control system. It makes sense of the wanted audio information, capturing it, focusing it, cleaning it up, and amplifying it, so that phonemes can be assembled into words and commands while noise is kept from disrupting that process. DSP is used at virtually every stage of the voice interaction, from audio capture and voice enhancement to speech processing.
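One classic voice-enhancement step in that chain is noise reduction by spectral subtraction: estimate the noise's magnitude spectrum from a noise-only stretch, then subtract it frame by frame from the noisy signal. A minimal sketch, assuming NumPy and a fixed noise estimate (the function name and frame sizes are illustrative choices, not a specific product's algorithm):

```python
import numpy as np

def spectral_subtract(noisy, noise_clip, frame=256, hop=128):
    """Spectral-subtraction sketch: subtract an averaged noise magnitude
    spectrum from each frame of `noisy`, keeping the noisy phase."""
    win = np.hanning(frame)
    # Average noise magnitude spectrum from a noise-only recording
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(noise_clip[i:i + frame] * win))
         for i in range(0, len(noise_clip) - frame, hop)], axis=0)
    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame, hop):
        spec = np.fft.rfft(noisy[i:i + frame] * win)
        # Subtract the noise estimate; floor magnitudes at zero
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        # Resynthesize with the original phase and overlap-add
        out[i:i + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
    return out
```

Production systems use more sophisticated suppressors (and increasingly neural ones), but this captures the transform-modify-resynthesize pattern that underlies much of the voice DSP chain.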
Designing with convenience in mind
Beyond designing for function, VUI-enabled devices need to be designed for convenience and usability as well. Two primary design considerations for the wake word center on energy management and processing power. Since the VUI must be in constant "listen" mode to await a wake word, battery-powered VUI devices and voice command devices (VCDs) must be designed for extremely low energy draw as well as immediate wake. Further, a device's ability to distinguish wanted spoken commands while filtering out unwanted sound requires considerable processing power, with demands on both accuracy and immediacy. Products that incorporate powerful audio edge processors can now provide the computing power and the low-power, low-latency operation that enable immediate user experiences.
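A common way to reconcile always-on listening with a tight energy budget is staged detection: a very cheap check (here, frame energy) runs continuously, and the expensive wake-word model runs only when that check fires. A minimal sketch, where `run_wake_word` stands in for a hypothetical wake-word classifier and the threshold is an illustrative tuning parameter:

```python
import numpy as np

def frame_energy(frame):
    """Mean-square energy of one audio frame (the cheap, always-on check)."""
    return float(np.mean(frame.astype(np.float64) ** 2))

def always_on_listen(frames, energy_threshold, run_wake_word):
    """Two-stage gating sketch: the cheap energy check runs on every frame,
    and the expensive `run_wake_word` model (hypothetical) runs only on
    frames loud enough to plausibly contain speech.

    Returns the indices of frames where the wake word was detected.
    """
    hits = []
    for i, frame in enumerate(frames):
        if frame_energy(frame) < energy_threshold:
            continue                 # stay on the low-power path
        if run_wake_word(frame):     # heavy model, invoked rarely
            hits.append(i)
    return hits
```

On real audio edge processors the same idea is implemented in hardware: an ultra-low-power front end watches the microphone and wakes the larger DSP or neural engine only when there is something worth analyzing.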
The explosive growth of voice control, even in today's early stages, is ample proof of market opportunities for voice control in the smart home. The pillar tasks of the smart home, including security, energy management, entertainment, and senior safety, are all made simpler and more accessible with voice control. Voice can finally act as a unified controller for an entire smart home, with commands from any room in the house, if devices are designed with the user in mind and leverage the significant advancements in voice control technologies.
Vikram Shrivastava is senior director for AISonic edge processors at Knowles, where he develops strategies and products to enable IoT platforms with intelligent voice capability. He has almost 30 years of experience in product marketing, strategy, and management in the semiconductors and technology industry. Vikram’s educational background in electrical engineering, specifically in control systems and silicon design, provides him the ability to understand, execute, and communicate marketing strategy that fits the technical needs that engineers, developers, and OEMs have. Vikram holds an MBA from University of California, Berkeley’s Haas School of Business.