Always-listening devices have made it dramatically easier to play music, turn on the smart TV, turn down the thermostat, and even alert us when someone is breaking into the house. But they also keep us tethered to AC power or replacing batteries all too frequently.
While it sometimes feels like voice assistants have been in our lives for decades, it was only in late 2014 that Amazon launched the first smart speaker, Amazon Echo. Five years later, we now have hundreds of millions of digital voice assistants installed in smart speakers, smart home systems, wearables and other smart devices that are always listening for a wake word. From its latest research, SAR Insight & Consulting predicts that by 2023, the installed base of always-on voice-enabled devices will jump to almost 1 billion.
The sensors that first made always-listening and voice-first possible — ultra-miniature microelectromechanical systems (MEMS) microphones the size of a pencil point — capture environmental sound data. At first, it seemed like a fine solution to process that data in the cloud, analyzing the sound for wake words and commands. But exponential growth in voice assistants and other always-on IoT devices is producing so much data — 41.6 billion IoT devices generating 79.4 zettabytes of data in 2025, according to International Data Corp. — that we’re overtaxing the collective bandwidth and creating cost and power inefficiencies as an unintended consequence. This is driving the semiconductor industry to find new ways to bring some of that powerful cloud computing into the device — a capability called edge processing.
Challenges at the edge
The success of edge computing relies heavily on the rapid proliferation of low-power digital signal processors and microcontrollers — some of which include an embedded neural network, i.e., a tiny machine learning (TinyML) chip. These mostly digital processing chips can handle the complex analysis of data, such as deciding whether a wake word has been spoken, right on the device. But while these chips may now be as smart as a brain, they still rely on the original system architecture that was used in the first always-on sensing device, one that requires the immediate conversion of all sound — which is naturally analog — to a digital signal. That’s true even when the sound, such as a dog barking or a baby crying, couldn’t possibly contain a wake word. Wasting power and data, this same-old always-on-listening approach puts OEMs on a collision course with consumer dissatisfaction.
Consumers still expect the same or better performance from ever-smaller always-listening smart devices that can fit in a pocket or even inside an ear, but without trading off battery life. That places OEMs in a tough spot because if they stay with the legacy architecture, they’ll keep wasting 80% to 90% of battery life on processing meaningless data. They’ll be forced to make consumers choose the lesser of two evils: a non-portable voice assistant that has to be plugged into the wall or a portable voice assistant that can go anywhere but is hampered by short battery life.
Because moving data through a system costs power, the most efficient way to save power is to reduce the amount of data down to what’s important as soon as possible. If we truly want to solve the always-on-listening power challenge, we need a new paradigm that more closely mimics the brain’s ability to efficiently process the vast amounts of data coming from the human sensory system at any given moment. Spend just a little bit of power up front to determine what’s relevant, and save the majority of the resources to process only the most important data.
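To see why reducing data early pays off, consider a back-of-the-envelope power budget. All of the numbers below are hypothetical assumptions chosen only to illustrate the arithmetic; the article's own claim is that legacy architectures waste 80% to 90% of battery life on meaningless data.

```python
# Illustrative power-budget model for an always-listening device.
# All figures are hypothetical assumptions, not measurements of any product.

ALWAYS_ON_DIGITAL_UW = 1000   # digital DSP digitizing and analyzing all sound (microwatts)
ANALOG_DETECTOR_UW = 25       # assumed low-power analog first-stage detector
DIGITAL_ACTIVE_UW = 1000      # digital DSP power when woken for real analysis
VOICE_DUTY_CYCLE = 0.10       # assume voice-like sound is present ~10% of the time

def avg_power_legacy():
    """Legacy architecture: the digital chip processes everything, all the time."""
    return ALWAYS_ON_DIGITAL_UW

def avg_power_staged():
    """Staged architecture: a cheap analog stage runs continuously;
    the digital chip wakes only for the fraction of time voice is present."""
    return ANALOG_DETECTOR_UW + VOICE_DUTY_CYCLE * DIGITAL_ACTIVE_UW

savings = 1 - avg_power_staged() / avg_power_legacy()
print(f"Legacy: {avg_power_legacy():.0f} uW, staged: {avg_power_staged():.0f} uW, "
      f"savings: {savings:.0%}")
```

Under these assumed numbers, the staged system averages 125 µW against 1,000 µW for the legacy one, a savings of roughly 88% — consistent with the 80%-to-90% waste figure cited above.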
Sound is naturally analog
Improving battery life in always-listening devices requires embracing a technology that many of today’s engineers find both old-fashioned and intimidating: analog. Working with raw, unstructured analog signals from the real world — namely, touch, vision, hearing and vibration — is tough. Since the introduction of the first digital integrated circuit, it has been much simpler to create products that process sensor signals, with familiar ones and zeroes, than to directly process the analog data that is sensed. (That’s why always-on devices transform analog input into digital signals immediately, before doing almost anything else.)
While digital has effectively solved processing challenges for the last 50 years, it might have finally hit a wall in the laws of physics. The slowdown in digital device scaling has caused technologists to get creative with the chips inside the device. In this case, that creativity has come through two fundamental changes: Use digital more strategically, so digital chips do heavy processing only when necessary; and use the inherent low power of analog circuitry, combined with machine learning, to do a first round of analysis that determines whether voice is present while the sound data is still in its natural, analog state. That keeps the digital processing chips in low-power sleep mode until they are actually needed to “listen” for a keyword.
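The two-stage gating idea can be sketched in software. In the sketch below, a cheap energy detector stands in for the analog ML first stage, and a placeholder function stands in for the power-hungry digital wake-word engine; the threshold, signal model, and function names are all illustrative assumptions, not a description of any real product's design.

```python
import numpy as np

# Toy simulation of two-stage always-listening: a cheap "analog-like" energy
# detector gates a more expensive "digital" keyword stage.

RATE = 16000       # samples per second
FRAME = 1600       # 100-ms frames
THRESHOLD = 0.05   # assumed energy level suggesting voice might be present

def first_stage_detect(frame):
    """Cheap first pass: mean absolute amplitude vs. a fixed threshold.
    Stands in for the analog ML stage, which decides 'is this voice-like?'
    before any digital chip wakes up."""
    return np.mean(np.abs(frame)) > THRESHOLD

def digital_keyword_stage(frame):
    """Placeholder for the power-hungry digital wake-word detector.
    Runs only when the first stage wakes it."""
    return "analyzing frame for wake word"

rng = np.random.default_rng(0)
quiet = rng.normal(0, 0.01, FRAME)  # background hiss: should be ignored
loud = rng.normal(0, 0.01, FRAME) + 0.2 * np.sin(
    2 * np.pi * 300 * np.arange(FRAME) / RATE)  # voice-band tone over noise

for name, frame in [("quiet", quiet), ("loud", loud)]:
    if first_stage_detect(frame):
        print(name, "->", digital_keyword_stage(frame))
    else:
        print(name, "-> digital stage stays asleep")
```

In this toy run, the quiet frame never wakes the second stage, while the loud voice-band frame does — mirroring how the analog front end keeps digital processors asleep until sound worth analyzing actually arrives.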
The path to greater power efficiency in always-on devices lies not in having each chip “think like a brain” but in reimagining a system architecture that is more like the human sensory system, progressively analyzing sound in layers so that the most energy is focused on what’s most important.
Bio-inspired edge processing (bottom) focuses the digital processing power on the most pertinent sensory data. (Image: Aspinity)
The hunt for longer battery lifetime will encourage system designers to embrace a new architectural paradigm in which less data processing means more battery life. Residing on the edge, an analog ML chip can act like a smart traffic manager that lets digital processing chips stay asleep unless they are needed. This bio-inspired always-on edge processing approach allows the analog and digital processors to perform the jobs at which they’re most efficient, making the consumer the ultimate winner. After all, who wouldn’t want a voice-activated TV remote that runs for a year on a single set of batteries?
>> This article was originally published on our sister site, EE Times Europe.
Tom Doyle is founder and CEO of Aspinity.
- How extensive signal processing chains make voice assistants ‘just work’
- Adaptive listening sensor ‘learns’ background noise, battery life rises
- Better audio processing at the edge
- How analog in-memory computing can solve power challenges of edge AI inference
- AI chip’s analog-processing approach slashes power
- Optimizing wearable display power consumption
- Using DSPs for audio AI at the edge