Advances in artificial intelligence have made voice biometrics accurate enough to be used not just for identification and personalization, but also for applications such as payment authentication. A new solution from Synaptics and ID R&D offers AI-powered voice biometrics and anti-spoofing algorithms that can run on a Synaptics SoC in the edge device. Specifically, the software has been optimized for the neural processing unit (NPU) in Synaptics’ VS600 series for smart home devices such as set-top boxes (STBs), smart speakers and security systems.
Synaptics sells its AI-capable SoCs into smart home products that need to process video streaming, audio streaming and imaging. A typical use case might be an STB incorporating cameras for video conferencing, for example.
Voice biometrics is now accurate enough to enable payment authentication in smart home devices
“What’s common across the set top box application and becoming more common as time goes on is the ability to use voice as an interface,” Vineet Ganju, vice president of marketing at Synaptics, told EE Times. “Remote controls can be voice enabled so you can talk into it to navigate your Netflix account and search for movies… voice as an interface is almost becoming standard in these applications.”
When an STB runs Netflix, the first thing users have to do is select whose profile to use. With voice biometrics, the STB would know immediately who was watching, cutting a step from the process.
“For example, with pay-per-view content, you could not only search via voice and find certain movies that maybe are not part of your subscription, but you’re willing to pay $5 to watch,” Ganju said. “Then [operators] want to be able to immediately authenticate and have you purchase that movie on the spot. They see that as a huge reduction in friction, helping users not only in finding content that’s personalized to them, but also being able to pay for that content and be able to watch it.”
ID R&D’s voice biometrics AI extracts more than 400 features from the voice, including combinations of parameters related to frequency/pitch and other things such as pronunciation and accents.
“It’s not limited the way some of the earlier generation of voice biometrics were,” John Amein, senior vice president of sales at ID R&D, told EE Times, adding that it is only in the last year or so that AI voice biometrics has achieved the accuracy required for applications like payment authentication.
The algorithm learns to recognize the user’s voice through a process called “enrollment” during which the user repeats a phrase three times. Any phrase can be used, and it works on any language out of the box. Enrollment is processed on the edge device.
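Conceptually, enrollment can be thought of as condensing the three repetitions into a single voiceprint that later utterances are matched against. A minimal sketch, assuming a generic speaker-embedding model (the averaging and cosine matching here are a common textbook approach, not ID R&D’s actual pipeline):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def enroll(embeddings):
    """Average the embeddings of the three enrollment utterances
    into a single voiceprint (a simple mean template)."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings)
            for i in range(dim)]

def verify(voiceprint, probe_embedding, threshold=0.7):
    """Accept the speaker if the probe embedding is close enough
    to the enrolled voiceprint."""
    return cosine_similarity(voiceprint, probe_embedding) >= threshold
```

The threshold value here is arbitrary; in practice it is tuned to trade off false accepts against false rejects.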
ID R&D’s AI algorithm can identify enrolled users with a false acceptance rate below 1 in 10,000, which Amein compares to the odds of someone guessing your PIN. The false reject rate — the rate at which the enrolled user’s voice is erroneously rejected — is close to 5%. And the spoof acceptance rate (SAR), for spoofing attacks such as recordings of the user’s voice played to the system, is better than 7%, which is the standard limit for biometric unlocking of Android devices.
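These error rates fall out of scoring a large set of genuine and impostor trials against a decision threshold. A minimal sketch of how FAR and FRR are computed (illustrative only):

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """False accept rate: fraction of impostor attempts scoring at or
    above the threshold. False reject rate: fraction of genuine
    attempts scoring below it. Raising the threshold lowers FAR at
    the cost of FRR, and vice versa."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr
```

Hitting a 1-in-10,000 FAR while keeping FRR near 5%, as quoted above, amounts to choosing an operating point on this trade-off curve.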
“Between the biometric matching being at a false accept rate of one in 10,000 and the anti-spoofing being better than the 7% rate required by the Android standard, we’re really hitting both of the things that are necessary for voice biometrics to be accepted as secure enough for a payment authorization,” Amein said.
ID R&D’s anti-spoofing technology also relies on AI.
“Spoken voice has a bandwidth that goes up to 3500 Hz, and we’re sampling at a much higher rate than that,” Amein said. “So we’re hearing frequencies higher than the spoken voice. We listen in these higher ranges for different characteristics.”
The human voice, produced by air passing through our tubular vocal tract, has characteristic frequencies that are very different from those of sound produced by the vibration of a flat surface, such as a loudspeaker diaphragm. This is one of the cues the anti-spoofing AI uses to distinguish a live voice from a recording.
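One simple proxy for this kind of cue is the fraction of a frame’s spectral energy that sits above the spoken-voice band; replayed audio tends to distribute energy there differently than a live talker. A toy sketch using a direct DFT (an illustration of extracting one such feature, not ID R&D’s anti-spoofing model):

```python
import math

def band_energy_ratio(samples, sample_rate, cutoff_hz=3500):
    """Ratio of spectral energy above cutoff_hz to total energy,
    computed with a direct DFT (fine for short illustrative frames;
    a real system would use an FFT)."""
    n = len(samples)
    high = total = 0.0
    for k in range(1, n // 2):                    # skip DC, positive bins only
        re = sum(samples[t] * math.cos(2 * math.pi * k * t / n)
                 for t in range(n))
        im = -sum(samples[t] * math.sin(2 * math.pi * k * t / n)
                  for t in range(n))
        power = re * re + im * im
        total += power
        if k * sample_rate / n > cutoff_hz:       # bin frequency in Hz
            high += power
    return high / total if total else 0.0
```

A classifier would consume features like this ratio alongside many others, rather than thresholding any single one.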
“We also can detect synthesized voices, such as text to speech applications,” Amein said. “A lot of those aren’t that great, but they’re getting more and more lifelike. And in that scenario, there are still anomalies in the signal – it’s too perfect in some cases, or there’ll be just transitions or phase differences that the ear can’t hear, but the [AI] can.”
Neural processing unit
Synaptics’ VS600-series SoCs feature a neural processing unit (NPU); the VS680’s NPU offers 6.75 TOPS while the newly announced VS640 offers 1 TOPS and is aimed at “more mainstream costing and performance and power points,” said Ganju. Either part’s NPU has “more than enough” compute to run ID R&D’s voice biometrics and anti-spoofing algorithms simultaneously, he said. Running on the NPU sped up voice biometrics inference by a factor of 10 compared with the chip’s CPU, while cutting CPU utilization by a factor of 3.
Synaptics provides a toolset that allows companies like ID R&D to optimize their technologies for the NPU, and while ID R&D is Synaptics’ first partner in this regard, the company plans to work with more partners in the future for applications beyond voice biometrics.
“Based on our discussions with speech recognition companies, we can actually do a full English language vocabulary speech recognition engine on the device, well within the 1 TOPS capability of the NPU,” Ganju said. “So you can have a fully offline product with regards to speech recognition… for example, for products where users don’t connect it to their WiFi right away, the onboard speech recognition can help them get a good out of the box experience even before it’s connected.”
The first software build from ID R&D will be available on Synaptics’ VS600 development kits later this month.
>> This article was originally published on our sister site, EE Times.