Optical technology creates more secure environment for voice biometrics
Editor's Note: Recently, I posted a column about a Canadian artificial intelligence (AI) startup company called Lyrebird (see Thinking of using voice authentication? Think again!). The folks at Lyrebird are working on a new generation of speech analysis and generation technologies that, as a by-product, could pose a significant threat to speech-based biometric security. One possible counter-solution is described in this column; now, read on...
Voice biometrics has become more widely adopted for user authentication because it delivers a more frictionless experience than solutions based on passwords or even fingerprints. At the same time, there is a growing realization across all biometric modalities that the most effective approach to preventing fraud employs behavioral as well as physical characteristics. Another challenge with biometric solutions, including voice, is ensuring that personal identity information is protected. One way to do this is to store any relevant data securely inside the system's hardware rather than in the cloud, distributed across embedded devices in the authentication infrastructure that are less accessible to hackers.
Now, Human to Machine Communication (HMC) optical sensor technology, which isolates a speaker for near-perfect speech recognition in voice-controlled applications, is available to support this strategy by ensuring that all biometric template matching functions are performed inside the end-user device (see New technology provides breakthrough speech recognition in noisy environments). This approach both secures and simplifies system solutions ranging from smartphones and PCs to FIDO-compatible ATMs and mobile devices, as well as the connected car. It also opens the door to integrating other optically acquired behavioral biometric authentication factors, such as the speaker's tone of voice and unique heartbeat and facial skin characteristics.
Improving biometric verification matching
Using biometrics to verify identities for user authentication is far more convenient and secure than using PINs or passwords, which can be forgotten, shared, or stolen. The person authenticating at a door or to a kiosk, ATM, or mobile service or application is matched against a biometric template that was created at enrolment and held in a stored database. This "1-to-1" matching process answers the question: "Is the person who is logging in or requesting access or entry the same person who created the biometric template?"
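To make the 1-to-1 idea concrete, here is a minimal Python sketch. It assumes the voiceprint has already been reduced to a fixed-length feature vector (real systems derive such vectors with trained models, which are out of scope here) and scores the comparison with cosine similarity against an arbitrary 0.85 threshold. Both the scoring method and the threshold are illustrative assumptions, not how any particular product works.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no identity information
    return dot / (norm_a * norm_b)


def verify_1_to_1(probe: Sequence[float], enrolled_template: Sequence[float],
                  threshold: float = 0.85) -> bool:
    """1-to-1 verification: compare one live sample with the one template
    created at enrolment. The 0.85 threshold is an illustrative assumption."""
    return cosine_similarity(probe, enrolled_template) >= threshold


# Hypothetical three-dimensional feature vectors, for illustration only.
template = [0.9, 0.1, 0.4]
print(verify_1_to_1([0.88, 0.12, 0.41], template))  # genuine user
print(verify_1_to_1([0.1, 0.9, 0.2], template))     # different speaker
```

Because the probe is compared with a single enrolled template rather than searched against every template in the database, the result is a direct yes/no answer to the verification question.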
The matching process can be performed in the cloud, on the host device, or inside the sensor. With host-based matching, biometric data (either fingerprint or voiceprint) is captured when the user enrolls in the system and again each time the user presents his or her finger or voice during access or entry; all processing and matching functions are performed on the authentication system's host platform. In contrast, when matching and biometric management functions are performed inside the sensor, the overall system architecture is not only more secure but also more streamlined, because all processing, template and instruction storage, and higher-performance cryptography remain inside the sensor IC.
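The architectural difference can be sketched in a few lines of Python. The `HostMatcher` and `SensorMatcher` classes below are hypothetical and reuse an assumed cosine-similarity score. The sensor version models on-chip storage with a name-mangled attribute, which in Python is only a naming convention rather than real protection, so treat it as a picture of the interface, not a security mechanism.

```python
import math
from typing import Dict, List, Optional, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HostMatcher:
    """Host-based matching: templates live in host memory or storage, where
    any code running on the host could read them."""

    def __init__(self, threshold: float = 0.85) -> None:
        self.templates: Dict[str, List[float]] = {}
        self.threshold = threshold

    def enroll(self, user: str, features: Sequence[float]) -> None:
        self.templates[user] = list(features)

    def verify(self, user: str, features: Sequence[float]) -> bool:
        return _cosine(features, self.templates[user]) >= self.threshold


class SensorMatcher:
    """Match-in-sensor model: the template stays behind a narrow interface
    and the host only ever receives a yes/no decision. The double-underscore
    attribute stands in for on-chip storage; it is not real protection."""

    def __init__(self, threshold: float = 0.85) -> None:
        self.__template: Optional[List[float]] = None
        self.__threshold = threshold

    def enroll(self, features: Sequence[float]) -> None:
        self.__template = list(features)

    def verify(self, features: Sequence[float]) -> bool:
        if self.__template is None:
            raise RuntimeError("no template enrolled in sensor")
        return _cosine(features, self.__template) >= self.__threshold


host = HostMatcher()
host.enroll("alice", [0.9, 0.1, 0.4])
print(host.templates["alice"])            # the template is visible to the host

sensor = SensorMatcher()
sensor.enroll([0.9, 0.1, 0.4])
print(sensor.verify([0.88, 0.12, 0.41]))  # only the decision leaves the sensor
```

The design choice is the narrow interface: in the sensor model the host can enroll a user and ask for a decision, but it never receives template data it could leak.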
Performing these functions inside the sensor has generally not been possible with the voice modality, especially in noisy environments. That changes when traditional acoustic voice-acquisition technology is replaced by a new type of HMC optical technology.
A new way to acquire voiceprints
Most would agree that voice is the most convenient way to interact with machines and control today's digital world, but today's acoustic solutions do not work reliably in noisy environments. Even with the best noise-reduction algorithms, acoustic microphones cannot isolate and identify one voice from among others, and, unlike humans, machines cannot infer meaning when background noise periodically drowns out the speaker. While speech-recognition software can be trained to understand accents and other speech patterns, it cannot be trained to ignore background noise.
The advent of HMC optical sensor technology solves this problem, turning the human voice into the most natural, personalized, and secure interface with the digital world. One example is VocalZoom's HMC sensor, which is pointed at the user's face and used to pick up signals created by vibrations on the facial skin during speech. This approach has the potential to revolutionize voice control and offers equally promising benefits for voice biometrics. The two applications are essential and complementary elements of any comprehensive voice user interface solution.
In voice-control applications, the latest HMC sensor technology is used alongside conventional acoustic microphones to deliver a near-perfect, isolated reference audio signal that improves the performance of today's noise-reduction software. In the VocalZoom implementation, the sensor uses unique laser measurement techniques to capture skin vibrations at nanometer resolution. The vibration data is converted into audio, algorithms filter out any vibrations not associated with the user's speech, and the resulting signals are combined with the output from the acoustic microphone. Because the facial vibrations are associated only with sound waves originating inside the speaker's mouth and throat cavity, pickup is highly directional, which in turn provides near-perfect isolation from extraneous noise and other background voices. This enables seamless voice-control performance in applications ranging from wearables to the connected car.
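VocalZoom's actual processing is not described here, but one simple way to picture how an isolated optical reference can help an acoustic channel is to use it as a voice-activity detector: acoustic frames pass through when the optical channel shows the target speaker's skin vibrating and are attenuated otherwise. The Python sketch below assumes time-aligned channels of equal length, an 8 kHz sampling rate (160-sample, 20 ms frames), and arbitrary threshold and gain values; the function names are hypothetical.

```python
from typing import List, Sequence

FRAME_LEN = 160  # 20 ms at an assumed 8 kHz sampling rate


def frame_energy(signal: Sequence[float], frame_len: int = FRAME_LEN) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def gate_with_optical_reference(acoustic: Sequence[float],
                                optical: Sequence[float],
                                frame_len: int = FRAME_LEN,
                                vad_threshold: float = 1e-4,
                                floor_gain: float = 0.1) -> List[float]:
    """Pass acoustic frames where the optical channel shows the user speaking;
    attenuate the rest. Trailing samples that do not fill a frame are left
    unchanged. Threshold and gain values are illustrative assumptions."""
    if len(acoustic) != len(optical):
        raise ValueError("channels must be time-aligned and the same length")
    out = list(acoustic)
    for k, energy in enumerate(frame_energy(optical, frame_len)):
        gain = 1.0 if energy >= vad_threshold else floor_gain
        for i in range(k * frame_len, (k + 1) * frame_len):
            out[i] *= gain
    return out


# The acoustic channel is loud in both frames, but the optical channel
# shows skin vibration only in the second frame.
acoustic = [0.5] * 320
optical = [0.0] * 160 + [0.1] * 160
cleaned = gate_with_optical_reference(acoustic, optical)
print(cleaned[0], cleaned[200])  # first frame attenuated, second passed
```

A production noise-reduction pipeline would be considerably more sophisticated; this gate only shows why a noise-free reference is valuable: it tells the software when the user is speaking, regardless of how loud the background is.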
Another advantage of this approach for voice biometric applications is that only the HMC optical sensor is needed to acquire a voiceprint. There are two types of voice biometric characteristics: physical (based on the structure of the vocal cords and mouth) and behavioral (including tone changes and other speaking habits). No two people display the same behavioral voice characteristics in exactly the same way during speech; thus, no two people generate the same voiceprint. An HMC sensor may capture less physical voice information than an acoustic microphone (1 kHz out of the full 4 kHz spectrum), but it detects behavioral characteristics with a very high level of accuracy. Using both types of information for voice biometric authentication is very effective, particularly when both can be acquired by a single, noise-immune HMC sensor.
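One common way to combine physical and behavioral information is score-level fusion. The Python sketch below assumes each type of characteristic has already produced a match score normalized to [0, 1]; the weights (which favor behavioral evidence, reflecting the sensor's strength there) and the acceptance threshold are hypothetical values chosen for illustration.

```python
def fuse_scores(physical: float, behavioral: float,
                w_physical: float = 0.3, w_behavioral: float = 0.7) -> float:
    """Weighted-sum fusion of two match scores normalized to [0, 1].
    The weights are hypothetical; here behavioral evidence counts more."""
    for score in (physical, behavioral):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be normalized to [0, 1]")
    return (w_physical * physical + w_behavioral * behavioral) / (w_physical + w_behavioral)


def accept(physical: float, behavioral: float, threshold: float = 0.8) -> bool:
    """Accept the claimed identity if the fused score clears the threshold."""
    return fuse_scores(physical, behavioral) >= threshold


# A modest physical match backed by a strong behavioral match is accepted
# (fused score about 0.81)...
print(accept(physical=0.6, behavioral=0.9))
# ...while a strong physical match with weak behavioral evidence is not
# (fused score about 0.62).
print(accept(physical=0.9, behavioral=0.5))
```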
All of this depends on the HMC optical sensor performing adequately in noisy environments. It is critical that the sensor pick up only the signals that originate inside the mouth cavity and are transmitted as vibrations of the facial skin during speech. Tests performed by VocalZoom and its customers show that these signals are impervious to noise.
For higher-security applications, such as financial wire transfers, two-factor authentication is required. Until now, solutions for these applications have generally combined fingerprints with facial recognition or a password. With an HMC optical sensor, it will be possible to combine two authentication factors in a single solution. For example, a voiceprint could be combined with a skin-print, which is unique to each person, to provide two-factor authentication from a single sensor. Skin biometrics is a frictionless authentication factor that requires no special action by the user during identification. It is also continuous: a device can re-authenticate the user again and again throughout an interaction.
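Two factors from one sensor, checked continuously, can be modeled as a sliding window of combined decisions. In this Python sketch, `Observation`, its normalized `voice_score` and `skin_score` fields, the thresholds, and the window size are all hypothetical. Each observation must pass both factors, and the session stays authenticated only while enough recent observations pass.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Observation:
    voice_score: float  # hypothetical normalized voiceprint match score
    skin_score: float   # hypothetical normalized skin-print match score


class ContinuousTwoFactorSession:
    """Both factors come from one sensor reading; an observation counts as a
    pass only if BOTH factors clear their thresholds. The session remains
    authenticated while at least `min_pass` of the last `window` observations
    passed; while fewer than `min_pass` observations exist, all must pass."""

    def __init__(self, voice_thr: float = 0.8, skin_thr: float = 0.8,
                 window: int = 5, min_pass: int = 4) -> None:
        self.voice_thr = voice_thr
        self.skin_thr = skin_thr
        self.min_pass = min_pass
        self.recent: deque = deque(maxlen=window)

    def update(self, obs: Observation) -> bool:
        """Record one observation and return the current session state."""
        passed = obs.voice_score >= self.voice_thr and obs.skin_score >= self.skin_thr
        self.recent.append(passed)
        return self.authenticated

    @property
    def authenticated(self) -> bool:
        if not self.recent:
            return False
        return sum(self.recent) >= min(self.min_pass, len(self.recent))


session = ContinuousTwoFactorSession()
for _ in range(5):
    session.update(Observation(0.9, 0.9))
print(session.authenticated)                  # five passing observations
print(session.update(Observation(0.9, 0.2)))  # skin factor fails once: tolerated
print(session.update(Observation(0.1, 0.9)))  # second failure ends the session
```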
In all applications, HMC optical sensors provide a simple and secure way to perform biometric template matching inside the sensor, but they offer an additional benefit as well. In the past, one of the few advantages of conducting biometric template matching on the host was that the host could implement all required "anti-spoof" mechanisms for detecting biometric forgeries. With HMC optical sensors, the voiceprint acquisition technology itself can deliver these anti-spoof capabilities through inherent liveness detection: the ability to determine whether the person seeking entry or access is physically present rather than a recording. In addition, an HMC optical sensor can detect other parameters, such as skin characteristics, heartbeat, and blood flow, at the same time as it acquires the user's voiceprint. Each time a user authenticates, a voiceprint and/or other behavioral biometric data is acquired in real time, optically confirmed to come from a living person rather than a recording, stored inside the sensor, and securely matched against its embedded template to verify the user and complete the authentication process.
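The column does not describe how liveness is determined, so the Python sketch below uses an assumed stand-in. It looks for a periodic component in the pulse range (40 to 180 beats per minute) of a hypothetical optically measured skin signal using autocorrelation, and only then runs the 1-to-1 voiceprint match. The function names, thresholds, and the 50 Hz sampling rate are illustrative. A periodicity test this simple is easy to fool; it is meant only to show the order of operations, liveness first and matching second.

```python
import math
from typing import Optional, Sequence, Tuple


def estimate_pulse_bpm(signal: Sequence[float], fs: float,
                       min_bpm: float = 40.0, max_bpm: float = 180.0,
                       min_corr: float = 0.5) -> Optional[float]:
    """Return a pulse-rate estimate in beats per minute, or None if the
    signal has no sufficiently strong periodic component in that range.
    Uses normalized autocorrelation of the mean-removed signal."""
    n = len(signal)
    if n < 2:
        return None
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    energy = sum(v * v for v in x)
    if energy < 1e-12:
        return None  # effectively flat: nothing is pulsing
    min_lag = max(1, math.ceil(fs * 60.0 / max_bpm))
    max_lag = min(n - 1, int(fs * 60.0 / min_bpm))
    best_lag, best_r = None, min_corr
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else 60.0 * fs / best_lag


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def authenticate(pulse_signal: Sequence[float], fs: float,
                 probe: Sequence[float], template: Sequence[float],
                 match_threshold: float = 0.85) -> Tuple[bool, str]:
    """Liveness check first, then the 1-to-1 voiceprint match."""
    if estimate_pulse_bpm(pulse_signal, fs) is None:
        return False, "liveness check failed"
    if _cosine(probe, template) < match_threshold:
        return False, "voiceprint mismatch"
    return True, "verified"


fs = 50.0  # assumed sample rate of the optical pulse signal, in Hz
live = [math.sin(2 * math.pi * 1.2 * i / fs) for i in range(500)]  # 72 bpm, 10 s
replay = [0.0] * 500  # e.g. loudspeaker playback: no skin, no pulse
template = [0.9, 0.1, 0.4]
print(authenticate(live, fs, [0.88, 0.12, 0.41], template))
print(authenticate(replay, fs, [0.88, 0.12, 0.41], template))
```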
Using this new generation of HMC optical sensors to perform biometric template matching inside the sensor aligns well with initiatives like the FIDO Alliance Universal Authentication Framework (UAF) protocol and its passwordless user experience (UX). In this experience, the user registers their device with the online service by selecting a local authentication mechanism, such as speaking into the microphone or entering a PIN. Once registered, the user simply repeats the local authentication action whenever they need to authenticate to the service, and no longer needs to enter a password when authenticating from that device. UAF also allows experiences that combine multiple authentication mechanisms, such as fingerprint + PIN. When an optical sensor is used to acquire voiceprints, these additional authentication factors could also include heartbeat characteristics, facial feature geometry, or various forms of behavioral authentication metrics that the sensor can capture.
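The passwordless flow can be sketched as a challenge-response in which local verification (here, a voiceprint check) gates use of a per-service credential. In real UAF, the authenticator generates a public/private key pair at registration, sends only the public key to the server, and signs each challenge with the private key. Python's standard library has no public-key signature primitive, so this hypothetical sketch substitutes HMAC-SHA256 with a shared key; as a result the toy server holds a secret, which UAF is specifically designed to avoid. `ToyAuthenticator`, `ToyServer`, and `bank.example` are illustrative names only.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Dict, Optional

# ASSUMPTION: HMAC-SHA256 with a shared key stands in for the private-key
# signature a real UAF authenticator would produce. In real UAF the server
# stores only a public key; here the toy server holds a secret.


class ToyAuthenticator:
    """Hypothetical authenticator whose credentials are unlocked only by a
    successful local check (e.g. an in-sensor voiceprint match)."""

    def __init__(self, local_verify: Callable[[], bool]) -> None:
        self._local_verify = local_verify
        self._keys: Dict[str, bytes] = {}  # one credential per service

    def _require_user(self) -> None:
        if not self._local_verify():
            raise PermissionError("local verification failed")

    def register(self, service_id: str) -> bytes:
        self._require_user()
        key = secrets.token_bytes(32)
        self._keys[service_id] = key
        return key  # real UAF would return a public key instead

    def sign(self, service_id: str, challenge: bytes) -> bytes:
        self._require_user()
        return hmac.new(self._keys[service_id], challenge, hashlib.sha256).digest()


class ToyServer:
    """Hypothetical relying party: issues single-use challenges, checks responses."""

    def __init__(self) -> None:
        self._credential: Optional[bytes] = None
        self._pending: Optional[bytes] = None

    def finish_registration(self, credential: bytes) -> None:
        self._credential = credential

    def new_challenge(self) -> bytes:
        self._pending = secrets.token_bytes(16)
        return self._pending

    def check(self, response: bytes) -> bool:
        if self._credential is None or self._pending is None:
            return False
        expected = hmac.new(self._credential, self._pending, hashlib.sha256).digest()
        self._pending = None  # each challenge can be answered only once
        return hmac.compare_digest(expected, response)


local_check = {"voice_ok": True}  # stands in for an in-sensor voiceprint match
auth = ToyAuthenticator(lambda: local_check["voice_ok"])
server = ToyServer()
server.finish_registration(auth.register("bank.example"))
challenge = server.new_challenge()
print(server.check(auth.sign("bank.example", challenge)))  # no password needed
```

The user's only action at each login is the local check; the password disappears from the exchange because the server verifies possession of the device credential instead.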
The latest HMC optical sensor technology offers a new approach to voiceprint acquisition that enables a more streamlined, compact system solution supported by inherent liveness detection. It converts the speaker's facial vibrations into a unique, noise-free voiceprint both at enrolment and for biometric template matching, with all acquisition, storage, matching, and cryptography functions executed in a completely closed match-in-sensor environment. Optically acquired voiceprints have the potential to be the most secure, cost-effective, and easiest biometric to verify for fraud-proof authentication, in full compliance with financial and other regulatory mandates.