Thinking of using voice authentication? Think again!

May 24, 2017

Max The Magnificent

Most of us find the security features we are currently obliged to use to access our computers, smartphones, and other devices -- and to log on to our various (numerous) web-based accounts -- to be incredibly frustrating.

I can’t tell you how many accounts and passwords I have; it brings tears to my eyes just thinking about them. Also, I really dislike messing around with those fingerprint detector things on notepads and smartphones. It would be great if they always worked first time, but it's vexing when you are obliged to try them two or three times before they grudgingly condescend to work.

I long for the day when whatever system and/or website I'm using simply looks at me and/or listens to me talking and says to itself "That's Max the Magnificent -- you can tell by his regal good looks and astounding sense of fashion -- I'll be honored to grant him full access to all of his applications and data," but there's a small fly in the soup, as it were...

Do you recall a couple of years ago when I wrote a column about Sensory's TrulySecure technology, which -- as I said at that time -- "combines speech identification and facial recognition to give a state-of-the-art security solution"? (See Voice & Face Unlock Smartphones & Tablets).

TrulySecure is an on-device biometric identification system that does not rely on a connection to the cloud. Of course, there have been numerous science fiction stories about this sort of thing, including tricking the system by using a photograph of a person instead of the person themselves (TrulySecure gets around this by looking for the tiny muscle movements that signify a living, breathing person).

As an aside, I just remembered the Tcity trilogy by Mark Adlard: Interface (1971), Volteface (1972), and Multiface (1975). One scene, in particular, is pertinent to our discussions here -- the one where the hoi polloi are revolting (aren’t they always?) and they chop the lower arm off a member of the ruling class and use the severed limb as identification to gain access to the reserved parts of the city, but we digress...

So, what about voice authentication? Suppose one of your friends were to knock on your door in the middle of the night, and you shouted, "Who's there?" and they responded, "It's me!" Even from just these two words, you could probably work out the identity of your visitor. Until now, that is...

The reason for my qualification is that I just heard about a Canadian artificial intelligence (AI) startup company called Lyrebird. The folks at Lyrebird are working on a new generation of speech analysis and generation technologies that they plan on offering to developers of embedded (and other) systems.

At first, this all appears to be fairly innocent. You can choose from thousands of predefined voices or you can design a unique voice for your particular application. You can even control the emotional aspects of the generated voice to reflect happiness, anger, sympathy, stress, etc.

Where things become a tad more perturbing is when we learn that Lyrebird's deep-learning/neural-network-based systems can analyze as little as a minute of someone speaking and use this to generate a unique key. This key can subsequently be used to generate any speech, mimicking its corresponding voice, augmented with any desired emotion.
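To see why this is a worry for voice authentication, here's a toy sketch in Python. Everything in it is my own assumption for the sake of illustration, not anything Lyrebird has published: I'm supposing that a voice-authentication system reduces a voice to a short list of numbers (a stand-in for the "key"), and grants access when a new sample's numbers are close enough to the enrolled ones. The vectors, the function names, and the 0.95 threshold are all made up.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_accepted(enrolled_key, sample_key, threshold=0.95):
    """Hypothetical check: accept if the sample is close enough to the enrolled key."""
    return cosine_similarity(enrolled_key, sample_key) >= threshold

# Made-up "voice keys" (real systems would use far longer vectors).
enrolled = [0.90, 0.10, 0.40, 0.70]   # the key stored when Max enrolled
genuine  = [0.88, 0.12, 0.41, 0.69]   # Max speaking again on another day
stranger = [0.10, 0.90, 0.70, 0.20]   # somebody else entirely
cloned   = [0.91, 0.09, 0.39, 0.72]   # synthetic speech built from a copied key

for name, key in [("genuine", genuine), ("stranger", stranger), ("cloned", cloned)]:
    print(name, "accepted" if is_accepted(enrolled, key) else "rejected")
```

In this toy setup the stranger is turned away, but the cloned sample sails through alongside the genuine one, because a similarity check alone cannot tell where a voice came from.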

The folks at Lyrebird are planning on offering an API that will listen to someone talking -- or a recording of someone talking -- and generate the associated key. Another API will allow the user to generate any speech or conversation using the desired voice(s). As an example, check out Lyrebird's demos page. One example features a pseudo-conversation between Donald Trump, Barack Obama, and Hillary Clinton.


(Source: lyrebird.ai)

We are still in the early days of this technology and there's still a lot of work to be done, but this certainly gives one a taste of things to come. There are so many implications here that reach far beyond the threat to speech-based biometric security.

Do you remember the first Terminator movie from 1984 when a cyborg assassin from the future (played by Arnold Schwarzenegger) comes back through time to kill Sarah Connor (played by Linda Hamilton), whose yet-to-be-conceived son will one day become a savior against the machines in a post-apocalyptic future?

The scene I'm thinking of at the moment occurs after the Terminator has killed Sarah's mother. When Sarah, unaware of the Terminator's ability to mimic victims, attempts to contact her mother via telephone, the Terminator perfectly impersonates the mother. Seeing the Terminator's lips move and hearing a woman's voice appear to come out of them -- perfectly synchronized -- sent a shiver down my spine even then; now I'm shivering all over!

Once technology like that from Lyrebird becomes widely available, how will we know for sure to whom we are actually talking when someone calls us up on the phone? What about the voice recordings acquired when law enforcement agencies tap phones? These are often used to convict criminals and terrorists. In some cases (absent video), defense attorneys could argue there is reasonable doubt that the voices on tape were indeed those of their clients.

Or what about politicians? Suppose someone released a tape that sounded like some politician a lot of people love to hate saying something self-incriminating. I bet a lot of folks would be happy to believe the worst.

The more I noodle on this, the more fearful I become. What about you? What are your thoughts on the above? Can you think of any possible applications -- for the good or for the dark side -- that I haven’t mentioned here?
