We are rapidly approaching an inflection point with regard to embedded speech and embedded vision. In the not-so-distant future, each of us will interact with hundreds of electronic devices throughout the course of the day at home, in our cars, and in the office — just about everywhere we go, actually.
Do you remember the frustration of trying to set the time on a VCR circa the mid-1980s? I don’t think my parents ever successfully managed to do so under their own steam. And once you had set the time, you were faced with the daunting task of actually recording a program. I cannot recall how many times I recorded the right channel at the wrong time or the wrong channel at the right time. Sometimes I outdid myself by recording the wrong channel at the wrong time. It was a rare red-letter day indeed when I managed to capture a program I actually wanted to view.
The point of all this is that our ability (or lack thereof) to configure, control, and interface with the myriad devices that will soon surround us will directly affect our quality of life. In the case of embedded speech, natural language interfaces (NLIs) are expected to revolutionize the way in which we interact with the technology that surrounds us.
Of course, speech recognition in potentially noisy environments is a tricky beast that requires a lot of compute power. All of which leads us to the xCORE devices from XMOS. These little beauties feature multiple deterministic processor cores that can execute multiple tasks simultaneously and independently. Furthermore, external interfaces and peripherals are implemented in software, thereby allowing embedded systems developers to implement the exact combination of required interfaces.
The reason I'm waffling on about all of this here is that I just ran across a 26-page whitepaper on the XMOS website. This little rascal examines the DSP capabilities of the recently introduced xCORE-200 architecture.
Based on the example of a voice interface front-end using Pulse Density Modulation (PDM) microphones, the paper analyzes how low-latency DSP processing can be implemented on the xCORE-200 architecture, while still leaving significant resources for system control and custom functions.
I don’t know about you, but I am more than ready for the advent of true speech recognition and natural language interfaces. It's not like I want to rule the world — I just would like to be able to do simple things like climb into bed and say “wake me up at six-thirty” and have the alarm clock make the assumption that I'm talking to it, even if my wife has the television set running in the background while she's telling the cats how beautiful they are.
As a small postscript to the above, I currently use my iPad as an alarm clock because the convoluted controls on the cheap-and-cheerful bedside clock I purchased a couple of years ago defy human understanding. It's currently running 10 minutes fast, but I no longer trust my ability to change the time, and as for using it as an alarm clock… suffice it to say that after multiple rude awakenings in the wee hours of the morning, I've been banned from further experimentation. Thus, the clock's only remaining role is to give me something to look at if I wake up in the middle of the night. Am I the only one who has these problems, or do you have your own appliance woes?