In case you haven't heard, 2017 has already been dubbed “the year of the voice interface.” The year started out with voice activation front-and-center at CES in Las Vegas, and this remained so mid-year at MWC in Shanghai, which I recently attended. This appellation is further underpinned by the number of Internet and technology giants that are constantly joining the race to construct the perfect speaker-based personal assistant.
Smart speakers in the U.S.
Since the introduction of the Amazon Echo in 2014, this market has experienced exponential growth. Per audioxpress.com, in the US, smart speakers account for 30% of shipments in the home audio hardware category. Growth is expected to continue over the next few years, with projections of about 100M units by 2020 and 75% of US households owning a smart speaker.
Last year, Google released its Google Home speaker. Since then, it has managed to pry away a share of the market, but is still far behind the Echo. According to the eMarketer US forecast for 2017, Amazon's Echo speaker will control 70.6% of the market and Google Home will reach a 23.8% share, while the remainder will be divided among smaller players, such as Lenovo, Harman Kardon, and LG.
All the biggest names in technology are now competing for a small, circular piece of real estate on your coffee table (Sources: Reuters/Peter Hobson, Reuters/Beck Diefenbach, Reuters/Stephen Lam, and Harman Kardon)
However, the situation is expected to change significantly now that Apple has announced its pricey, high-end speaker, the HomePod. Microsoft is also on the way to join the race, but not with its own speaker; instead, it will be launching a Harman Kardon device called Invoke, powered by Cortana. Another noteworthy entrance to this market, made earlier this year, is Lenovo's Smart Assistant speaker, which will be a new host to Amazon's Alexa. Users will now be able to choose a product that is powered by Alexa's intelligent voice services, but in a device not made by Amazon. The sound system of the Lenovo speaker will also be by Harman Kardon.
Chinese-speaking smart speakers open the door to a fifth of the world's population
Meanwhile, in China, Alibaba announced its entrance into the intelligent home speaker market with the Tmall Genie X1, as seen in this promotional video.
The Tmall Genie X1 is expected to go on sale in August for the equivalent of about $75, which is much cheaper than the various US devices. However, there is already competition in China. One of Alibaba's competitors, the up-and-coming online retailer JD, has already collaborated with iFlytek and, last year, released the Linglong Dingdong series of smart speakers.
Chinese search engine giant Baidu, together with hardware company AiNemo, has built a slightly different take on the smart speaker. The device, called Little Fish, is powered by DuerOS, which is Baidu's voice activation OS. Unlike the other products discussed so far, this device comes equipped with both a screen and a camera. The intelligent robot can track the user's face and use face recognition to authenticate online purchases. It can also display information and images in response to user queries.
Baidu's take on the smart speaker; with a screen and camera, it can track you around the room (Source: Baidu)
This is somewhat similar to the newest addition to Amazon's Echo product line, the Echo Show. The Show, as opposed to the Little Fish, cannot move, so it can't track you around the room. You can't even change the angle of the screen manually, so it's much more limiting, but it's the first U.S. smart speaker to have a screen. Thus, as we see more and more copycats of the original Amazon Echo, Amazon itself is moving on to new features.
Will the assistants' skill sets be the main differentiator?
It's still early in the game for this market, and already the similarity is astonishing. Except for slight design differences, there is hardly any diversity in the physical appearance of the devices. On the inside, there are some more interesting differences. The number of microphones varies from only two in the Google Home, which is the minimum for performing far-field voice pickup, to eight in the Lenovo Smart Assistant. The number and quality of speakers also vary, with upcoming offerings from Lenovo and Apple attempting to bring audio to a much higher level than Amazon and Google thanks to multi-channel tweeters, room correction, and audio beamforming.
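To give a feel for why the microphone count matters, here is a minimal Python sketch of a textbook delay-and-sum beamformer, one common way to combine several microphones for far-field pickup. It is not the method used by any of the products mentioned here. The uniform linear array, the 5 cm spacing, the 16 kHz sample rate, the 440 Hz test tone, and the noise level are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Arrival delay of a far-field plane wave at each microphone, in whole samples.

    Assumes a uniform linear array; angle_deg is measured from broadside.
    """
    step = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND * sample_rate
    delays = [round(m * step) for m in range(num_mics)]
    shift = min(delays)
    return [d - shift for d in delays]  # shift so that all delays are non-negative


def delay_and_sum(channels, delays):
    """Undo each channel's delay, then average the aligned channels."""
    length = len(channels[0]) - max(delays)
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


def residual_noise_power(num_mics, angle_deg=30.0, spacing_m=0.05,
                         sample_rate=16000, num_samples=4000,
                         noise_std=0.5, seed=0):
    """Simulate a noisy tone on num_mics microphones and measure what noise is left."""
    rng = random.Random(seed)
    delays = steering_delays(num_mics, spacing_m, angle_deg, sample_rate)

    def source(k):
        # Assumed test signal: a 440 Hz tone.
        return math.sin(2.0 * math.pi * 440.0 * k / sample_rate)

    # Each microphone hears a delayed copy of the tone plus its own independent noise.
    channels = [
        [source(k - d) + rng.gauss(0.0, noise_std) for k in range(num_samples)]
        for d in delays
    ]
    output = delay_and_sum(channels, delays)
    return sum((output[k] - source(k)) ** 2 for k in range(len(output))) / len(output)


if __name__ == "__main__":
    for mics in (1, 2, 8):
        print(mics, "microphone(s): residual noise power",
              round(residual_noise_power(mics), 4))
```

In this simplified model, the noise left after beamforming shrinks roughly in proportion to the number of microphones, which is one reason an eight-microphone array can be expected to pick up a distant voice more cleanly than a two-microphone one. Real products also have to deal with reverberation, echo cancellation, and moving talkers, none of which are modeled here.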
The variance is much more pronounced in the brains behind the speakers. The artificial intelligence (AI) platforms are not all equal when it comes to their skill sets and the tasks they can perform. Alexa has already acquired more than 15,000 skills, far more than any contender. Google Home, in second place, has a mere 378, according to Voicebot. This could be a very strong selling point and differentiator going forward.
Another important aspect is the “smartness” of the underlying AI platform. Good examples are the abilities to be conversational, understand context, and answer follow-up questions. This is a bit harder to measure than counting skills, but Google Assistant might be ahead of Alexa in this field. In general, search giants like Google and Baidu, which have access to enormous banks of data, have an advantage when it comes to deep learning, and this may give them an edge looking forward.
Until now, the AI platform and the physical speaker have gone together, but that is about to change. Since Amazon and Google have already opened their services to third-party devices, the design of the speaker hardware itself is open to anyone. With an adequate solution for far-field voice pickup, any device can connect to the personal assistant of choice through its API and offer intelligent voice services. This opens up abundant opportunities for silicon and device makers to build smart speaker hardware to address any market niche. Apple, as usual, is keeping Siri in its own hardware.
Creating a cost-effective low-power SoC solution
Taking all of the points above together, I believe this marks the onset of the second wave of smart speakers. The second wave is the cycle in which device manufacturers struggle to reduce cost and reach mass market volumes, using the same few personal assistant platforms. To do this, they must find the optimum balance between features and user experience while maintaining competitive pricing. At the same time, the market leaders will attempt to enhance the skills and intelligence of their platforms and introduce new features (like screens and cameras) with varying degrees of success. As in the smartphone platform wars of a decade ago, this will lead to many variations that, in the end, converge on the Darwinian fittest, as chosen by consumers.
Now that the market is thriving, many more will join. The Lenovo speaker and the Invoke will be especially interesting because they mark the beginning of the split between the voice assistant and the consumer product that enables it. If you want to learn how to voice-enable your own devices, check out the CEVA-X2 DSP core for multi-mic far-field voice pickup, as well as CEVA's Bluetooth (including BT 5) and Wi-Fi for ultra-low-power connectivity.
Moshe Sheier is Director of Strategic Marketing, CEVA. In this role, Moshe oversees corporate development and strategic partnerships for CEVA’s core target markets and future growth areas. Moshe is engaged with leading SW and IP companies to bring innovative DSP-based solutions to the market. In his spare time, Moshe rides mountain bikes and practices Aikido.