Max has already extolled the apparent virtues of Amazon's Echo appliance, but as I mentioned to him, this thing is a serious threat to luddites everywhere. Yes, if you're already used to saying “Siri…” or “ok google now…”, “Alexa” seems like yet another voice-activated assistant. Before I bought mine, I was interested to see that many reviewers felt it was barely worth the $99 intro price at the time. Yes, it offers better-than-average audio and comes in a nice, unobtrusive package that at a glance looks like an elegant flowerpot (I advise against watering it, though), but what does it do?
This thing is nothing less than your own always-on entry point to cloud services. I mean that very literally. Amazon's Alexa AppKit (currently in beta) provides a set of services that allows developers to create their own apps that leverage Alexa voice services to respond to custom commands with custom actions.
That just sounds like another app, right? In fact, developers can already tie into Android's standard system voice commands for wearables, phones, tablets, and other Android devices. Google also has a custom voice capability in limited release so developers can let users “ok google now” into Android apps. Those are still one-way voice activations, not conversations. And Siri? We're still waiting, Apple. With interesting market timing, Google very recently released its Voice Interaction API, which is the closest thing to Amazon's Alexa AppKit services in letting users not just invoke an app or provide basic voice input but also have a conversation with it.
There's a fundamental and very significant difference between Alexa AppKit services and the Voice Interaction API. Amazon wants to make sure developers understand this difference and one of the first things you see on the AppKit pages is the following notice in a large orange box:
Note: Developing apps with the Alexa AppKit is different than developing for other devices such as Android or iOS. Alexa apps are not installed on an actual device. Instead, they are web services hosted in the cloud. When a user wakes an Echo device and makes a request, that request is sent to the Alexa service in the cloud. If the request was intended for your app, the Alexa service sends a request to your app, waits for the response, and delivers the response to the user.
So this device that seems like just an interesting speaker system is essentially a physical extrusion of the cloud into your personal space.
That fact alone makes Echo compelling, but here's why it could help Amazon suddenly become a major player in app services. (Yes, Amazon already has a whole app services capability that includes the Amazon Fire Phone, Fire Tablet, and Fire TV, none of which has particularly shaken the market at this point; each runs behind the market leaders in its category.) Echo is a mass-market entry point into Amazon Web Services (AWS), which some time ago moved beyond being just a Platform-as-a-Service offering and already offers 10x the computing capacity of its next 14 competitors combined. More to the point: even the worst luddite would be comfortable using Echo to interact with a growing array of smart devices.
Echo already allows users to interact by voice with Belkin WeMo and Philips Hue connected home devices, but the Alexa AppKit would allow developers to slap a voice interface onto their own devices and software applications. The Alexa Service provides a voice-interaction-in-a-box capability, sending parsed user intent to your application and translating your application's response to voice output on the Echo. Like an old-fashioned telephone, Echo itself doesn't do much here: it “just” connects the user to the cloud rather than the POTS network. (When we someday talk about the plain old cloud service, could somebody please first come up with an expression that yields a nicer-sounding acronym?)
Alexa Service flow overview (Courtesy of Amazon)
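To make that round trip concrete, here is a toy simulation of the three hops: device, Alexa service, and app. It is only a sketch under loud assumptions. Every name in it (weather_app, alexa_service, echo_device, the "intent" and "slots" keys) is made up for illustration and is not Amazon's actual message format, the speech recognition step is skipped entirely, and Python is used purely for brevity rather than because the AppKit supports it.

```python
# Toy simulation of the Echo -> Alexa service -> app round trip.
# All names and message shapes are hypothetical, not Amazon's wire format.

def weather_app(request):
    """Stands in for a developer's cloud-hosted app (web service or Lambda)."""
    if request["intent"] == "GetForecast":
        city = request["slots"].get("city", "your area")
        return {"speech": f"Expect sunshine in {city}."}
    return {"speech": "Sorry, I did not understand that."}

# Registry the pretend Alexa service uses to find the app for a request.
APPS = {"weather": weather_app}

def alexa_service(app_name, intent, slots):
    """Pretend cloud service: forward the parsed intent, wait, return speech text."""
    app = APPS.get(app_name)
    if app is None:
        return "I could not find that app."
    response = app({"intent": intent, "slots": slots})
    return response["speech"]

def echo_device(app_name, intent, slots):
    """The device only relays: it sends the request up and speaks the reply."""
    return alexa_service(app_name, intent, slots)

print(echo_device("weather", "GetForecast", {"city": "Seattle"}))
```

Notice that echo_device contains no logic of its own, which is the telephone analogy in miniature: everything interesting happens on the far side of the connection.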
The Alexa Service can work with any web service, and the Alexa AppKit provides a Java library for convenience. The part I find particularly interesting is that rather than building your own web service with all the required bells and whistles, you can just use an AWS service called Lambda. Lambda is AWS's foray into event-driven services. Unlike AWS's flagship Elastic Compute Cloud (EC2) virtual servers, Lambda creates an instance on receipt of an event and maintains it only as long as required to respond to that event. AWS wasn't the first to offer this type of service, and Lambda is a relatively new entry. When Lambda first previewed, I thought it was an interesting service but fairly specialized in its utility. Now I see that it underpins Echo/Alexa. Lambda supports Java and Node.js, the two programming languages currently supported in the Alexa AppKit, and it's easy to imagine that the Alexa strategic plan had a lot to do with Lambda's appearance.
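If the contrast with an always-running server feels abstract, the event-driven idea can be sketched in a few lines. This is a toy model, not how Lambda is actually implemented: ToyEventRuntime and handler are hypothetical names, and Python is used only for illustration. The point it tries to show is that an instance exists only between the arrival of an event and the return of its response.

```python
# Sketch of the event-driven model: nothing runs until an event arrives,
# and nothing is kept once the response has been returned.
# ToyEventRuntime is a made-up stand-in, not AWS Lambda's implementation.

def handler(event):
    """Hypothetical function body: a pure function of the incoming event."""
    return {"reply": f"Hello, {event.get('name', 'world')}!"}

class ToyEventRuntime:
    def __init__(self, fn):
        self.fn = fn
        self.live_instances = 0  # instances running right now
        self.invocations = 0     # total events handled so far

    def dispatch(self, event):
        self.live_instances += 1      # instance created on receipt of the event
        try:
            self.invocations += 1
            return self.fn(event)
        finally:
            self.live_instances -= 1  # instance released once it has responded

runtime = ToyEventRuntime(handler)
print(runtime.dispatch({"name": "Echo"}))
print(runtime.live_instances)
```

With an EC2-style server, the equivalent of live_instances would sit at one (or more) around the clock whether or not anyone was talking to it.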
Within the Alexa AppKit application service, you define your app by providing a URL or an ARN (Amazon Resource Name), pointing to your own hosted web service or to an AWS Lambda function, respectively. In addition, you provide a JSON-formatted “schema” that describes the “user intent” for the conversation, along with a set of sample utterances. The “intent” terms provided in the schema correspond to entry points and variables within the web service specified by your URL or ARN. The details of all this are way beyond the scope of this little post, but it will all look familiar to developers.
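For a rough feel of how the schema, the sample utterances, and the entry points relate, here is a small sketch. Treat all of it as assumption: the JSON field names, the utterance format, and the function names (hello, greet_person, handle_intent) are illustrative inventions rather than the AppKit's actual format, and the example is in Python only for compactness.

```python
import json

# Hypothetical intent schema; the field names are illustrative only.
INTENT_SCHEMA = json.loads("""
{
  "intents": [
    {"intent": "HelloIntent", "slots": []},
    {"intent": "GreetPersonIntent", "slots": [{"name": "person"}]}
  ]
}
""")

# Hypothetical sample utterances: intent name first, then what the user might say.
SAMPLE_UTTERANCES = [
    "HelloIntent say hello",
    "GreetPersonIntent say hello to {person}",
]

def hello():
    return "Hello from the cloud."

def greet_person(person):
    return f"Hello, {person}."

# Intent names from the schema map onto entry points in the service.
ENTRY_POINTS = {"HelloIntent": hello, "GreetPersonIntent": greet_person}

def utterances_match_schema():
    """Check that every sample utterance starts with a declared intent name."""
    declared = {i["intent"] for i in INTENT_SCHEMA["intents"]}
    return all(u.split()[0] in declared for u in SAMPLE_UTTERANCES)

def handle_intent(name, slot_values):
    """Look up the intent in the schema and call its entry point with its slots."""
    declared = {i["intent"]: i for i in INTENT_SCHEMA["intents"]}
    if name not in declared:
        raise ValueError(f"intent not in schema: {name}")
    slot_names = [s["name"] for s in declared[name]["slots"]]
    missing = [s for s in slot_names if s not in slot_values]
    if missing:
        raise ValueError(f"missing slots: {missing}")
    return ENTRY_POINTS[name](**{s: slot_values[s] for s in slot_names})

print(handle_intent("GreetPersonIntent", {"person": "Max"}))
```

The slots play the role of the “variables” mentioned above: the schema declares them, the utterances show where they appear in speech, and the entry point receives them as arguments.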
The Alexa AppKit and the Service in general are tied deeply into the AWS world, both in the way the Service authorizes users to interact with Alexa apps and in the way Alexa apps are allowed to interact with other resource endpoints. Within the AWS world, all this becomes particularly easy, of course. I've been using AWS for many years, so I'm familiar with Lambda and the AWS model, and getting the Alexa AppKit version of “hello world” up with Lambda was very quick. It's kind of cool chatting with my much smarter Lambda self through Echo.
This is by no means a toy, however. The low-hanging fruit with this service is providing a voice interface to other popular event-driven application services such as IFTTT (there's already an IFTTT Alexa channel) and the myriad vendor-specific and third-party services out there. Beyond that, you know Echo interfaces are going to pop up for Arduino, Raspberry Pi, BeagleBone, NeoPixels, and other connected hardware out there. The interesting thing will be whether this becomes a voice hub for home automation and, beyond that, whether it literally becomes a voice of the IoT.