Vision in wearable devices: Expanded application and function choices
Once-hot technology markets such as computers and smartphones are beginning to cool off; analyst firm IDC, for example, forecast earlier this year that smartphone sales will increase only 19% this year, down from 39% in 2013. IDC also believes that beginning in 2018, annual smartphone sales increases will diminish to single-digit rates. Semiconductor, software, and electronic systems suppliers are therefore searching for the next growth opportunities, and wearable devices are likely candidates.
Analyst firm Canalys, for example, recently forecast shipments of more than 17 million ‘smart band’ wearables this year, which it predicts will be the year the product category becomes a key consumer technology, with shipments expanding to more than 23 million units by 2015 and over 45 million by 2017. In the near term, the bulk of wearable shipments will consist of activity trackers and other smart bands, point of view cameras, and smart watches, but other wearable product types will also become more common, including smart glasses and ‘life recorder’ devices.
These wearable products can be greatly enhanced (and in some cases are fundamentally enabled) by their ability to process incoming still and video image information. Vision processing is more than just capturing snapshots and video clips for subsequent playback and sharing; it involves automated analysis of the scene and its constituent elements, along with appropriate device responses based on the analysis results. Historically known as ‘computer vision’, vision processing has traditionally been the bailiwick of large, heavy, expensive, and power-hungry PCs and servers.
Now, however, although ‘cloud’-based processing may be used in some cases, the combination of fast, high-quality, inexpensive, and energy-efficient processors, image sensors, and software is enabling robust vision processing to take place right on your wrist (or your face, or elsewhere on your person), at price points that enable adoption by the masses. And an industry alliance composed of leading technology and service suppliers is a key factor in this burgeoning technology success story.
Form factor alternatives
The product category known as ‘wearables’ comprises a number of specific product types in which vision processing is a compelling fit. Perhaps the best known of these, by virtue of Google’s advocacy, are the ‘smart glasses’ exemplified by Google Glass (Figure 1). The current Glass design contains a single camera capable of capturing 5 Mpixel images and 720p streams. Its base functionality encompasses both conventional still and video photography. But Google Glass is capable of much more, as both Google's and third-party developers' initial applications are making clear.
Figure 1: Google Glass has singlehandedly created the smart glasses market (a), for which vision processing-enabled gestures offer a compelling alternative to clumsy button presses for user interface control purposes (b).
Consider that object recognition enables you to comparison-shop, displaying a list of prices offered by online and brick-and-mortar merchants for a product that you're currently looking at. Consider that this same object recognition capability, in ‘sensor fusion’ combination with GPS, compass, barometer/altimeter, accelerometer, gyroscope, and other facilities, enables those same smart glasses to provide you with augmented reality information about your vacation sight-seeing scenes. And consider that facial recognition will someday provide augmented data about the person standing in front of you, whose name you may or may not already recall.
Trendsetting current products suggest that these concepts will all become mainstream in the near future. Amazon's Fire Phone, for example, offers Firefly vision processing technology, which enables a user to "quickly identify printed web and email addresses, phone numbers, QR and bar codes, plus over 100 million items, including movies, TV episodes, songs, and products."
OrCam's smart camera accessory for glasses operates similarly; intended for the visually impaired, it recognizes text and products, and speaks to the user via a bone-conduction earpiece. And although real-time individual identification via facial analysis may not yet be feasible in a wearable device, a system developed by the Fraunhofer Institute already enables accurate discernment of the age, gender, and emotional state of the person your Google Glass set is looking at.
While a single camera is capable of implementing such features, speed and accuracy can be improved when a depth sensor is employed. Smart glasses' dual-lens arrangement is a natural fit for a dual-camera stereoscopic depth-discerning setup. Other 3D sensor technologies such as time-of-flight and structured light are also possibilities.
And, versus a smartphone or tablet, smart glasses' thicker form factors are amenable to the inclusion of deeper-dimensioned, higher quality optics. 3D sensors are also beneficial in accurately discerning finely detailed gestures used to control the glasses' various functions, in addition to (or instead of) button presses, voice commands, and Tourette Syndrome-reminiscent head twitches.
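As a rough illustration of how the dual-camera stereoscopic setup mentioned above discerns depth, the following minimal sketch applies the standard pinhole stereo relation Z = f × B / d, where f is the focal length in pixels, B is the spacing between the two cameras, and d is the disparity (the horizontal shift of the same feature between the left and right images). The function name and the sample numbers are assumptions chosen for illustration; they do not describe any particular smart glasses product.

```python
def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Estimate distance to a feature using the pinhole stereo model.

    Z = f * B / d, with f in pixels, B in meters, and d in pixels.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative
        # values indicate a matching error between the two images.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical values: 700-pixel focal length, cameras 0.12 m apart,
    # and a feature that shifts 42 pixels between the two views.
    distance_m = depth_from_disparity(700.0, 0.12, 42.0)
    print(f"Estimated distance: {distance_m:.2f} m")
```

Because depth is inversely proportional to disparity, nearby objects (such as a hand making a gesture) produce large, easily measured shifts, which is one reason a stereo arrangement suits gesture control.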
Point of view (POV) cameras are another wearable product category that can benefit from vision processing-enabled capabilities (Figure 2). Currently, they're most commonly used to capture the wearer's experiences while doing challenging activities such as bicycling, motorcycling, snowboarding, surfing, and the like. In such cases, a gesture-based interface to start and stop recording may be preferable to button presses, which are difficult or impossible when the wearer has thick gloves on or otherwise cannot easily use his or her fingers.
Figure 2: The point of view (POV) camera is increasingly "hot", as GoPro's recent initial public offering and subsequent stock-price doubling exemplify (a). With both the POV camera and the related (and more embryonic) ‘life camera’, which has experienced rapid product evolution (b), intelligent image post-processing to cull uninteresting portions of the content is a valuable capability.
POV cameras are also increasingly being used in situations where wearer control isn't an option, such as when they're strapped to pets or mounted on drones. And the constantly recording, so-called ‘life camera’ is beginning to transition from a research oddity to an early-adopter, trendsetter device. In all of these examples, computational photography intelligence can render in final form only those images and video frame sequences whose content is of greatest interest to potential viewers, versus generating content containing a high percentage of boring or otherwise unappealing material (analogies to snooze-inducing slide shows many of us have been forced to endure by friends and family members are apt).
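One very simple way to picture this kind of content culling is to keep a frame only when it differs noticeably from the last frame that was kept. The sketch below is an assumption-laden toy, not a description of how any shipping POV or life camera works: frames are represented as flat lists of grayscale pixel values, and the "interest" measure is just the mean absolute pixel difference. A real product would presumably combine much richer cues (faces, motion, audio, location).

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, each 0-255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equal-size frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_interesting(frames: List[Frame], threshold: float) -> List[int]:
    """Return indices of frames worth keeping.

    The first frame is always kept; each later frame is kept only if it
    differs from the most recently kept frame by more than `threshold`.
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Four tiny hypothetical 2x2 frames: two near-identical dark frames
    # followed by two near-identical bright frames.
    clip = [
        [0, 0, 0, 0],
        [1, 0, 0, 0],
        [100, 100, 100, 100],
        [101, 100, 100, 100],
    ]
    print(select_interesting(clip, threshold=10.0))
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents a slow drift in the scene from being discarded entirely.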
Vision processing that's ‘handy’
A wrist-strapped companion (or competitor) to smart glasses is the ‘smart watch’, intended to deliver a basic level of functionality when used standalone, along with an enhanced set of capabilities in conjunction with a wirelessly tethered smartphone, tablet, or other device (Figure 3). While a camera-inclusive smart watch could act as a still or video image capture device, its wrist-located viewpoint might prove to be inconvenient for all but occasional use.
Figure 3: The Android-based Moto 360 is a popular example of a first-generation smart watch, a product category that will benefit from the inclusion of visual intelligence in future iterations.
However, in smart watches gesture interface support for flipping through multiple diminutive screens of information is more obviously appealing, particularly given that the touch screen alternative is frequently hampered by sweat and other moisture sources, not to mention non-conductive gloves. Consider, too, that a periodically polling camera, driving facial detection software, could keep the smart watch's display disabled, thereby saving battery life, unless you're looking at the watch.
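The display-gating idea can be sketched as a small piece of control logic. In the example below, `face_detected` is a hypothetical callback standing in for whatever face detection software the watch would run on each polled camera frame, and `hold_polls` is an assumed grace period that keeps the display lit through a few missed detections so that it does not flicker. None of these names come from an actual smart watch API.

```python
from typing import Callable, Iterable, List, TypeVar

FrameT = TypeVar("FrameT")


def display_states(frames: Iterable[FrameT],
                   face_detected: Callable[[FrameT], bool],
                   hold_polls: int = 1) -> List[bool]:
    """Decide, for each polled frame, whether the display should be on.

    The display turns on as soon as a face is detected and turns off
    once more than `hold_polls` consecutive polls have found no face.
    """
    states: List[bool] = []
    display_on = False
    misses = 0
    for frame in frames:
        if face_detected(frame):
            misses = 0
            display_on = True
        else:
            misses += 1
            if misses > hold_polls:
                display_on = False
        states.append(display_on)
    return states


if __name__ == "__main__":
    # Stand-in "frames": True means the wearer is looking at the watch.
    polls = [False, True, False, False, True, False, False, False]
    print(display_states(polls, face_detected=lambda f: f, hold_polls=1))
```

The polling interval itself is a trade-off: polling more often makes the display more responsive but spends more of the battery budget that the scheme is meant to save.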
Finally, let's consider the other commonly mentioned wrist wearable, the activity tracker, also referred to as the fitness band or smart band when worn on the wrist (Figure 4). A recently announced smartphone-based application gives an early glimpse into how vision may evolve in this product category. The app works in conjunction with Jawbone's Up fitness band to supplement the band's calorie-expenditure measurements by tracking food (i.e., calorie) intake in order to get a fuller picture of fitness, weight loss, and other related trends.
Figure 4: Fitness bands (a) and other activity trackers (b) are increasingly popular wearable products for which vision processing is also a likely near-future feature fit.
However, Jawbone's software currently requires that the user truthfully and consistently enter meal and snack information manually. What if, instead, object recognition algorithms were used to automatically identify the items on the user's plate, assess their portion sizes, and then estimate their calorie counts? And what if, instead of running on a separate mobile device as is currently the case, those algorithms were to leverage a camera built directly into the activity tracker?
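Once food items and portion sizes have been recognized, the final calorie estimate is simple arithmetic. The sketch below assumes a hypothetical recognizer that outputs (label, grams) pairs and a small lookup table of energy densities; the table values are rough illustrative figures rather than authoritative nutritional data, and nothing here reflects how Jawbone's software actually works.

```python
from typing import Dict, Iterable, Tuple

# Illustrative energy densities in kcal per 100 g (approximate values,
# included only to make the example runnable).
KCAL_PER_100G: Dict[str, float] = {
    "rice": 130.0,
    "chicken": 165.0,
    "broccoli": 34.0,
}


def estimate_meal_kcal(detections: Iterable[Tuple[str, float]],
                       table: Dict[str, float] = KCAL_PER_100G) -> float:
    """Sum the estimated calories for recognized (label, grams) pairs."""
    total = 0.0
    for label, grams in detections:
        if label not in table:
            raise KeyError(f"no calorie data for {label!r}")
        if grams < 0:
            raise ValueError("portion size cannot be negative")
        total += table[label] * grams / 100.0
    return total


if __name__ == "__main__":
    # Hypothetical recognizer output for one plate.
    plate = [("rice", 200.0), ("chicken", 100.0)]
    print(f"Estimated meal: {estimate_meal_kcal(plate):.0f} kcal")
```

In practice the hard part is upstream of this step: portion size estimation from an image is itself a depth and volume problem, so errors there would dominate the final figure.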