Design of a Wearable, Real-Time Mobile Gaze Tracker
An important aspect of on-body sensing is tracking the eye and visual field of an individual. Continuous real-time tracking of the state of the eye (e.g., gaze direction, eye movements) in conjunction with the field of view of a user is profoundly important to understanding how humans perceive and interact with the physical world.
Real-time tracking of the eye is valuable in a variety of scenarios where rapid actuation or intervention is essential, including enabling new “hands-free” ways of interacting with computers or displays (e.g., gaming), detecting unsafe behaviors such as lack of attention on the road while driving, and leveraging visual context as a signal of user intent for context-aware advertising.
Continuous eye tracking is also useful in non-real-time applications, including market research to determine how customers interact with product and advertising placement in stores, and personal health, where the state of the eye provides a continuous window into Parkinson’s disease progression, psychiatric disorders, head injuries and concussions, among other conditions.
While our understanding of the human eye and gaze has grown through decades of research on the topic, eye tracking remains limited to controlled user studies and clinical trials, and has not bridged the gap to daily consumer use. The central challenge is that the sensing and processing pipeline is extremely complex: the eye-facing imager alone requires continuous operation of a camera at tens of frames per second, and compute-intensive image processing for each frame.
In this paper we describe our key contribution: the design of an eye tracker (iShadow) that dramatically reduces the sensing and computation needs for eye tracking, thereby achieving orders-of-magnitude reductions in power consumption and form factor.
The key idea is that eye images are extremely redundant, so we can estimate gaze using a small subset of carefully chosen pixels per frame. We instantiate this idea in a prototype hardware platform equipped with a low-power image sensor that provides random access to pixel values, a low-power ARM Cortex-M3 microcontroller, and a Bluetooth radio to communicate with a mobile phone.
The sparse pixel-based gaze estimation algorithm is a multi-layer neural network learned using a state-of-the-art sparsity-inducing regularization function that minimizes the gaze prediction error while simultaneously minimizing the number of pixels used. Our results show that we can operate at roughly 70 mW of power while continuously estimating eye gaze at a rate of 30 Hz with errors of roughly 3 degrees.
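As a rough illustration of how a regularizer can trade prediction error against pixel count, the sketch below uses a group-lasso-style penalty in which all first-layer weights leaving one input pixel form a group. The choice of group lasso, the function names, the weight `lam`, and the toy weights are all assumptions made for illustration; the text above does not specify the exact regularization function.

```python
import math

def group_penalty(first_layer_weights):
    """Sum of per-pixel L2 norms.

    first_layer_weights[p] holds the weights from input pixel p
    to every hidden unit (an assumed layout for this sketch).
    """
    return sum(math.sqrt(sum(w * w for w in row)) for row in first_layer_weights)

def active_pixels(first_layer_weights, tol=1e-8):
    """Indices of pixels whose outgoing weight group is not numerically zero."""
    return [
        p for p, row in enumerate(first_layer_weights)
        if math.sqrt(sum(w * w for w in row)) > tol
    ]

def regularized_loss(prediction_error, first_layer_weights, lam):
    """Prediction error plus lam times the group penalty (lam is hypothetical)."""
    return prediction_error + lam * group_penalty(first_layer_weights)

# Toy example: 4 candidate pixels, 2 hidden units.
# Pixels 1 and 2 have been driven to zero, so they never need to be read.
weights = [
    [3.0, 4.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [0.6, 0.8],
]

print(active_pixels(weights))
print(regularized_loss(2.0, weights, lam=0.5))
```

Because the penalty acts on whole groups rather than individual weights, a pixel whose group reaches zero contributes nothing to the network and can be skipped at acquisition time, which is the property a pixel-sparse design would rely on.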
To understand the design challenges, consider an eye tracker equipped with two cameras, one facing the eye and one facing the external world. A VGA-resolution eye-facing imager sampled at 30 Hz generates a data rate of roughly 4 Mbps. Continuous real-time processing of such an image stream would require computational capability and memory comparable to a high-end smartphone, making the eye tracker both bulky and power-hungry. An alternative design might be to wirelessly stream data from the eye tracker to a smartphone to leverage the phone or cloud-based computational resources.
However, the bandwidth requirements for such streaming are substantial: most low-power radios cannot support the demanding data rates of eye trackers, and streaming via WiFi is power-hungry and would greatly limit the lifetime of such a device. Perhaps as a result, many state-of-the-art eye trackers, such as the Tobii Glasses, operate as data recorders and continuously write data to a disk that the subject carries in their pocket. (Google Glass, while not equipped with an eye tracker, faces similar challenges; in continuous video capture mode, the device lasts for only a few hours.)
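To put the bandwidth argument in context, a back-of-the-envelope calculation of raw imager output can help. The sketch below assumes uncompressed 8-bit grayscale pixels and a hypothetical subset of 10% of the pixels per frame; neither assumption comes from the text. The actual rate of a given imager depends on its bit depth, color format, and any compression, so the absolute numbers here need not match a figure quoted for a particular device. What the sketch shows is only how the rate scales with the number of pixels read.

```python
def raw_data_rate_bps(pixels_per_frame, fps, bits_per_pixel):
    """Uncompressed data rate in bits per second."""
    return pixels_per_frame * fps * bits_per_pixel

VGA_PIXELS = 640 * 480          # full VGA frame
SUBSET_PIXELS = VGA_PIXELS // 10  # hypothetical 10% pixel subset

full_rate = raw_data_rate_bps(VGA_PIXELS, fps=30, bits_per_pixel=8)
subset_rate = raw_data_rate_bps(SUBSET_PIXELS, fps=30, bits_per_pixel=8)

print(f"full frame:   {full_rate / 1e6:.2f} Mbps")
print(f"pixel subset: {subset_rate / 1e6:.2f} Mbps")
```

The rate is linear in the number of pixels acquired, so reading fewer pixels per frame reduces the sensing and transfer load in direct proportion.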
We argue that the current approach is fundamentally flawed: existing systems separate image acquisition from eye-state processing, and as a consequence cannot leverage a variety of optimizations made possible by a more holistic approach that uses application-driven sampling and processing.
Consider, for example, recently available smartphones such as the Samsung Galaxy S IV, which track gaze for eye scrolling; here, the entire image is acquired from the camera, after which it is processed through computer vision techniques to estimate gaze direction.
The thesis of our work is that by jointly optimizing pixel acquisition and gaze estimation, we can enable real-time, continuous eye tracking while consuming only milliwatts of power, thereby enabling real-time, continuous gaze-based applications.
At the heart of iShadow is a simple idea: individual eye-facing images are extremely redundant, and thus it should be possible to estimate gaze location using only a small subset of pixel values in each frame. In other words, we can leverage the knowledge that the imager is looking at the eye, and that the most useful information for gaze tracking is where the iris is located within the eye. Thus, we can estimate gaze coordinates accurately as long as we can extract the location of the iris and sclera of the eye at low power.
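The idea of estimating gaze from a handful of pixels can be sketched as follows: read only a fixed list of pixel locations from the frame, then pass those values through a small network that outputs gaze coordinates. This is a minimal sketch and not the iShadow implementation; the function names, the single tanh hidden layer, the pixel locations, and every weight value are hypothetical placeholders, where a real system would learn them from training data.

```python
import math

def sample_pixels(frame, pixel_locations):
    """Read only the requested (row, col) pixels, mimicking a random-access
    imager, and scale 8-bit intensities to [0, 1]."""
    return [frame[r][c] / 255.0 for r, c in pixel_locations]

def predict_gaze(pixels, hidden_weights, hidden_bias, out_weights, out_bias):
    """One-hidden-layer network mapping sampled pixels to (x, y) gaze."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(ws, pixels)) + b)
        for ws, b in zip(hidden_weights, hidden_bias)
    ]
    return tuple(
        sum(w * h for w, h in zip(ws, hidden)) + b
        for ws, b in zip(out_weights, out_bias)
    )

# Toy 4x4 "eye image" with a dark region standing in for the iris.
frame = [
    [200, 200, 200, 200],
    [200,  40,  40, 200],
    [200,  40,  40, 200],
    [200, 200, 200, 200],
]

# Hypothetical pixel subset and weights (placeholders, not learned values).
locations = [(0, 0), (1, 2), (3, 3)]
hidden_weights = [[0.5, -1.0, 0.2], [-0.3, 0.8, 0.1]]
hidden_bias = [0.0, 0.1]
out_weights = [[1.0, 0.5], [-0.5, 1.0]]
out_bias = [0.0, 0.0]

pixels = sample_pixels(frame, locations)
gaze_x, gaze_y = predict_gaze(pixels, hidden_weights, hidden_bias,
                              out_weights, out_bias)
print(len(pixels), "of", len(frame) * len(frame[0]), "pixels read")
print(gaze_x, gaze_y)
```

Only the pixels in `locations` are ever touched, so both the acquisition cost and the size of the first network layer grow with the size of the subset instead of the full frame.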
To read this external content in full, download the complete paper from the open archives online at the University of Massachusetts. http://people.cs.umass.edu/~amayberr/ishadow-mobisys2014.pdf