New standards are making the delivery of Web-based and enhanced content alongside television a reality. This article describes the ATVEF enhanced television standard and the requirements for designing ATVEF-compatible receivers.
Despite its failings, the 1996 release of WebTV was the start of a revolution in Web surfing. For the first time, connecting to the Internet was as easy as using a television set and an infrared keyboard. WebTV's first Internet set-top product allowed users to surf the Web on their television screens, getting e-mail and reading news from their sofas and armchairs. Two years later, the WebTV Plus product improved on this paradigm, allowing users to view Web content and television simultaneously, with the broadcast signal embedded in the Web content in a picture-in-picture window.
The convergence of computers and television has been predicted by technology analysts for many years. Instead of having both PCs and televisions, they suggest many consumers will eventually own only a single device that will have the widespread availability and ease-of-use of television, combined with the interactive power and flexibility of a PC. Along this path, we are already seeing both devices adopting features formerly reserved for the other.
From the PC side, computers are becoming more adept at handling video content. MPEG compression standards have allowed computers to display video; multicast IP and push technologies allow the TCP/IP protocols to simulate a broadcast infrastructure.
From the TV side, we are just starting to see the emergence of standards that allow Web-based content to be broadcast to your television. One of the most popular of these standards is the Advanced Television Enhancement Forum (ATVEF), a specification developed and supported by some of the biggest names in the broadcasting, computer, and consumer electronics industries.
The goal of this article is to discuss the future of television as an Internet appliance. Because ATVEF is one of the most promising standards in the enhanced television world, it's a good example of where Internet-enhanced TV is going. The bulk of this article will be geared toward describing the ATVEF standard and its technical implementation. For the sake of completeness, we will also discuss how ATVEF might be used in the coming years, its industry support, and its major competitors.
Content specification
In a nutshell, ATVEF is a standard for creating enhanced, interactive television content and delivering that content to a range of television, set-top, and PC-based receivers. ATVEF defines the standards used to create enhanced content that can be delivered over a variety of media, including analog (NTSC) and digital (ATSC) television broadcasts, and a variety of networks, including terrestrial broadcast, cable, and satellite.
By defining the standards used to create enhanced content, the ATVEF specification also defines the minimum functionality required by ATVEF receivers to parse and display this content. One of the major goals of ATVEF was to create a specification that relies on existing and prevalent standards, so as to minimize the creation of new specifications. Not surprisingly, the group chose to base their content specification on existing Internet technologies such as HTML and JavaScript.
Besides minimizing the number of standards that the ATVEF working group needed to create, forcing content creators to base their content on existing Internet technologies provides two other important benefits.
First, because the content specifications are fully Web-compatible, millions of pages of potential content already exist. And second, considering how easy it is to use many of today's Web-authoring tools, practically anyone can become an ATVEF content developer.
The ATVEF 1.0 Content Specification mandates that receivers support HTML 4.0, JavaScript 1.1, and Cascading Style Sheets. This is a minimum content specification because all receivers must support these standards, but they are allowed to support others as well (Java and VRML, for example). Establishing a minimum content specification is important to content developers who want to produce the richest content possible, while ensuring that their content is available to the maximum number of viewers.
With ATVEF's membership weighted much more heavily toward content developers than toward set-top box and TV manufacturers, it's no surprise that the minimum standard provides for nearly the same feature set as the latest PC-based Web browsers.
As more manufacturers consider adopting ATVEF, we are likely to see additional content specifications (perhaps an "ATVEF Lite") that provide less functionality at a reduced hardware and software cost. This is sure to please companies that design embedded systems, as the majority of embedded Web browsers don't yet have the same level of content support as typical PC-based browsers.
Of course, including a Web browser on a television set introduces some possibilities for exciting new content. To support these, the ATVEF specification calls for new extensions to the existing standards. The most prominent extension defined by the ATVEF specification is the addition of the tv: URL scheme. A tv: reference specifies the insertion of the television broadcast signal into the content, and may be used in an HTML document anywhere that an image may be placed. Creating an enhanced content page that displays a television channel in some area of the page is as easy as inserting an image into an HTML document.
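As a rough illustration of how little is involved, the sketch below generates an enhancement page in which the broadcast picture sits where an image would normally go. It assumes the broadcast is referenced by writing `tv:` in place of an image URL; the function name, page layout, and default dimensions are hypothetical and chosen only for the example.

```python
import html


def enhancement_page(title: str, body_html: str, width: int = 320, height: int = 240) -> str:
    """Return a small HTML page that embeds the broadcast picture beside Web content.

    Assumption: the receiver renders the live broadcast wherever "tv:" is
    used in place of an image URL. Layout and sizes are illustrative only.
    """
    return (
        "<html>\n"
        f"<head><title>{html.escape(title)}</title></head>\n"
        "<body>\n"
        # The television picture is placed exactly like an ordinary image.
        f'<img src="tv:" width="{width}" height="{height}" align="right">\n'
        f"{body_html}\n"
        "</body>\n"
        "</html>\n"
    )


page = enhancement_page("Game stats", "<p>Live scores appear here.</p>")
print(page)
```

A content author would normally write such a page by hand or with a Web-authoring tool; generating it in code simply makes the single `tv:` reference easy to spot.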
In addition to defining what ATVEF content looks like, the specification also defines how the content gets from the broadcaster to the receiver, and how the receiver is informed that it has enhancements available for the user to access. The latter task is accomplished with triggers.
Triggers
Triggers are mechanisms used to alert receivers to incoming content enhancements. They are sent over the broadcast medium and contain information about enhancements that are available to the user. Among other information, every trigger contains a standard Uniform Resource Locator (URL) that defines the location of the enhanced content. ATVEF content may be located locally (perhaps delivered over the broadcast network and cached to a disk), or it may reside on the Internet, another public network, or a private network.
Besides containing information about where the enhanced content is located, triggers may also contain a human-readable description of the content. For example, a trigger may contain a description like "Press Browse for more information about this show," which the receiver can display directly to tell the user about the nature of the content. Triggers may also contain expiration information, which tells the receiver how long the content should be offered to the viewer, and a checksum to ensure the integrity of the delivered information.
Lastly, triggers may contain JavaScript fragments. These script fragments (often just single method calls) can trigger execution of JavaScript within the associated HTML page, and can be used for such things as synchronization of the enhanced content with the video signal and updating of dynamic screen data.
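To make the pieces of a trigger concrete, here is a minimal parsing sketch. The wire syntax is not given in this article, so the example assumes a textual form in which the URL appears in angle brackets, followed by bracketed `key:value` fields and a final bracketed checksum. The field names (`name`, `expires`, `script`), the example URL, and the checksum value are all illustrative assumptions, and the checksum is only extracted, not verified.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trigger:
    url: str                        # where the enhanced content lives
    name: Optional[str] = None      # human-readable description
    expires: Optional[str] = None   # how long the enhancement should be offered
    script: Optional[str] = None    # JavaScript fragment to run in the page
    checksum: Optional[str] = None  # integrity check (not verified here)


_URL = re.compile(r"^<([^>]+)>")
_FIELD = re.compile(r"\[([^\]]*)\]")


def parse_trigger(text: str) -> Trigger:
    """Parse a trigger written in the assumed <url>[key:value]...[checksum] form."""
    match = _URL.match(text)
    if match is None:
        raise ValueError("trigger must start with <url>")
    trigger = Trigger(url=match.group(1))
    for field in _FIELD.findall(text[match.end():]):
        key, sep, value = field.partition(":")
        if not sep:
            # A bracketed value with no key is treated as the checksum.
            trigger.checksum = field
        elif key in ("name", "expires", "script"):
            setattr(trigger, key, value)
    return trigger


example = (
    "<http://example.com/show.html>"
    "[name:Press Browse for more information]"
    "[expires:19991231T235959]"
    "[script:update()]"
    "[C3D4]"  # placeholder value, not a computed checksum
)
print(parse_trigger(example))
```

This simple pattern would not cope with a script fragment that itself contains a closing bracket; a real receiver would need whatever escaping rules the trigger syntax defines.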
Transports
Besides defining how ATVEF content is displayed and how the receiver is notified of new content, the specification also defines how content is delivered. Because your television or set-top box may or may not have a connection to the Internet, the ATVEF specification describes two distinct models for delivering content. These two content delivery models are commonly referred to as transports, and the two transports defined by ATVEF are referred to as Transport Type A and Transport Type B.
Transport Type A is defined for ATVEF receivers that maintain a connection (commonly called a back-channel or return path) to the Internet. Generally, this network connection is provided by a dial-up modem but may be provided by any type of bi-directional access channel. Transport Type A is a method for delivering only triggers, without additional content. Because there is no content delivered with Transport Type A, all data must be obtained over the back-channel, using the URL(s) passed with the trigger as a pointer to the content.
Transport Type B provides for delivery of both ATVEF triggers and the associated content via a broadcast network. In this model, the broadcaster pushes content to a receiver, which will store it in case the user chooses to view it. Transport Type B uses announcements sent over the network to associate triggers with content streams. An announcement describes a content stream and may include information regarding bandwidth, storage requirements, and language (enhancements may be delivered in multiple languages).
Since a Type B receiving device will, in most cases, need to store any content that will be displayed, it uses announcement information to make content storage decisions. For instance, if a stream requires more storage space than a particular receiver has free, the receiver may elect to discard some older content, or it may elect not to store the announced stream. A drawback of this model is that if a person chooses to start watching a show near the end, there may not be time for the content to be streamed to the receiver, and the person will not be able to view some or all of the content.
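The storage decision described above can be sketched as a small policy function. This is only one possible policy, under stated assumptions: the record types, field names, language filter, and oldest-first eviction rule are all hypothetical, since the specification leaves these choices to the receiver.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Announcement:
    stream_id: str
    storage_bytes: int  # storage the announced stream will need
    language: str


@dataclass
class CachedStream:
    stream_id: str
    size_bytes: int
    stored_at: float    # smaller value means older content


def plan_storage(
    ann: Announcement,
    cache: List[CachedStream],
    capacity_bytes: int,
    preferred_language: str = "en",
) -> Tuple[bool, List[str]]:
    """Return (accept, ids_to_evict) for an announced stream.

    Hypothetical policy: skip streams in other languages, refuse streams
    larger than the whole store, otherwise evict oldest content first
    until the announced stream fits.
    """
    if ann.language != preferred_language:
        return False, []
    if ann.storage_bytes > capacity_bytes:
        return False, []
    free = capacity_bytes - sum(c.size_bytes for c in cache)
    evict: List[str] = []
    for cached in sorted(cache, key=lambda c: c.stored_at):
        if free >= ann.storage_bytes:
            break
        evict.append(cached.stream_id)
        free += cached.size_bytes
    return True, evict


cache = [CachedStream("old-show", 600, 1.0), CachedStream("new-show", 300, 2.0)]
decision = plan_storage(Announcement("tonight", 500, "en"), cache, capacity_bytes=1000)
print(decision)
```

With 1,000 bytes of capacity and 900 already in use, the 500-byte stream is accepted only after the oldest cached stream is marked for eviction.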
To review, the two types of ATVEF data are triggers and content. If the receiving device has a back-channel to the Internet, Transport Type A will broadcast the trigger only (akin to a URL), and content will be pulled over the Internet. If the receiving device doesn't have an Internet connection, Transport Type B allows both the triggers and content to be delivered over the broadcast channel.
Delivery protocols
The ATVEF specification also defines a reference protocol stack used for content delivery. While all of the high-level protocol layers are well-defined for every ATVEF implementation, the link-layer and physical-layer protocols depend on the broadcast network. This is obvious when you consider that it is not possible to transmit analog data over cable the same way you would transmit digital data over satellite. Figure 1 illustrates a standard ATVEF protocol stack for delivery of enhanced content.

Figure 1
For traditional bi-directional Internet communication, the Hypertext Transfer Protocol (HTTP) defines how data is transferred at the application level. But because one can't have a two-way connection over a broadcast medium, we require a unidirectional application-level protocol for data delivery. ATVEF defines this protocol to be the Unidirectional Hypertext Transfer Protocol (UHTTP). UHTTP is based on UDP, as opposed to TCP. This makes sense, of course, because UDP is a connectionless protocol suitable for a broadcast network.
Like HTTP, UHTTP uses traditional URL naming schemes to reference content. Therefore, content creators can reference enhancement pages using the standard http: and ftp: naming schemes. To this, ATVEF adds the lid: or local identifier URL naming scheme. The lid: naming scheme allows content creators to reference content that exists locally (in the receiver's memory or on its disk drive, for example) rather than on the Web.
With HTTP, as well as with many other Internet application protocols, the TCP layer provides error detection and retransmission facilities. But for a unidirectional protocol, there is no possibility for retransmission requests. Thus, UHTTP must implement error correction without retransmission, sometimes called Forward Error Correction (FEC). Using sophisticated FEC algorithms, if the data is not too badly corrupted, it can be regenerated with only the received information. With their emphasis on error correction instead of detection, the coding schemes used in unidirectional communications are more similar to the algorithms used in data storage media such as digital tapes and CD-ROMs than to those used in traditional bi-directional communications.
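The idea of regenerating lost data from what was received can be shown with the simplest possible scheme: a single XOR parity packet per group. This is a teaching sketch only and is not the coding scheme UHTTP uses; it assumes equal-length packets and that the receiver knows which packet (at most one per group) went missing.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets: List[bytes]) -> List[bytes]:
    """Return the data packets followed by one XOR parity packet."""
    if len({len(p) for p in packets}) != 1:
        raise ValueError("packets must be non-empty and of equal length")
    return list(packets) + [reduce(xor_bytes, packets)]


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data packets when at most one packet (marked None) was lost."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of every surviving packet (data and parity) equals the lost one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity packet


data = [b"ATVE", b"F da", b"ta!!"]
sent = add_parity(data)
damaged = list(sent)
damaged[1] = None  # simulate a packet lost in the broadcast
print(recover(damaged))
```

No return path is needed: the receiver repairs the gap using only what arrived, at the cost of one extra packet of overhead per group.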
Bindings
How ATVEF data is delivered over a particular network, from the network layer protocol down to the physical layer, is called the binding. In order for ATVEF to provide interoperability between broadcast networks and receivers, it's important that each physical network have only one binding. And it is equally important that each binding provide a fully comprehensive definition of the interface between the broadcast network specification and the ATVEF specification.
At this point, ATVEF has defined bindings for delivering data over IP multicast as well as over NTSC. Because the transmission of IP is defined (or can be) for virtually every type of television broadcast network, the binding to IP is considered the reference binding. So, defining an ATVEF binding for a new network could be as easy as describing how to run IP over that network.
Figure 1 illustrates the protocol stack for the reference binding.
ATVEF over NTSC
NTSC is the standard for analog television broadcasts in the U.S. Unless you have an HDTV set already, the televisions in your home are nothing but NTSC receivers. Part of the NTSC standard defines a frame (image) as consisting of 525 horizontal lines, each line drawn (or scanned) left to right. During a screen scan, only every other line is drawn; therefore, it takes two full screen scans to draw a single frame.
Each time the electron gun in the television's cathode ray tube finishes scanning a half-frame, it must return to the upper left-hand corner of the television screen to prepare for the next half-frame. This takes a non-trivial amount of time, so each movement of the electron gun must be re-synchronized with the incoming signal. This is done by adding a set of unused lines of data to the end of each screen scan, giving the electron gun time to return to its starting position. These 21 extra lines make up what is called the vertical blanking interval (VBI). (If you want to see the VBI for yourself, fiddle with the vertical hold knob on your TV and look for a horizontal black stripe across your screen.)
As it turns out, only the first nine lines of the VBI are actually required to reposition the cathode ray. This leaves 12 more lines that can be used to broadcast data. In fact, in the U.S., closed captioning data has been broadcast on line 21 for many years. Each line of the VBI has a transmit rate of about 17K/sec. So in theory, the VBI associated with each NTSC-encoded television channel could carry up to 204K/sec (12 lines at 17K/sec per line) of piggy-backed data. However, after taking into account the overhead associated with the various protocol layers and the need to prevent conflicts with closed captioning and other data already broadcast within the VBI space, the maximum achievable rate for ATVEF data transmission is somewhat lower than this, probably around 100K/sec.
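The bandwidth arithmetic above is easy to reproduce. The short sketch below uses only the figures quoted in the text; the 500K enhancement size is a made-up example, and "K" is left in whatever unit the 17K/sec per-line figure uses.

```python
VBI_LINES = 21              # lines in the vertical blanking interval
RETRACE_LINES = 9           # lines needed to reposition the beam
RATE_PER_LINE_K = 17        # approximate rate per VBI line, in K/sec
ACHIEVABLE_RATE_K = 100     # estimated usable rate after overhead, in K/sec

usable_lines = VBI_LINES - RETRACE_LINES
theoretical_rate_k = usable_lines * RATE_PER_LINE_K

# Hypothetical enhancement size, in the same "K" unit as the rates above.
enhancement_size_k = 500
delivery_seconds = enhancement_size_k / ACHIEVABLE_RATE_K

print(f"usable VBI lines:       {usable_lines}")
print(f"theoretical rate:       {theoretical_rate_k}K/sec")
print(f"fraction achievable:    {ACHIEVABLE_RATE_K / theoretical_rate_k:.0%}")
print(f"time to deliver {enhancement_size_k}K:  {delivery_seconds:.0f} seconds")
```

In other words, roughly half of the theoretical capacity is expected to survive protocol overhead and coexistence with other VBI services.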
Transport Type A. The Type A transport binding for NTSC is easy to describe. ATVEF triggers are simply broadcast in line 21 of the VBI. For purposes of data integrity, the NTSC binding for Transport Type A requires that each trigger contain a checksum. The binding also recommends that the trigger length not exceed 25% of the total bandwidth of the line, in order to avoid conflicts between triggers, closed captioning data, and data from any future services that might also use line 21.
While ATVEF triggers could have been placed on some other line of the VBI, placing them on line 21 has advantages for receiver manufacturers. For example, most standard NTSC video decoder chips already have the ability to extract line 21 of the VBI (for closed captioning support). By placing triggers in that same line, hardware manufacturers are not forced to upgrade to more expensive decoders that support data extraction in other lines of the VBI.
Transport Type B. In addition to sending triggers on line 21 of the VBI, the Transport Type B NTSC binding includes a mechanism for delivering IP datagrams over the other VBI lines. IP over VBI (IP/VBI) is an Internet Draft of the Internet Engineering Task Force (IETF). As such, IP/VBI is not yet a standard, just a work in progress. Therefore, some details of the encapsulation, compression, and error detection schemes may change, but the architecture is unlikely to change radically. Figure 2 illustrates the protocol stack defined by ATVEF down to the IP layer, and defined by IP/VBI below that.

Figure 2
At the bottom of the stack is the NTSC television standard. At the lowest level, the television signal transports NABTS (North American Basic Teletext Standard) packets. NABTS is a method of modulating data onto the VBI. A typical NABTS packet gets encoded onto a single VBI line. NABTS, by way of its own forward error correction, supports correction of all single-bit, double-bit, and single-byte errors, as well as the ability to regenerate an entire missing packet. The NABTS packets are removed from the VBI to form a sequential data stream. This data stream, encapsulated in a SLIP-like protocol, is unframed to produce IP packets, which are handled equivalently across all ATVEF network types that implement the IP reference binding.
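To give a feel for the unframing step, the sketch below splits a byte stream into packets using classic SLIP framing as described in RFC 1055. It is a stand-in: the text says only that the encapsulation is "SLIP-like," so the IP/VBI draft's actual framing bytes and details may differ, and the function name is hypothetical.

```python
from typing import List

# Special byte values from classic SLIP (RFC 1055).
END = 0xC0      # marks a packet boundary
ESC = 0xDB      # introduces an escaped byte
ESC_END = 0xDC  # ESC followed by this stands for a literal END byte
ESC_ESC = 0xDD  # ESC followed by this stands for a literal ESC byte


def slip_unframe(stream: bytes) -> List[bytes]:
    """Split a SLIP-framed byte stream into the packets it carries."""
    packets: List[bytes] = []
    current = bytearray()
    escaped = False
    for byte in stream:
        if escaped:
            if byte == ESC_END:
                current.append(END)
            elif byte == ESC_ESC:
                current.append(ESC)
            else:
                # Protocol violation: keep the byte unchanged.
                current.append(byte)
            escaped = False
        elif byte == ESC:
            escaped = True
        elif byte == END:
            if current:  # ignore empty frames between back-to-back END bytes
                packets.append(bytes(current))
                current = bytearray()
        else:
            current.append(byte)
    return packets


stream = bytes([0xC0, 0x45, 0xDB, 0xDC, 0x01, 0xC0, 0xDB, 0xDD, 0xC0])
packets = slip_unframe(stream)
print([p.hex() for p in packets])
```

Each recovered byte string would then be handed to the IP layer, after which processing is the same as for any other network using the IP reference binding.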
As you can see, a specific network binding is not complicated, but is detailed enough (the full IP/VBI draft standard is obviously much more detailed than what we've presented) that anyone creating a broadcast network or building an ATVEF receiver has enough information to make their design ATVEF-compliant. And, while we've only presented the NTSC binding here, there are (or soon will be) well-defined ATVEF bindings to every other major video network standard, including PAL and SECAM (the European counterparts to NTSC), ATSC (digital terrestrial broadcast), cable, and satellite.
Design issues
Even for those intimately familiar with the specification, implementing an ATVEF receiver is not a trivial chore. Because the specification is flexible with respect to many of the implementation details, the embedded software developers (and, in some cases, the hardware designers) have to determine exactly how the receiver will be integrated with the television or the rest of the set-top box.
The first major decision when designing an ATVEF receiver is whether to support Transport Type A or B. Often, this decision is driven by the type of network the receiver will be connected to. For a satellite television set-top box that provides no back-channel to the Internet, the obvious decision is to support Type B. But for a cable television set-top box that doubles as a cable modem with dedicated Internet access, it may be okay to support only Type A. Of course, choosing to support a high-bandwidth option like Transport Type B will also require additional hardware and/or software performance.
As a typical example, let's suppose that we were building a set-top box that would serve as an NTSC receiver with ATVEF support. Assuming the standard NTSC binding for Type B (NABTS encoding of the data in the VBI), we must decide how we will decode this data when received. The most obvious choice is to use an NTSC video decoder that will parse all VBI lines in hardware. But, as we mentioned earlier, while some of the higher-end decoders support this functionality, these decoders tend to be a little more expensive; when building millions of set-top boxes, every penny saved can make a big difference in the bottom line.
The other option is to do the NABTS decoding in software. Unfortunately, software decoding is processor intensive. In fact, some benchmarks have indicated typical VBI decoding requires up to 2% of a Pentium-class 166MHz processor per VBI line. For full decoding of VBI lines 10 through 20, this would require about 20% of that same processor's time. Of course, these specific issues are only related to NTSC receivers. ATVEF receivers on digital (or non-NTSC analog) networks have a whole set of different issues that must be addressed.
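The processor budget quoted above follows from a one-line calculation, sketched here with the worst-case figure from the text. The variable names are illustrative, and the 2% value is the upper bound cited for a Pentium-class 166MHz processor, not a measurement of any particular design.

```python
CPU_PERCENT_PER_LINE = 2   # worst-case cost of decoding one VBI line, in percent
FIRST_LINE, LAST_LINE = 10, 20

# Lines 10 through 20 inclusive.
lines_decoded = LAST_LINE - FIRST_LINE + 1
cpu_percent = lines_decoded * CPU_PERCENT_PER_LINE

print(f"{lines_decoded} lines x {CPU_PERCENT_PER_LINE}% = {cpu_percent}% of the processor")
```

Lines 10 through 20 span 11 lines, so the worst case works out to 22%, consistent with the "about 20%" estimate.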
Another major design issue that developers must consider is the user interface. By design, the ATVEF specification puts no restriction on how triggers and data are presented to the user. Implementers must decide how these things are done. For example, it seems reasonable that users should be able to decide whether or not they would like to receive an indication of incoming content. But nothing in the specification dictates that implementers must allow users to turn off enhancements.
So what?
So now you're wondering, "What is this enhanced TV stuff going to do for me?" The most obvious answer, unfortunately, is that it is going to try to entice you to spend money. For decades, the television has been used to solicit your hard-earned cash through a seemingly endless stream of commercials. But, despite the annoying jingles that we can't get out of our heads, the slogans that pervade pop-culture, and the famous spokespersons who just won't go away, television advertising has always lacked the ability to complete the transaction. Never before has television advertising had the means to allow the viewer to make a spontaneous purchase, to buy with the click of a button. Now it does.
That doesn't just mean that every commercial will include a "BUY ME NOW" button. It also means that you'll be able to make purchases during your favorite shows. Clicking on Dan Marino's football jersey during Monday Night Football may pop up a description of the collectible garment, with an opportunity to purchase one right away. Or, during that same Monday Night Football game, the network may offer you the option to receive alternate camera footage, live from the home team's clubhouse or from the camera on the referee's shirt (for a fee, of course).
But let's not just focus on the advertising; enhanced television has the ability to improve your viewing experience as well. Imagine interactive game shows, where the contestants are chosen during the show to participate directly from their living rooms. Or you're watching MTV and, with the click of a button, are finally able to get the lyrics to that ridiculous song you can't stop humming. Imagine choose-your-own-ending television shows where viewers have the option to vote on which of a variety of outcomes will happen.
And not only will television provide enhanced content, it will also have the means to provide personalized content. Take regular NTSC broadcasts, for example. VBI data (and hence ATVEF data) can be added to an NTSC signal at any point, and even more than one point, in the path the signal travels from the broadcaster to the receiver. Therefore, a broadcaster could insert ATVEF content on a national scale, a local cable operator could add ATVEF content relating to local markets, and an automated profiler in your receiver could figure out which specific content would most appeal to you, and display it. National news broadcasters will now have the ability to provide local headlines, or better yet, headlines that appeal specifically to you.
Industry support and competing standards
Enhanced television is not a new idea. For decades, companies have built visions of enhanced television and tried to sell their visions to advertisers and consumer electronics manufacturers. However, none of these proprietary systems has caught on. Enhanced television has a chicken-and-egg problem: broadcasters are reluctant to invest in enhanced television content and infrastructures before the consumer electronics companies can guarantee a reasonably sized audience. And the consumer electronics companies find it difficult to sell enhanced TV receivers without the support of the broadcasters, who must provide enhanced content.
Today, two main standards compete in the area of enhanced television: ATVEF and Broadcast HTML. Broadcast HTML was created from ATSC-related work to develop the DTV Application Software Environment (DASE). It's a combination of an XML-based subset of HTML 4.0, along with a Java Virtual Machine and Sun's PersonalJava API.
Both standards have significant industry support, and neither is likely to disappear soon. That leaves broadcasters, hoping to avoid a prolonged VHS vs. Beta fight, worried. Many are looking to the ATVEF and DASE members to reconcile their differences, or provide a minimum level of interoperability between the two standards.
Some companies are not waiting for the standards to settle. CNN, Discovery Channel, and HBO are among a handful of broadcasters already delivering enhanced content on a regular or semi-regular basis. In fact, each week over 1,000 hours of network, syndicated, and cable TV programming include content enhancements. Consumer electronics companies are designing their next-generation set-top boxes to comply with enhanced television specifications. And embedded Web browser companies are already providing enhanced television support in their browsers. You can be sure that enhanced television is coming.
Jason Steinhorn is an embedded software engineer at Hughes Network Systems in Gaithersburg, MD. He is currently designing and developing a Web-enabled satellite television set-top box. Jason can be reached at jsteinhorn@hns.com.
Mark Kohler develops software for broadband network equipment at Nomadix, in Santa Monica, CA. He can be reached at mkohler@nomadix.com.
Acknowledgments
The authors wish to thank David Mott of Liberate Technologies for reviewing this article for technical accuracy. David has served on the ATVEF Technical Working Group since its inception. He can be reached at mott@liberate.com.