ABOUT THE AUTHOR
Pan Feng is currently a research fellow in the Center for Signal Processing, Nanyang Technological University, Singapore. His experience includes 15 years of teaching and research in digital image processing and video engineering. He has offered numerous training courses for industry in these areas.
Digital television (DTV) is a new type of broadcasting technology that will globally transform television as we now know it. DTV refers to the complete digitization of the TV signal from transmission to reception. By transmitting TV pictures and sounds as data bits and compressing them, a digital broadcaster can carry more information than is currently possible with analog broadcast technology. This will allow for the transmission of pictures with HDTV resolution for dramatically better picture and sound quality than is currently available, or of several SDTV programs concurrently. DTV technology can also provide high-speed data transmission, including fast Internet access.
This article is an introduction to the principles behind digital television broadcasting, including audio/video coding and multiplexing, data scrambling and conditional access, channel coding, and digital modulation. The article also compares the three major DTV standards: ATSC-T, DVB-T, and ISDB-T.
Figure 1: Diagram of the DTTB system
The block diagram of a digital terrestrial television broadcasting (DTTB) system is shown in Figure 1. The video, audio, and other service data are compressed and multiplexed to form elementary streams. These streams may be multiplexed again with the source data from other programs to form the MPEG-2 Transport Stream (TS). A transport stream consists of Transport Packets that are 188 bytes in length.
The FEC encoder takes preventive measures to protect the transport streams from errors caused by noise and interference in the transmission channel. It includes Reed-Solomon coding, outer interleaving, and convolutional coding. The modulator then converts the FEC-protected transport packets into digital symbols that are suitable for transmission in the terrestrial channels. This involves QAM and OFDM in DVB-T and ISDB-T systems, or PAM and VSB in ATSC-T. The final stage is the up-converter, which converts the modulated digital signal into the appropriate RF channel. The sequence of operations on the receiver side is the reverse of that on the transmitter side.
Data compression technology makes digital television broadcasting possible with a smaller frequency bandwidth than that of an analog system. Among the many compression techniques, MPEG is one of the most widely accepted for all sorts of new products and services, from DVDs and video cameras to digital television broadcasting. The MPEG-2 standard supports standard-definition television (SDTV) and high-definition television (HDTV) video formats for broadcast applications.
MPEG video compression exploits certain characteristics of video signals, namely redundancy of information both inside a frame (spatial redundancy) and between frames (temporal redundancy). The compression also removes psychovisual redundancy based on the characteristics of the human visual system (HVS): for example, the HVS is less sensitive to errors in detailed texture areas and fast-moving images. MPEG video compression also uses entropy coding to increase data-packing efficiency.
Figure 2: DCT-based intraframe coding
The intraframe coding algorithm (Figure 2) begins by calculating the DCT coefficients over small non-overlapping image blocks (usually 8×8 in size). This block-by-block processing takes advantage of the image's local spatial correlation properties. The DCT process produces many 2D blocks of transform coefficients that are quantized to discard some of the trivial coefficients that are likely to be perceptually masked. The quantized coefficients are then zigzag scanned to output the data in an efficient way. The final step in this process uses variable-length coding to further reduce the bit rate.
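The first three steps above (block DCT, quantization, zigzag scan) can be sketched in a few lines of Python. This is a minimal illustration only: the uniform quantizer step of 16 and the function names are assumptions made here, whereas a real MPEG-2 encoder uses a quantization matrix and follows the scan with variable-length coding.

```python
import math

N = 8  # block size mentioned in the text


def dct2(block):
    """Orthonormal 2D DCT-II of a square block (list of rows)."""
    n = len(block)

    def c(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, step=16):
    """Uniform quantizer (assumed here); small coefficients collapse to zero."""
    return [[round(value / step) for value in row] for row in coeffs]


def zigzag_order(n=N):
    """(row, col) visiting order along anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def zigzag_scan(quantized):
    """Read a quantized block out in zigzag order, low frequencies first."""
    return [quantized[i][j] for i, j in zigzag_order(len(quantized))]
```

For a flat block the scan yields a single nonzero DC value followed by a long run of zeros, which is exactly the kind of sequence that variable-length coding represents compactly.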
Figure 3: Motion-compensated interframe coding
Interframe coding (Figure 3), on the other hand, exploits temporal redundancy by predicting the frame to be coded from a previous reference frame. The motion estimator searches previously coded frames for areas similar to those in the macroblocks of the current frame. This search results in motion vectors (represented by x and y components in pixel lengths), which the decoder uses to form a motion-compensated prediction of the video. The motion-estimator circuitry is typically the most computationally intensive element in an MPEG encoder (Figure 4). Motion-compensated interframe coding, therefore, only needs to convey the motion vectors required to predict each block to the decoder, instead of conveying the original macroblock data, which results in a significant reduction in bit rate.
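The search performed by the motion estimator can be illustrated with an exhaustive block-matching sketch. The sum of absolute differences (SAD) cost, the 4×4 block size, and the ±2 pixel search range are assumptions chosen for brevity, not values taken from the MPEG-2 standard.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Full search: return (dx, dy, cost) of the best match within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]
```

The nested loops make the cost of the search obvious: every candidate displacement requires a full block comparison, which is why motion estimation dominates encoder complexity.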
Figure 4: Block diagram of an MPEG-2 video compression system
Unlike for video, the three current DTV standards use three different audio coding schemes: Dolby AC-3 for ATSC, MPEG audio and Dolby AC-3 for DVB, and MPEG-AAC for ISDB. However, these audio standards use a similar technique called perceptual coding and support up to six channels (right, left, center, right surround, left surround, and subwoofer), often designated as 5.1 channels. A perceptual audio coder exploits a psycho-acoustic effect known as masking (Figure 5). This psycho-acoustic phenomenon states that when sound is broken into its constituent frequencies, those sounds with relatively lower energy adjacent to others with significantly higher energy are masked by the latter and are not audible.
Figure 5: Audio perceptual masking
AC-3 is one of the most popular audio compression algorithms used in DTV, movie theater, and home theater systems. AC-3 makes use of the psycho-acoustic masking phenomenon to achieve a high degree of data compression. In the encoding process (Figure 6), a modified DCT algorithm transforms the audio signal into the frequency domain, generating a series of frequency coefficients that represent the relative energy contributions of those frequencies to the signal.
Figure 6: Dolby AC-3 audio coding block diagram
By analyzing the incoming signal in the frequency domain, the encoder gives psycho-acoustically masked frequencies fewer (or zero) bits to represent their frequency coefficients, while dominant frequencies are given more bits. Hence, besides the coefficients themselves, the decoder must receive information that describes how the bits are allocated so that it can reconstruct the bit allocation. In AC-3, all of the encoded channels draw from the same pool of bits, so channels that need better resolution can use the most bits.
The output coefficients generated by the time-domain to frequency-domain transformation are typically represented in a block floating-point format to maintain numeric fidelity. Using the block floating-point format is one way to extend the dynamic range in a fixed-point processor. It is done by examining a block of (frequency) samples and determining an appropriate exponent that can be associated with the entire block. Once the mantissas and exponents are determined, the mantissas are represented using the variable bit-allocation scheme described above; the exponents are DPCM coded and represented with a fixed number of bits (Figure 6).
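The idea of one shared exponent per block can be shown with a generic sketch. It is not the AC-3 exponent format: the 8-bit signed mantissa, the exponent cap of 24, and the assumption that all inputs lie strictly between -1 and 1 are choices made here for illustration.

```python
def bfp_encode(block, mantissa_bits=8):
    """Return (shared_exponent, mantissas) for values assumed to lie in (-1, 1)."""
    peak = max(abs(v) for v in block)
    exponent = 0
    # Scale the block up by powers of two while the largest value still stays below 1.0.
    while peak != 0 and peak * (1 << (exponent + 1)) < 1.0 and exponent < 24:
        exponent += 1
    scale = 1 << (mantissa_bits - 1)
    mantissas = []
    for v in block:
        m = round(v * (1 << exponent) * scale)
        # Clamp to the signed mantissa range in case rounding hits the upper limit.
        mantissas.append(max(-scale, min(scale - 1, m)))
    return exponent, mantissas


def bfp_decode(exponent, mantissas, mantissa_bits=8):
    """Rebuild approximate values from the shared exponent and the mantissas."""
    scale = 1 << (mantissa_bits - 1)
    return [m / scale / (1 << exponent) for m in mantissas]
```

Because the exponent is chosen from the largest sample in the block, small blocks of quiet coefficients keep their precision without a per-sample exponent.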
MPEG audio uses forward adaptive bit allocation, while AC-3 uses hybrid adaptive bit allocation, which combines forward and backward adaptive bit allocation. The main advantage of MPEG audio is that the psycho-acoustic model resides only in the encoder. When the encoder is upgraded, legacy decoders continue to decode newly coded data. The disadvantage, however, is that the bit-allocation side information can impose a heavy overhead for complicated music pieces.
Audio and video encoders deliver elementary stream outputs. These bit streams, as well as other streams carrying other private data, are combined in an organized manner and supplemented with additional information to allow their separation by the decoder, synchronization of picture and sound, and selection by the user of the particular components of interest. This is done through packetization, as specified in the MPEG-2 systems layer. The elementary stream is cut into packets to form a packetized elementary stream (PES). A PES packet starts with a header, followed by the content of the packet (payload) and the descriptor. Packetization provides the protection and flexibility needed to transmit multimedia streams across different networks. In general, a PES can only contain data from a single elementary stream.
Elementary, Packetized Elementary, and Transport Streams
In broadcasting applications, a multiplex usually contains different data streams (audio and video) that might even come from different programs. Therefore, it is necessary to multiplex them into a single stream, the transport stream. Figure 7a shows the process of multiplexing. A transport stream consists of fixed-length transport packets, each exactly 188 bytes long. The header of each packet contains important information such as the synchronization byte and the packet identifier (PID). The PID identifies a particular PES within the multiplex.
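As a rough illustration of how a receiver reads the sync byte and PID, the sketch below unpacks the 4-byte header of one transport packet. The field layout (sync byte 0x47, 13-bit PID, 4-bit continuity counter) comes from the MPEG-2 systems specification rather than from this article, and the function and dictionary key names are invented here.

```python
SYNC_BYTE = 0x47    # fixed value of the first byte of every transport packet
PACKET_SIZE = 188   # packet length in bytes, as stated in the text


def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte header of a single MPEG-2 transport packet."""
    if len(packet) != PACKET_SIZE:
        raise ValueError("transport packets are exactly 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("lost synchronization: first byte is not 0x47")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        # The PID is 13 bits: the low 5 bits of byte 1 followed by all of byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "scrambling_control": (packet[3] >> 6) & 0x03,
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }
```

A demultiplexer would call a routine like this on every packet and route the payload according to the PID.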
Figure 7: (a) The process of multiplexing. (b) The structure of a transport packet.
It is necessary to include additional program-specific information (PSI) within each transport stream in order to identify the relationship between the available programs and the PIDs of their constituent streams. This PSI consists of four tables: the program association table (PAT), program map table (PMT), network information table (NIT), and conditional access table (CAT).
Within a transport stream, the reserved PID of 0 indicates a transport packet that contains a PAT. The PAT associates a particular PID value with each program that is currently carried in the transport multiplex. This PID value identifies the PMT for that particular program. The PMT contains details of the constituent elementary streams for the program. Program 0 has a special meaning within the PAT and identifies the PID of the transport packets that contain the optional NIT. The contents of the NIT are private to the broadcaster and are intended to contain network-specific information. The CAT is identified by a PID of 1 and contains information specific to any conditional access or scrambling schemes that are in use.
Navigating an MPEG-2 Multiplex
MPEG-2 PSI tables only give information concerning the multiplex. The DVB standard adds complementary tables (DVB-SI) to allow the user to navigate the available programs and services by means of an electronic program guide (EPG). DVB-SI has four basic tables and three optional tables to serve this purpose. The decoder must perform the following main steps in order to find a program or a service in an MPEG-2 transport multiplex.
- As soon as the new channel is acquired (synchronized), the decoder must filter the PID 0 packets to acquire the PAT sections and construct the PAT to provide the available choice (services currently available on the air) to the user.
- Once the user choice is made, the decoder must filter the PID corresponding to the PMT of this program and construct the PMT from the relevant sections. If there is more than one audio or video stream, the user should be able to make another choice.
- The decoder must filter the PID corresponding to this choice.
The audio/video decoding can now start. The part of this process that is visible to users is the interactive presentation of the EPG associated with the network, which can be built by means of the PSI and DVB-SI tables in order to allow them to easily navigate the available programs and services. Similar tables, the Program and System Information Protocol (PSIP) tables, are also available in the ATSC system.
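The three navigation steps can be sketched with already-parsed tables held in plain dictionaries. Everything in the example data is hypothetical (the PID values, the program numbers, and the "video"/"audio" keys); real PAT and PMT sections must first be reassembled from transport packets and parsed.

```python
PAT_PID = 0x0000  # reserved PID that carries the PAT


def available_programs(tables):
    """Step 1: list the programs announced in the PAT (program 0 points to the NIT)."""
    return sorted(program for program in tables[PAT_PID] if program != 0)


def select_streams(tables, program_number, choose=lambda options: options[0]):
    """Steps 2 and 3: follow PAT -> PMT and return the set of PIDs to filter."""
    pmt_pid = tables[PAT_PID][program_number]   # PAT maps program number to PMT PID
    pmt = tables[pmt_pid]                       # PMT lists the elementary stream PIDs
    video_pid = choose(pmt["video"])            # user choice if several streams exist
    audio_pid = choose(pmt["audio"])
    return {video_pid, audio_pid}


# Hypothetical multiplex with two programs; program 1 offers two audio streams.
EXAMPLE_TABLES = {
    PAT_PID: {0: 0x0010, 1: 0x0100, 2: 0x0200},
    0x0100: {"video": [0x0101], "audio": [0x0102, 0x0103]},
    0x0200: {"video": [0x0201], "audio": [0x0202]},
}
```

Passing a different `choose` function models the second user choice mentioned above, for example selecting an alternative audio language.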
DTV services will either be pay-per-view or at least include some elements that are not freely available to the public. DVB defined a standard for a “Common Interface for Conditional Access and other Digital Video Broadcasting Decoder Applications” to enable an Integrated Receiver Decoder (IRD) to de-scramble programs broadcast in parallel, using different conditional access (CA) systems. By inserting a PCMCIA module into the common interface, different CA systems can be addressed sequentially by that IRD. MultiCrypt describes the simultaneous operation of several CA systems. The MultiCrypt approach has the additional advantage that it does not require agreements between networks, but it is more expensive to implement. Other applications, such as an Ethernet connection or electronic commerce, may also utilize the DVB-CI connector.
SimulCrypt is another way of providing the viewer with access to programs. In this case, commercial negotiations between different service providers have led to a contract that enables the viewer to use the one specific CA system built into the IRD to watch all the programs, irrespective of the fact that these programs were scrambled under the control of different CA systems. At the moment, DVB supports both MultiCrypt and SimulCrypt, while ATSC only supports the latter.
The transmission channels used for digital television broadcasting are, unfortunately, rather error-prone due to many disturbances (such as noise, interference, and echoes). However, a digital TV signal, after almost all its redundancy is removed, requires a very low bit error rate (BER) for good performance. A BER of the order of 10^-10 corresponds to an average interval of some 30 minutes between errors. Therefore, it is necessary to take preventive measures before modulation in order to allow detection and, as far as possible, correction in the receiver of most errors introduced by the physical transmission channel. These measures are called, collectively, forward error correction (FEC). FEC requires that redundant data be added to the original data prior to transmission, allowing the receiver to use these redundant data to detect and recover data lost to channel disturbances.
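The 30-minute figure can be checked with simple arithmetic. The article does not state the bit rate behind it; the value of about 5.5 Mbit/s used below is an assumption chosen because it is typical of a single SDTV program.

```python
def seconds_between_errors(ber, bit_rate):
    """Average time between bit errors: one error every 1/ber bits."""
    return 1.0 / (ber * bit_rate)


ASSUMED_BIT_RATE = 5.5e6   # bits per second (assumption, not from the article)
interval_minutes = seconds_between_errors(1e-10, ASSUMED_BIT_RATE) / 60
```

At this assumed rate the interval works out to roughly 30 minutes, consistent with the figure quoted above; a higher bit rate shortens the interval proportionally.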
Figure 8: Forward error correction codingprocess
Figure 8 illustrates the successive steps of the forward error correction encoding process used in digital television broadcasting. Strictly speaking, energy dispersal is not part of the error correction process. The main purpose of this step is to avoid long strings of 0s or 1s in the transport stream, in order to ensure the dispersal of energy in the channel. Broadcasting standards often use the terms inner coding and outer coding. Inner coding operates just before the transmitter modulates the signal and just after the receiver demodulates the signal. Outer coding applies to the extreme input and output ends of the transmission chain. Inner coding is usually convolutional in nature, with optimal performance under conditions of steady noise interference. Outer coding is a Reed-Solomon code that is usually more effective for correcting burst errors.
Outer coding is a Reed-Solomon code, which belongs to a subset of the BCH cyclic block codes. As its name implies, in block coding, a block of bits is processed as a whole to generate the new coded block. Block coding has no memory: the coding of a data word does not depend on the data that come before or after it. Reed-Solomon code, in combination with the Forney convolutional interleaving that follows it, allows the correction of burst errors introduced by the transmission channel. It is applied individually to all the transport packets in Figure 7a, excluding the synchronization bytes. R-S codes have been recently proved to operate at the theoretical limit of correcting efficiency: no more efficient code can be found. This is why they have been chosen as the outer code for all DTV standards. An R-S code is characterized by three parameters (n, k, t), where n is the size of the block after coding, k is the size of the block before coding, and t is the number of correctable symbols. Whether a received codeword is error-free can be checked with a division circuit corresponding to the generator polynomial g(x). For a proper codeword, the remainder is zero. In the event that the remainder is non-zero, a Euclidean algorithm is used to determine the two values needed for error correction: the location of the error and the nature of the error. However, if the size of the error exceeds half the amount of redundancy added, the error cannot be corrected.
In the ATSC standard, we find the R-S(207,187,10) code. It adds 20 parity bytes and can correct up to 10 erroneous bytes per packet. In the DVB and ISDB standards, we find the R-S(204,188,8) code. It adds 16 parity bytes and can correct up to 8 erroneous bytes per packet.
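The relationship between the three parameters follows directly from the rule that an R-S code corrects errors up to half the redundancy added. The small helper below, whose name and return format are chosen here for illustration, reproduces the numbers quoted for the two codes.

```python
def rs_parameters(n, k):
    """Derive parity size, correction capability t, and code rate of an R-S(n, k) code."""
    parity = n - k
    return {
        "parity_bytes": parity,
        "correctable_bytes": parity // 2,   # t = (n - k) / 2
        "code_rate": k / n,
    }


atsc = rs_parameters(207, 187)       # ATSC outer code
dvb_isdb = rs_parameters(204, 188)   # DVB and ISDB outer code
```

The code rate shows the price of the stronger ATSC code: about 90 percent of each coded block is payload, against about 92 percent for DVB and ISDB.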
The purpose of data interleaving is to increase the efficiency of the Reed-Solomon coding by spreading over a longer time the burst errors introduced by the transmission channel, which could otherwise exceed the correction capacity of the Reed-Solomon coding. Interleaving is normally implemented by using a two-dimensional array buffer, such that the data enters the buffer in rows and is then read out in columns. The result of the interleaving process is that a burst of errors in the channel becomes, after deinterleaving, a few widely spaced single-symbol errors, which are more easily correctable.
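The row-in, column-out buffer described above can be sketched as follows. This is the simple two-dimensional block interleaver only; the 3×4 dimensions in the usage note are arbitrary, and the convolutional interleavers actually used by the DTV standards are organized differently.

```python
def interleave(data, rows, cols):
    """Write the data into a rows x cols array row by row, read it out column by column."""
    if len(data) != rows * cols:
        raise ValueError("data must fill the buffer exactly")
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(data, rows, cols):
    """Inverse operation: restore the original row-by-row order."""
    if len(data) != rows * cols:
        raise ValueError("data must fill the buffer exactly")
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = data[i]
            i += 1
    return out


def burst_hits(rows, cols, start, length):
    """Original positions damaged by a channel burst of the given start and length."""
    channel = interleave(list(range(rows * cols)), rows, cols)
    return sorted(channel[start:start + length])
```

With a 3×4 buffer, a burst covering channel positions 3 to 5 damages original positions 1, 5, and 9, so each row of the buffer sees only a single error.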
The interleaver employed in the ATSC standard is a 52-data-segment (intersegment) convolutional byte interleaver. Interleaving is provided to a depth of about 1/6 of a data field (4 ms deep). Only data bytes are interleaved. The interleaver is also synchronized to the first data byte of the data field. Intrasegment interleaving is also performed for the benefit of the trellis coding process. DVB and ISDB use convolutional interleaving, and the interleaving depth is 12.
The inner coding is a 2/3 trellis coding for ATSC, and convolutional coding for DVB and ISDB. Inner coding is an efficient complement to the Reed-Solomon coding and Forney interleaving as it is designed to correct random errors.
ATSC Trellis Coding
The 8-VSB transmission system employs a 2/3 rate (R=2/3) trellis code, with one unencoded bit that is precoded. In creating serial bits from parallel bytes, the MSB is sent out first: (7, 6, 5, 4, 3, 2, 1, 0). The MSB is precoded (7, 5, 3, 1) and the LSB is feedback convolutional encoded (6, 4, 2, 0). Standard four-state optimal Ungerboeck codes are used for the encoding (Figure 9); also shown are the precoder and the symbol mapper.
Figure 9: 2/3 trellis coding and precoder
You can use trellis coding with multi-level signaling; in other words, several multi-level symbols are associated into a group. The waveform that results from a particular group of symbols is called a trellis. If each symbol can have eight levels, then in three symbols there can be 512 possible trellises. In trellis coding, the data are coded such that only certain trellis waveforms represent valid data. If only 64 of the trellises represent error-free data, then two data bits per symbol can be sent instead of three. The remaining bit is a form of redundancy because trellises other than the correct 64 are due to errors. If a trellis is received in which the level of one of the symbols is ambiguous due to noise, the ambiguity can be resolved because the correct level is the one that gives a valid trellis. This technique is known as maximum-likelihood decoding. The 64 valid trellises should be made as different as possible so that the system continues to work with a poorer signal-to-noise ratio. If the trellis coder makes an error, the outer code will correct it.
DVB Convolutional Coding and Puncturing
In DVB, convolutional coding is used, followed by code puncturing. Typically, a rate-1/2 convolutional encoder consists of two FIR filters. These two FIR filters convolve with the input bit stream, producing two outputs that represent different parity checks on the input data so that bit errors can be corrected. Clearly, there will be two output bits for every input bit; therefore the code rate is 1/2. Any rate between 1/1 and 1/2 would still allow the transmission of the original data, but the amount of redundancy would vary. Failing to transmit the entire rate-1/2 output is called puncturing, and it allows any required balance between bit rate and error-correcting capability to be obtained. In DVB systems, as well as in ISDB systems, the possible code rates are 1/2, 2/3, 3/4, 5/6, and 7/8.
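A rate-1/2 encoder of this kind, followed by puncturing, can be sketched as below. It uses the constraint length of 7 and the generator polynomials 171 and 133 (octal) listed in this article's comparison table; the bit ordering inside the shift register and the rate-2/3 puncturing pattern in the usage note are assumptions made for illustration and should be checked against the standard before use.

```python
G1 = 0o171   # generator polynomial for output X (octal)
G2 = 0o133   # generator polynomial for output Y (octal)
K = 7        # constraint length


def parity(value):
    """Modulo-2 sum of the bits of value."""
    return bin(value).count("1") & 1


def conv_encode(bits):
    """Rate-1/2 encoder: returns one (x, y) pair per input bit."""
    register = 0
    pairs = []
    for bit in bits:
        # Shift the newest bit in at the most significant position (assumed convention).
        register = (register >> 1) | (bit << (K - 1))
        pairs.append((parity(register & G1), parity(register & G2)))
    return pairs


def puncture(pairs, keep_x, keep_y):
    """Drop output bits according to periodic keep masks, raising the code rate."""
    period = len(keep_x)
    out = []
    for i, (x, y) in enumerate(pairs):
        if keep_x[i % period]:
            out.append(x)
        if keep_y[i % period]:
            out.append(y)
    return out
```

For example, `puncture(conv_encode(bits), [1, 0], [1, 1])` keeps three of every four coded bits, so two input bits produce three output bits and the rate becomes 2/3.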
So far, we have not seen much difference among the three DTV systems. Differentiation occurs due to the different modulation schemes of the systems. This section briefly describes the principles behind those modulation schemes.
ATSC 8-VSB System
The ATSC 8-VSB system was developed by the Advanced Television Systems Committee in the U.S. The framing structure of the transmitted signal is an important aspect of the ATSC standard. It accommodates the transport stream requirements and mitigates channel inter-propagation effects such as multipath and impulse noise.
The transport packet for ATSC consists of 188 bytes, including a sync byte. At the transmitter, this is altered in two ways. First, the sync byte is stripped off, leaving 187 bytes to be transmitted. Then 20 bytes are added for the Reed-Solomon error correction, giving 207 bytes transmitted in each packet, which amounts to 1656 bits. The trellis coding at rate 2/3 increases this to 2484 bits, or 828 symbols, since eight-level coding gives three bits per symbol. A special waveform, known as the data segment sync, is added to the head of this packet and occupies four normal symbol periods. The total modified transmission stream packet now occupies 832 symbol periods, or a total time of 77.3 µs at the symbol rate of 10.76 megasymbols per second. This resulting new data packet is now called a data segment.
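The arithmetic in this paragraph can be reproduced step by step. The sketch below uses only the figures given in the text; the function name and dictionary keys are chosen here for illustration.

```python
SYMBOL_RATE_MSPS = 10.76   # megasymbols per second, from the text


def atsc_data_segment():
    """Follow one 188-byte transport packet through to an 832-symbol data segment."""
    payload_bytes = 188 - 1                  # sync byte stripped off
    rs_bytes = payload_bytes + 20            # Reed-Solomon parity added
    rs_bits = rs_bytes * 8
    trellis_bits = rs_bits * 3 // 2          # rate-2/3 trellis coding
    data_symbols = trellis_bits // 3         # three bits per eight-level symbol
    total_symbols = data_symbols + 4         # data segment sync occupies four symbols
    duration_us = total_symbols / SYMBOL_RATE_MSPS   # symbols / (Msymbols/s) = microseconds
    return {
        "rs_bits": rs_bits,
        "trellis_bits": trellis_bits,
        "data_symbols": data_symbols,
        "total_symbols": total_symbols,
        "duration_us": duration_us,
    }
```

Each intermediate value matches the text: 1656 bits after Reed-Solomon coding, 2484 bits after trellis coding, 828 data symbols, and 832 symbols lasting about 77.3 µs.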
Figure 10: VSB data segments and framingstructure
Periodically, at intervals of 313 data segments or 24.2 ms, a special data segment known as a field sync is inserted. The field sync carries training data used by the adaptive equalizer in the receiver to estimate what echoes may be present due to multipath interference. The form of the data segment and overall framing structure is shown in Figure 10.
Figure 11: Nominal VSB channel occupancy
The eight-level symbols combined with the binary data segment sync and data field sync signals are used to modulate a carrier with suppressed-carrier modulation. Before transmission, however, most of the lower sideband is removed. The resulting spectrum is flat, except for the band edges, where a nominal square-root raised-cosine response results in 620 kHz transition bands. The nominal VSB transmission spectrum is shown in Figure 11. The spectrum includes a small pilot signal at the suppressed carrier frequency, 310 kHz from the lower band edge.
DVB-T OFDM System
A European consortium of public and private sector organizations, the Digital Video Broadcasting Project, developed the DVB-T OFDM system. The system uses a large number of carriers per channel, modulated in parallel via an FFT process, a technique referred to as orthogonal frequency division multiplexing (OFDM). In the case of multipath interference, echoes could cause severe interference to the main signal. Therefore, a long symbol duration is necessary to suppress the echo interference. OFDM can achieve a long symbol duration within the same bandwidth using parallel modulation. In OFDM, symbols are demultiplexed to modulate many different carriers (a few thousand), each of which occupies a much narrower bandwidth. Hence, the symbol duration can be increased while the total bandwidth remains the same. These carriers are chosen to be orthogonal to each other so that they are separable in the decoder. The modulated symbols are frequency multiplexed to form the OFDM baseband signal, which is then up-converted to an RF signal for transmission.
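The parallel modulation and the separability of orthogonal carriers can be demonstrated with a toy example. The sketch below uses a direct inverse DFT over eight carriers instead of an FFT over thousands, omits pilots, and leaves out the up-conversion stage; all names are chosen here for illustration.

```python
import cmath


def ofdm_modulate(symbols):
    """Inverse DFT: each complex symbol modulates one orthogonal carrier."""
    n = len(symbols)
    return [
        sum(symbols[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def ofdm_demodulate(samples):
    """Forward DFT: orthogonality lets the receiver separate the carriers again."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def parallel_symbol_duration(serial_symbol_duration, n_carriers):
    """Sending n symbols in parallel lets each one last n times longer."""
    return serial_symbol_duration * n_carriers
```

Demodulating the output of `ofdm_modulate` returns the original symbols to within floating-point error, which is the practical meaning of the carriers being orthogonal.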
The OFDM transmission system allows the selection of different levels of QAM modulation. Moreover, a guard interval with selectable width (1/4, 1/8, or 1/16 of the symbol duration) separates the transmitted symbols, which gives the system an excellent capability for coping with multipath distortion. OFDM modulation also supports a single-frequency network (SFN), in which multiple transmitters within a single coverage area transmit the same data on the same frequency at the same time. The DVB-T system can operate in either a 2k mode or an 8k mode. The 2k mode uses a maximum of 1705 carriers, while in 8k mode the number of carriers is 6817. The 2k mode has a short symbol duration, so it is suitable for a small SFN with limited distance between transmitters. The 8k mode is used in a large SFN where the transmitters could be up to 90 km apart.
ISDB-T BST-OFDM System
The Association of Radio Industries and Businesses (ARIB) in Japan developed the ISDB-T system. It uses a modulation method referred to as Band Segmented Transmission (BST) OFDM, which consists of a set of common basic frequency blocks called BST-Segments. Each segment has a bandwidth corresponding to 1/14th of the channel bandwidth. BST-OFDM provides hierarchical transmission capabilities by using different punctured coding rates, modulation schemes, and guard intervals on different BST-segments. Thus different segments can meet different service requirements. By transmitting OFDM segment groups with different transmission parameters, you get hierarchical transmission.
Generally speaking, each system has its own unique advantages and disadvantages. Table 1 summarizes the main characteristics of the three DTV systems.
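Table 1: Main characteristics of the three DTV systems. Surviving entries: audio coding ("or Dolby AC-3", "or AAC Audio"); channel coding ("7/8, constraint length=7, polynomials 171, 133"); interleaving ("& frequency interleaving", "time and frequency interleaving"); No. of carriers.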