As digital video continues to extend visual communication to an ever-larger range of applications, more developers are becoming involved in creating new video systems or enhancing the capabilities of existing ones.
Among the basic design considerations video developers face is that the high degree of compression involved demands a high level of performance from the processor. In addition, the wide range of video applications requires performance to be optimized to meet system requirements that can vary widely in terms of transmission bandwidth, storage, image specifications and quality requirements.
Among the available solutions, programmable digital signal processors (DSPs) offer the high level of real-time performance required for compression, as well as flexibility that enables systems engineers to adapt the encoding software readily to individual applications.
The goal for video compression is to encode digital video using as few bits as possible while maintaining acceptable visual quality. While encoding algorithms are based on the mathematical principles of information theory, they often require implementation trade-offs that approach being an art form.
Well-designed encoders can help developers make these trade-offs through innovative techniques and support of the options offered by advanced compression standards. A configurable video encoder that is designed to leverage the performance and flexibility of DSPs through a straightforward system interface can help systems engineers optimize their products easily and effectively.
Key factors in compression
Like JPEG for still images, the widely used ITU and MPEG video encoding algorithms employ a combination of discrete cosine transform (DCT) coding, quantization and variable-length coding to compress macro-blocks within a frame (intra-frame).
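To make the transform and quantization steps concrete, here is a minimal Python sketch of an 8×8 DCT followed by uniform quantization. It is an illustration only: the single quantization step of 16 and the names `dct_2d` and `quantize` are assumptions made for this example, and production encoders typically use fast or integer transforms and standard-specific quantization rules.

```python
import math

N = 8  # block size handled by the transform stage


def dct_2d(block):
    """Return the orthonormal 2-D DCT-II of an N x N block of samples."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization (assumed): a larger step discards more detail."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A perfectly flat block: all of its energy lands in the DC coefficient.
flat_block = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat_block), step=16)
nonzero = sum(1 for row in levels for value in row if value != 0)
```

For the flat block, only the top-left (DC) level is nonzero. Long runs of zero levels like these are what the variable-length coding stage then represents compactly.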
Once the algorithm has established a baseline intra-coded (I) frame, a number of subsequent predicted (P) frames are created by coding only the difference in visual content or residual between each of them. This inter-frame compression is achieved using a technique called motion compensation in which the algorithm first estimates where the macro-blocks of an earlier reference frame have moved in the current frame, then subtracts and compresses the residual.
Figure 1. Motion Compensation-Based Video Encoding
Figure 1 above shows the flow of a generic motion compensation-based video encoder. The macro-blocks typically contain four 8×8-pixel luminance blocks and two 8×8-pixel chrominance blocks (YCbCr 4:2:0). The motion estimation stage, which creates motion vector (MV) data that describe where each of the blocks has moved, is usually the most computation-intensive stage of the algorithm.
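The following Python sketch illustrates motion estimation and residual calculation on a toy frame. It assumes an exhaustive (full) search with a sum-of-absolute-differences cost, a 4×4 block and a ±2-pixel search window. These choices, and the function names, are made for illustration only, since the standards leave the search technique to the encoder.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in cur and one in ref."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def estimate_motion(cur, ref, cx, cy, size=4, search=2):
    """Full search; returns (dx, dy, cost) pointing from the current block
    to its best match in the reference frame."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, sad(cur, ref, cx, cy, cx, cy, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - size and 0 <= ry <= h - size:
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best


def residual(cur, ref, cx, cy, mv, size=4):
    """Difference between the current block and its motion-compensated match."""
    dx, dy = mv
    return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
             for i in range(size)] for j in range(size)]


def make_frame(x0, y0, w=8, h=8, size=4, value=200):
    """Toy 8x8 frame: a bright square on a black background."""
    return [[value if x0 <= x < x0 + size and y0 <= y < y0 + size else 0
             for x in range(w)] for y in range(h)]


ref = make_frame(1, 1)   # square at (1, 1) in the reference frame
cur = make_frame(3, 2)   # square has moved 2 right and 1 down
dx, dy, cost = estimate_motion(cur, ref, cx=3, cy=2)
res = residual(cur, ref, 3, 2, (dx, dy))
```

Here the search finds the vector (-2, -1) with zero cost, so the residual block is all zeros and only the motion vector needs to be coded. The nested search loops also show why this stage dominates the computation.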
Video compression standards specify only the bit-stream syntax and the decoding process, leaving a large scope for innovation within the encoders. For example, in the motion estimation stage, the ways that the motion vectors describe block movement are standardized, but there are no constraints on what techniques an encoder can use to determine the vectors.
Rate control is another area with significant latitude for innovation, allowing the encoder to assign quantization parameters and thus “shape” the noise in the video signal in appropriate ways.
In addition, the advanced H.264/MPEG-4 AVC standard adds flexibility and functionality by providing multiple options for macro-block size, quarter-pel (pixel) resolution for motion compensation, multiple-reference frames, bi-directional frame prediction (B frames) and adaptive in-loop de-blocking. Thus there is a large potential for trading off the various encoder options in order to balance complexity, delay and other real-time constraints.
Figure 2. Frame Motion Vectors and Residual. A P frame (right) and its reference (left). Below the P frame, the residual (black) shows how little encoding remains once the motion vectors (blue) have been calculated.
Surveillance and storage
Encoding must be optimized in order to meet requirements that can vary enormously among applications. For instance, in surveillance, determining how to store the vast amount of visual information generated by networked cameras is one main problem.
One solution is to keep only the frames in which significant or suspicious activity occurs, such as someone entering or leaving through a secure door. The software that checks for this kind of detail relies on the compression algorithm for motion information. A high magnitude of the motion vectors indicates significant activity, and the frame is stored. Some, but not all, encoders provide access to motion vectors.
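As a rough illustration of this kind of check, the Python sketch below averages the motion vector magnitudes of a frame and compares the result with a threshold. The 2-pixel threshold and the function names are hypothetical; a real system would tune the measure and the threshold to the scene.

```python
import math


def motion_activity(motion_vectors):
    """Mean magnitude, in pixels, of a frame's (dx, dy) motion vectors."""
    if not motion_vectors:
        return 0.0
    total = sum(math.hypot(dx, dy) for dx, dy in motion_vectors)
    return total / len(motion_vectors)


def should_store(motion_vectors, threshold=2.0):
    """Keep the frame only if its average motion reaches the threshold
    (the 2.0-pixel default is an assumed value)."""
    return motion_activity(motion_vectors) >= threshold


quiet_frame = [(0, 0), (1, 0), (0, -1), (0, 0)]   # mean magnitude 0.5
busy_frame = [(3, 4), (6, 8), (0, 0), (3, 4)]     # mean magnitude 5.0
```

With these sample vectors, `should_store(quiet_frame)` is false and `should_store(busy_frame)` is true, so only the second frame would be written to disk.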
Differential encoding, the ability to encode at two different rates simultaneously, enhances the functionality of surveillance systems by allowing the system to display video at one rate on a monitor while storing it on disk at another rate. Also useful is an encoder system that can dynamically trade off two low-quality channels against a single high-quality channel, enabling the system to select one camera feed over another when significant activity in the frame occurs.
Video conferencing and bandwidth
In video conferencing the most important issue is usually the transmission bandwidth, which can range from tens of kilobits per second up to multi-megabits per second. With some links the bit rate is guaranteed, but with the Internet bit rates are highly variable.
Video conferencing encoders may thus have to address the delivery requirements of different types of links and adapt in real time to changing bandwidth availability. The transmitting system, when it is advised of reception conditions via a reverse channel or RTCP acknowledgement, should be able to adjust its encoded output continually so that the best possible video is delivered with minimal interruption.
When delivery is poor, the encoder may respond by reducing its average bit rate, skipping frames, or changing the group of pictures (GoP), the mix of I and P frames. I frames are not as heavily compressed as P frames, so a GoP with fewer I frames requires less bandwidth overall. Since the visual content of a video conference does not change frequently, it is usually acceptable to send fewer I frames than would be needed for entertainment applications.
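The effect of GoP length on bandwidth can be seen with some simple arithmetic, sketched below in Python. The frame sizes (60 kbit for an I frame, 10 kbit for a P frame) and the 15-fps frame rate are assumed values chosen only to show the trend.

```python
def average_bitrate_kbps(gop_length, i_frame_kbits, p_frame_kbits, fps):
    """Average bit rate of a GoP made of one I frame followed by P frames."""
    gop_kbits = i_frame_kbits + (gop_length - 1) * p_frame_kbits
    return gop_kbits * fps / gop_length


# Assumed sizes: 60-kbit I frames, 10-kbit P frames, 15 frames per second.
rates = {gop: average_bitrate_kbps(gop, 60, 10, 15) for gop in (15, 60, 150)}
for gop, kbps in rates.items():
    print(f"GoP of {gop} frames: {kbps:.1f} kbps")
```

Under these assumptions, stretching the GoP from 15 to 150 frames lowers the average rate from 200 kbps to 155 kbps, because the cost of each I frame is spread over more P frames.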
H.264 uses an adaptive in-loop de-blocking filter that operates on the block edges to smooth the reconstructed picture, so that the frames used as references for predicting future frames are free of blocking artifacts. The filter significantly improves the subjective quality of encoded video, especially at low bit rates.
On the other hand, turning off the filter can increase the amount of visual data at a given bit rate, as can changing the motion estimation resolution from quarter-pel to half-pel or more. In some cases, it may be necessary to sacrifice the higher quality of de-blocking and fine resolution in order to reduce the complexity of encoding.
Figure 3. Intra-Coded Strips in P Frames
Since packet delivery via the Internet is not guaranteed, video conferencing often benefits from encoding mechanisms that increase error resilience. For instance, progressive strips of P frames can be intra-coded (I strips), as shown in Figure 3 above.
This technique eliminates the need for complete I frames (after the initial frame), reducing the risk that an entire I frame will be dropped and the picture broken up. Also, without the bursts created by I frames, the data flow is steadier.
There is a trade-off in compression, though, since the macro-blocks in I strips are coded without reference to earlier frames, which reduces the encoder's ability to exploit temporal redundancy. About two to five percent is lost in bit rate, so it is useful if the encoder can switch this capability on or off as needed for coping with network delivery conditions.
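One way to picture progressive I strips is as a cyclic schedule over the macro-block rows of the frame. The Python sketch below assumes that one strip is refreshed per P frame, moving from top to bottom and then wrapping around. The schedule, the strip height and the function name are illustrative assumptions, not a description of any particular encoder.

```python
def intra_strip_rows(frame_index, mb_rows, strip_height=1):
    """Macro-block rows to intra-code in the given P frame.

    frame_index counts P frames after the initial I frame; the strip moves
    down by strip_height rows per frame and wraps at the bottom (assumed).
    """
    strips = -(-mb_rows // strip_height)  # ceiling division
    start = (frame_index % strips) * strip_height
    return list(range(start, min(start + strip_height, mb_rows)))


# CIF is 288 lines high, i.e. 18 rows of 16x16 macro-blocks.
MB_ROWS = 288 // 16
schedule = [intra_strip_rows(n, MB_ROWS, strip_height=2) for n in range(10)]
refreshed = {row for rows in schedule[:9] for row in rows}
```

With two-row strips, every macro-block row of a CIF picture is refreshed once every nine P frames, and frame 9 starts again at the top. Each frame carries a small, constant intra-coded share instead of an occasional large I frame.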
Mobile video requirements
In wireless phones with video capabilities, bandwidth is at a premium, even with 3G channels. Processing may also be more limited in these systems, since handsets are designed to trade off performance for low power consumption.
The encoder at the transmitting end has to take the receiving limitations into account, at the very least by adjusting the resolution to the small display. For video streaming, a lower frame rate with fewer I frames or I strips is likely, since the picture degradation is not as apparent as it would be on a larger display, while the bandwidth savings are considerable.
Video conferencing is an extreme case, since the handset needs to encode as well as decode. Because the background is usually static and there is little motion in the foreground, the configuration may be a single I frame at the beginning of the call followed by P frames only.
Buffering for recording
For digital video recorders (DVRs), achieving the best trade-off of storage with picture quality can be difficult. Compression for video recording is delay-tolerant to some extent, so the output buffer can be designed for handling enough frames to keep a steady flow of data to the disk.
Under certain conditions, however, the buffer may become congested because the visual information is changing quickly and the algorithm is creating a large amount of P frame data. In this case, the encoder will trade off picture quality for a lower bit rate, and when the congestion has been resolved, increase the quality again.
A mechanism for performing this trade-off effectively is through rate control, such as changing the quantization parameter (Qp) on the fly. Quantization, as shown in Figure 1, is one of the last steps in the algorithm for compressing data.
Increased quantization reduces the bit rate output of the algorithm but creates picture distortion in direct proportion to the square of Qp. Thus the individual frames lose detail, but the likelihood of frame skips and picture break-up is reduced.
Also, since the visual content is changing rapidly, lower quality is likely to be less noticeable than it is when the content changes slowly. When the visual content returns to a lower bit rate and the buffer clears, Qp can be reset to its normal value.
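A buffer-driven Qp adjustment of this kind might be sketched as follows in Python. The thresholds (80 percent and 40 percent buffer fullness), the step of 2 and the nominal Qp of 28 are assumed values for illustration; the upper limit of 51 is the top of the H.264 Qp range.

```python
def next_qp(qp, buffer_fullness, nominal_qp=28, qp_max=51,
            high=0.8, low=0.4, step=2):
    """Return the Qp for the next frame given buffer fullness (0.0 to 1.0).

    All thresholds and the step size are assumed values, not taken from
    any particular encoder.
    """
    if buffer_fullness > high:
        # Congestion: quantize more coarsely to cut the bit rate.
        return min(qp_max, qp + step)
    if buffer_fullness < low and qp > nominal_qp:
        # Buffer has cleared: step back toward the normal quality level.
        return max(nominal_qp, qp - step)
    return qp


# Simulated buffer fullness over nine frames: a burst of fast-changing content.
fullness_trace = [0.5, 0.85, 0.9, 0.95, 0.7, 0.3, 0.3, 0.3, 0.3]
qp = 28
qp_trace = []
for fullness in fullness_trace:
    qp = next_qp(qp, fullness)
    qp_trace.append(qp)
```

In this trace Qp climbs from 28 to 34 while the buffer is congested, holds while the buffer drains, and then steps back down to 28. The gap between the two thresholds keeps Qp from oscillating around a single trigger point.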
Sweet spots
An important consideration for optimization is the existence of “sweet spots” that offer the best trade-offs of bit rates, resolutions and frames per second (fps) for a given application.
For instance, for H.264 compression of DVD content, a typical encoder's optimal compression for bit rates below 128 kbps is 15-fps QCIF. Doubling the bit rate allows twice as many frames with the same resolution. At rates up to 1 Mbps, 30-fps CIF or QVGA may be optimal, and at higher rates 30-fps VGA or D1.
Video conferencing and surveillance, with their more static visual content, require less bandwidth for optimization. An encoder may optimize compression for these applications at the same resolutions and frame rates listed above but with bit rates some 30 percent lower.
Similarly, MPEG-4 encoding may push the bit rates some 30 percent higher in order to take advantage of mechanisms for higher quality such as quarter-pel resolution. A versatile encoder supports a number of such sweet spots in order to optimize video for many applications.
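These figures can be arranged as a simple lookup, sketched below in Python. The resolutions and frame rates come from the paragraph above, but the exact band edges (in particular treating 256 kbps as the upper edge of the 30-fps QCIF band, and the handling of values that fall exactly on an edge) are assumptions made for this example.

```python
# (upper bit rate limit in kbps, resolution, frames per second)
# Band edges are assumed; the resolutions and frame rates follow the text.
SWEET_SPOTS = [
    (128, "QCIF", 15),
    (256, "QCIF", 30),
    (1000, "CIF or QVGA", 30),
]
TOP_SPOT = ("VGA or D1", 30)


def sweet_spot(bitrate_kbps):
    """Return (resolution, fps) for H.264 encoding of DVD-type content."""
    for limit, resolution, fps in SWEET_SPOTS:
        if bitrate_kbps <= limit:
            return resolution, fps
    return TOP_SPOT


choices = {rate: sweet_spot(rate) for rate in (96, 200, 768, 2000)}
```

A table like this gives an application a starting configuration for a target bit rate, which can then be refined for the type of content.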
Since DSPs are used in a wide range of video applications, DSP encoders should be designed to take advantage of the flexibility inherent in compression standards.
An example can be found in the encoders that operate on Texas Instruments' OMAP media processors for mobile applications, TMS320C64x+ DSPs or processors based on DaVinci technology for digital video applications.
In order to maximize compression performance, each of the encoders is designed to leverage the DSP architecture of its platform, including the video and imaging coprocessor (VICP) that is designed into some of the processors.
Use of the encoders is straightforward, with a uniform set of application programming interfaces (APIs) for all versions. By default, API parameters are preset for high quality, and another preset option is available for high-speed encoding. Extended parameters adapt the application to either H.264 or MPEG-4.
The encoders support several options including YUV 4:2:2 and YUV 4:2:0 input formats, motion resolution down to a quarter-pel, I frame intervals ranging from every frame to none after the first frame, Qp bit rate control, access to motion vectors, de-blocking filter control, simultaneous encoding of two or more channels, I strips and other options.
The encoders dynamically and unrestrictedly determine the search range for motion vectors by default, a technique that improves on fixed-range searches.
Systems engineers need to be aware of the many differences that exist among entertainment, video conferencing, surveillance, mobile and other video applications.
Among these, widely varying requirements for transmission, storage, display and content force trade-offs between bit rate and picture quality, and the different ways of achieving these trade-offs make system optimization something of an art form. H.264/MPEG-4 AVC provides a number of options that can affect these trade-offs, and the standard does not specify how the encoder should determine data such as motion vectors.
Well-designed encoders use this latitude to give systems engineers performance and flexibility in adapting compression to the requirements of applications in the rapidly expanding world of digital video.
Dr. Ajit Rao is the manager for multimedia codecs at Texas Instruments. In this role he is responsible for development and technical direction for TI's multimedia codecs products. Prior to joining TI three years ago, he was a lead codec developer at Microsoft and SignalCom. With more than seven years of codecs experience, Ajit holds four U.S. patents.