Part 3 details the operation of H.264/AVC and discusses issues involved in transmitting video over networks.
By John W. Woods
H.264/AVC introduces many new ideas that allow it to perform at almost twice the efficiency of the MPEG 2 standard, also known as H.262. Motion estimation now uses variable blocksizes, ranging from 16 × 16 down to 4 × 4, and motion vector accuracy is raised to one-quarter pixel from the half-pixel accuracy of MPEG 2. The permitted blocksize choices are shown in Figure 11-18. The 16 × 16 macroblock can be split in three ways to get 16 × 8, 8 × 16, or 8 × 8 blocks, as shown. If 8 × 8 blocks are not accurate enough, one more round of such splitting results in the sub-macroblock blocks 8 × 4, 4 × 8, or 4 × 4. Note that in addition to what we would get by quadtree splitting (cf. Chapter 10), we get the possible rectangular blocks, which can be thought of as a level inserted between two quadtree square block levels.

Figure 11-18. Allowed motion vector blocksizes in H.264/AVC.
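The two rounds of splitting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the (width, height) shape convention, and the (x, y, width, height) block tuples are hypothetical and are not syntax from the standard.

```python
# Illustrative sketch only: names and data layout are hypothetical.

def split_shapes(n):
    """Shapes (width, height) allowed when splitting an n x n square:
    unsplit, two wide halves, two tall halves, or four quadrants."""
    return [(n, n), (n, n // 2), (n // 2, n), (n // 2, n // 2)]


def tile(x, y, width, height, size):
    """Tile the size x size square at (x, y) with width x height blocks."""
    return [(x + dx, y + dy, width, height)
            for dy in range(0, size, height)
            for dx in range(0, size, width)]


def macroblock_partitions(mb_shape, sub_shapes=None):
    """Return the motion blocks (x, y, w, h) of one 16 x 16 macroblock.

    mb_shape   -- one of split_shapes(16)
    sub_shapes -- used only when mb_shape is (8, 8): one shape from
                  split_shapes(8) for each of the four 8 x 8 blocks
                  (defaults to leaving all four unsplit)
    """
    if mb_shape not in split_shapes(16):
        raise ValueError("macroblock shape not allowed")
    blocks = tile(0, 0, mb_shape[0], mb_shape[1], 16)
    if mb_shape != (8, 8):
        return blocks  # the second round applies only to 8 x 8 blocks
    if sub_shapes is None:
        sub_shapes = [(8, 8)] * 4
    result = []
    for (x, y, _, _), shape in zip(blocks, sub_shapes):
        if shape not in split_shapes(8):
            raise ValueError("sub-macroblock shape not allowed")
        result.extend(tile(x, y, shape[0], shape[1], 8))
    return result
```

For example, macroblock_partitions((16, 8)) yields two blocks, while choosing (4, 4) for all four 8 × 8 blocks yields sixteen, each of which would carry its own motion vector.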
To match this smallest blocksize, a 4 × 4 integer-based transform is introduced that is DCT-like and separable, built from a 1-D four-point transform.
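As a rough illustration of separability, the sketch below applies a 1-D four-point integer transform first to the columns and then to the rows of a 4 × 4 block. The matrix H used here is the widely cited forward core transform of H.264/AVC; it is supplied as an assumption, since the matrix itself is not reproduced in the text above, and the function names are hypothetical.

```python
# Assumption: H is the commonly cited H.264/AVC forward core transform.
# The text above does not show the matrix, so treat it as illustrative.
H = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def transform_1d(v):
    """Apply the four-point transform to one length-4 vector (H times v)."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def forward_4x4(block):
    """Separable 2-D transform: 1-D pass over columns, then over rows.

    The result equals the matrix product H * X * H^T, computed entirely
    in integer arithmetic.
    """
    after_columns = transpose([transform_1d(c) for c in transpose(block)])
    return [transform_1d(row) for row in after_columns]


def gram():
    """Inner products between the rows of H (H * H^T)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in H] for r1 in H]
```

A flat block maps to a single nonzero (DC) coefficient, as with a DCT. The rows of this H are mutually orthogonal but do not all have the same norm, which is one reason the transform is DCT-like rather than an exact DCT; in practice the missing normalization is usually described as being absorbed into the quantization step.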

The H.264/AVC coder is based on slices, with I, B, and P slices, as well as two new switching slices SP and SI. The slices are in turn made up of 16 × 16 macroblocks. The P slice can have I or P macroblocks. The B slice can have I, B, or P macroblocks. There is no mention of group of pictures, but there are I pictures, needed at the start to initialize this hybrid coder. There is nothing to prohibit a slice from being the size of a whole frame, so that there can effectively be P and B pictures as well.
In H.264/AVC, an I slice is defined as one whose macroblocks are all intracoded. A P slice has macroblocks that can be predictively coded with up to one motion vector per block. A B slice has macroblocks predictively (interpolatively) coded using up to two motion vectors per block. Additionally, there are new switching slices SP and SI that permit efficient jumping from place to place within or across bitstreams (cf. Section 12.3, Chapter 12).
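The slice rules stated above can be collected into a small lookup table. This is a minimal sketch with hypothetical names, not syntax from the standard; the SP and SI slices are left out because the text does not describe their macroblock makeup.

```python
# Illustrative table built only from the rules stated in the text.
# Names are hypothetical; SP and SI slices are intentionally omitted.
SLICE_RULES = {
    "I": {"macroblocks": {"I"}, "max_motion_vectors": 0},
    "P": {"macroblocks": {"I", "P"}, "max_motion_vectors": 1},
    "B": {"macroblocks": {"I", "P", "B"}, "max_motion_vectors": 2},
}


def macroblock_allowed(slice_type, macroblock_type):
    """True if a macroblock of the given type may appear in the slice."""
    return macroblock_type in SLICE_RULES[slice_type]["macroblocks"]


def max_motion_vectors(slice_type):
    """Upper bound on motion vectors per block for the slice type."""
    return SLICE_RULES[slice_type]["max_motion_vectors"]
```

For instance, macroblock_allowed("P", "B") is False, while macroblock_allowed("B", "P") is True.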
Within an I slice, there is intrapicture prediction, done blockwise based on previously coded blocks. The intra prediction block error is then subject to the 4 × 4 integer-based transform and then quantization. For 4 × 4 blocks, the intra prediction is adaptive and direction based, as indicated in Figure 11-19. A prespecified fixed blending of the available boundary values is tried in each of the eight prediction directions. Also available is a so-called DC option that predicts the whole block as a constant. For 16 × 16 blocks, only four intra prediction modes are possible: vertical, horizontal, DC, and plane, the last one coding the values of a best-fit plane for the macroblock.
The motion compensation can use multiple references, so, for example, a block in a P slice can be predicted by one to four reference blocks in earlier frames. The H.264/AVC standard specifies the amount of reference frame storage that must be available at the decoder to store these past pictures, and five past frames is common.
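Returning to the 4 × 4 intra prediction described above, the sketch below shows three of the nine modes (vertical, horizontal, and DC) and forms the prediction error that would then go to the transform. Several things here are assumptions for illustration: the function names, the use of the sum of absolute differences to pick a mode (the mode decision is an encoder choice), and the DC rounding, which follows the commonly stated rule for the case where both the top and left neighbors are available. The remaining directional modes are omitted.

```python
# Illustrative sketch: three of the nine 4 x 4 intra modes.
# top  -- the 4 reconstructed pixels directly above the block
# left -- the 4 reconstructed pixels directly to its left

def predict_vertical(top, left):
    """Copy the row above the block down every row."""
    return [list(top) for _ in range(4)]


def predict_horizontal(top, left):
    """Copy each left-neighbor pixel across its row."""
    return [[left[r]] * 4 for r in range(4)]


def predict_dc(top, left):
    """Predict the whole block as one constant: the rounded mean of the
    eight neighbors (assumes both top and left are available)."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


MODES = {
    "vertical": predict_vertical,
    "horizontal": predict_horizontal,
    "dc": predict_dc,
}


def sad(a, b):
    """Sum of absolute differences between two 4 x 4 blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_mode(block, top, left):
    """Pick the mode with the smallest SAD; return (name, residual)."""
    name = min(MODES, key=lambda m: sad(block, MODES[m](top, left)))
    pred = MODES[name](top, left)
    residual = [[x - p for x, p in zip(rb, rp)]
                for rb, rp in zip(block, pred)]
    return name, residual
```

When a block simply continues the columns of the row above it, the vertical mode gives an all-zero residual, which is the cheapest possible input to the transform and quantizer.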

Figure 11-19. Illustration of directional prediction modes of H.264/AVC in the case of 4 × 4 blocks.
Figure 11-20 illustrates the comparative PSNR versus bitrate performance of the verification models of H.264/AVC on the 15-fps CIF test clip Tempete (HLP, High-Latency Profile; ASP, Advanced Simple Profile; MP, Main Profile). The figure [40] shows considerable improvement over MPEG 2, an increase in compression of about a factor of 2 at fixed PSNR. This improvement in compression efficiency is largely due to the greater exploitation of motion information made possible by the revolutionary increases in affordable computational power of the past 10 years. More information on the new H.264/AVC standard is available in the review article by Wiegand et al. [43], which introduces a special issue on this topic [38].

Figure 11-20. PSNR vs. bitrate for 15-fps CIF test clip Tempete. Reprinted with permission from Sullivan and Wiegand. [40]
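For readers who want to reproduce one point on a curve of this kind, PSNR for a single frame can be computed as sketched below. This is a minimal sketch assuming 8-bit samples (peak value 255) and frames stored as equal-size lists of rows; the function names are hypothetical, and how the per-frame values were averaged for the figure is not stated in the text.

```python
import math


def mse(reference, decoded):
    """Mean squared error between two equal-size frames (lists of rows)."""
    total = 0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for r, d in zip(ref_row, dec_row):
            total += (r - d) ** 2
            count += 1
    return total / count


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit samples."""
    err = mse(reference, decoded)
    if err == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / err)
```

With an error of exactly one gray level at every pixel, the MSE is 1 and the PSNR is about 48.13 dB.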