The need for speed in low latency video system designs

Krishna Prabhakaran, eInfochips

June 16, 2014

Lag time in video games and video conferencing is annoying. Lag time in avionics, medical devices, and industrial video systems is mission-critical. That’s why low latency video systems are proliferating in applications where live video feeds need to be processed and analyzed in real time. This article discusses some of the contributors to latency in video systems, and ways of minimizing their impact at the video source and playback ends.

The need for speed
For consumers and business users, lag time is commonly experienced in video games and video conferencing. Lag time in video games leads to being overrun by enemies or eliminated by other players, and in the case of massive StarCraft games, it leads to a stadium full of angry fans. In real-time business systems, low-latency video conferencing is also important. Without it, mismatched voice and video cause confusion and frustration. High latency in a video conferencing system can disrupt a conversation to the extent that it defeats the purpose of using video conferencing to increase productivity. In mission-critical applications such as Unmanned Aerial Vehicles (UAVs), video-assisted surgery, and mid-air refueling, the consequences of high latency are far more severe.

UAVs used in tactical strikes on enemy targets and video-assisted surgeries such as endoscopy and laparoscopy appear to be unrelated, yet both rely on accurate information, and that accuracy can only be provided by low latency video systems. If the latency is too great, the consequences can be catastrophic, such as hitting the wrong target with the UAV's payload or missing a crucial element during surgery.

During mid-air refueling, military pilots are trying to orient the aircraft’s fuel inlet to the fuel pipe of a fuel carrier aircraft. Again, video systems play a critical role here, capturing and sending live video feeds of the inlet to the pilot in real time. In this instance, the success of the entire operation is highly dependent upon the latency between the captured video at its source and the video displayed on the screen in the cockpit.

When making mission-critical decisions based on video feeds, low latency is essential. In my examples, higher latency could lead to the UAV targeting an unintended area, doctors making misplaced incisions, and military aircraft running out of fuel. This could ultimately lead to serious property damage, failed missions, and loss of life.

Latency contributors and countermeasures
The anatomy of a typical video system consists of two parts: the video source end where video is captured, compressed, and streamed, and the video playback end, where video is received, decompressed, and displayed.

The diagram in Figure 1 shows the components of a typical video system. Each of the modules below adds a delay to the end-to-end latency of the system. Let us investigate where and how these delays are introduced:

Consider a basic 60 frames/sec video. Video capture and video display each add 17 ms, roughly one frame interval. Depending on the encoder and decoder (codec) used, the compression and decompression modules add another 15 ms to 17 ms each. The encapsulation and decapsulation modules, depending on their container formats, add between 15 ms and 17 ms each. The transport protocol (RTP, UDP, TCP) adds 5 ms to 10 ms in each of the streamer and stream receiver modules.

Figure 1: The components of a typical video system

When you put these numbers together, you arrive at a delay of roughly 104 ms to 122 ms. Note that additional delay is introduced while the data travels from the streamer to the stream receiver. This delay is a further contributor to the end-to-end latency of the system.
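The addition can be checked with a few lines of Python. This is a minimal sketch that simply sums the per-stage ranges quoted above; the stage labels and the function name are illustrative, and network transit time is deliberately left out.

```python
# Per-stage delay ranges (min_ms, max_ms) for a 60 frames/sec stream,
# copied from the figures quoted in the text. The stage labels are
# illustrative names, not identifiers from any real system.
STAGE_DELAYS_MS = {
    "capture": (17, 17),
    "compression": (15, 17),
    "encapsulation": (15, 17),
    "streamer": (5, 10),
    "stream_receiver": (5, 10),
    "decapsulation": (15, 17),
    "decompression": (15, 17),
    "display": (17, 17),
}


def latency_budget(stages):
    """Return (best_case_ms, worst_case_ms) summed over all stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best_ms, worst_ms = latency_budget(STAGE_DELAYS_MS)
print(f"End-to-end budget, excluding network transit: {best_ms} ms to {worst_ms} ms")
```

Summing the quoted ranges this way gives 104 ms at the low end and 122 ms at the high end, before any network transit time is added.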

Innovative techniques for minimizing these delays may be possible, though they are not always practical. For example, instead of waiting for a complete frame to be captured, which adds 17 ms of delay, we can capture the input in slices and encode each slice as it arrives. This way, the video capture delay shrinks as the number of slices grows. However, this requires modifying the video capture interface, which is not easy to do and can affect system stability.
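As a rough illustration of the slice idea, the sketch below assumes an idealized model in which the encoder can start as soon as the first slice has been captured, so the capture delay is one frame interval divided by the number of slices. The function name and the model itself are assumptions made for illustration; real capture interfaces add overheads that are not modeled here.

```python
def capture_delay_ms(fps, slices_per_frame=1):
    """Idealized capture delay before encoding can start, in milliseconds.

    Assumes equal-sized slices and that encoding begins as soon as the
    first slice is available (a simplification, not a measured figure).
    """
    if fps <= 0 or slices_per_frame < 1:
        raise ValueError("fps must be positive and slices_per_frame at least 1")
    frame_interval_ms = 1000.0 / fps
    return frame_interval_ms / slices_per_frame


# Whole-frame capture versus four slices per frame at 60 frames/sec.
for slices in (1, 4):
    print(f"{slices} slice(s): {capture_delay_ms(60, slices):.1f} ms")
```

Under this model, splitting a 60 frames/sec input into four slices cuts the capture delay from about 17 ms to about 4 ms.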

Similarly, using more capable hardware to minimize processing delays can reduce latency. For example, deploying components capable of 1080p60 video processing and using them for 720p60 video processing will reduce the processing latency from 17 ms to approximately 9 ms. While this works, it is an inefficient solution that leaves capacity underutilized. Such workarounds only add to the cost of the system.

However, there are other methods for reducing the inter-module delays, and thus the overall end-to-end latency of video systems. This article discusses various methods used to control latency, the issues encountered, and the solutions that corrected them.

Approaches to lowering video latency
One of the common issues encountered in low latency video systems is jerky playback (the jitter effect) of video feeds. In these cases, a jitter buffer is required at the video playback end to smooth out playback. A jitter buffer stores some encapsulated video data so that the video playback end always has data to process and does not starve, which is what causes the jittery effect. However, a jitter buffer adds its own latency, depending on its size, and so adds to the overall latency of the video system. Avoiding jitter buffers can help reduce latency, but this can only be achieved by controlling video encoding options at the video source end. Some of these options are discussed below.

Avoiding encoding mode frame size variations. There are two encoding modes: variable bitrate (VBR) mode, where the quantization parameter value is fixed, and constant bitrate (CBR) mode, where the quantization parameter value varies.

At the video source end, an encoder configured to use VBR mode gives good-quality video, but it also produces large variations in video frame sizes. These variations lead to jerkiness at the video playback end. Smoother playback with these settings requires a jitter buffer, which adds to the latency of the video system. It is therefore strongly advised to avoid VBR mode, so that the video playback end can cut down on the use of a jitter buffer. The encoder should be configured to use CBR mode instead.

Avoiding coding type frame size variations. There are two commonly used types of coding: intra-coding, where compression is achieved by removing redundancy within the same video frame, and inter-coding, where compression is achieved by removing redundancy between successive video frames.

Depending on the coding techniques used, there are intra-coded frames (aka I frames) and inter-coded frames (aka P frames). I frames result in more bits, as only intra-coding is used. P frames result in fewer bits, as both intra-coding and inter-coding are used. The difference in the number of bytes between I frames and P frames causes fluctuations in the reception times for these frames at the video playback end, requiring a jitter buffer to smooth out the output at the video display end. When these fluctuations are high, the latency increases.
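The effect of frame size variation on reception times can be sketched with a simplified model. The example below assumes a fixed-rate link and ignores packetization and network jitter. The frame sizes, the link rate, and the function names are all hypothetical values chosen for illustration, not measurements.

```python
def transmit_times_ms(frame_sizes_bytes, link_bps):
    """Time needed to push each frame through a fixed-rate link, in ms."""
    return [size * 8 * 1000.0 / link_bps for size in frame_sizes_bytes]


def arrival_jitter_ms(frame_sizes_bytes, link_bps):
    """Spread between the slowest and fastest frame on the link, in ms.

    This spread is a rough indication of how much delay a jitter buffer
    would have to absorb to keep playback smooth.
    """
    times = transmit_times_ms(frame_sizes_bytes, link_bps)
    return max(times) - min(times)


LINK_BPS = 8_000_000  # hypothetical 8 Mbit/s link

# Hypothetical streams: one I frame followed by four P frames (sizes in bytes).
poorly_controlled = [60_000, 6_000, 5_000, 7_000, 6_000]
well_controlled = [12_000, 9_000, 10_000, 9_000, 10_000]

for name, sizes in (("poor rate control", poorly_controlled),
                    ("good rate control", well_controlled)):
    print(f"{name}: {arrival_jitter_ms(sizes, LINK_BPS):.1f} ms spread")
```

With these made-up numbers, the stream with the oversized I frame shows a spread of 55 ms, while the evenly sized stream shows 3 ms, which is why tight control over frame sizes reduces the need for a jitter buffer.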

The diagrams in Figure 2 show an analysis of the maximum and average variations in I and P frames for two streams, one with bad rate control (left) and one with good rate control (right). The badly controlled stream on the left shows wide fluctuations between the average and maximum sizes of both the I frames and the P frames, as well as a large difference between the two frame types, severely impacting the latency of the video system. The well-controlled stream on the right shows frames that are much smaller, with less fluctuation between average and maximum values and less difference between the I and P frames.

Figure 2: An analysis of maximum and average variations in I and P frames

Apart from I and P frames, there is a third type known as a B frame, which uses bidirectional prediction in inter-coding. Bidirectional prediction requires multiple frames, including later ones, to be decoded before a B frame can be decoded. This increases latency at both the video source end and the video playback end. It is strongly advised to disable B frames when compressing video for low latency systems.
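To see why B frames hurt latency, consider that a B frame cannot be encoded until the later reference frame it depends on has been captured. The sketch below estimates only this minimum reordering delay at the source, assuming one frame interval of waiting per consecutive B frame. The function name is illustrative, and real encoders and decoders may add further buffering.

```python
def b_frame_reorder_delay_ms(fps, consecutive_b_frames):
    """Minimum extra delay from frame reordering, in milliseconds.

    Assumes each consecutive B frame forces a wait of one frame interval
    for the following reference frame (a lower-bound estimate).
    """
    if fps <= 0 or consecutive_b_frames < 0:
        raise ValueError("fps must be positive and the B-frame count non-negative")
    return consecutive_b_frames * 1000.0 / fps


# No B frames versus two consecutive B frames at 60 frames/sec.
for b_frames in (0, 2):
    delay = b_frame_reorder_delay_ms(60, b_frames)
    print(f"{b_frames} B frame(s): {delay:.1f} ms")
```

Under this estimate, two consecutive B frames at 60 frames/sec add at least two frame intervals of delay, while disabling B frames removes the reordering delay entirely.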
