Reducing VoIP quality degradation when network conditions are unstable
In the jitter buffer literature, algorithms focus primarily on optimizing playout but ignore important practical details that can strongly influence algorithm selection: the target platform (general-purpose computers, embedded systems), resource restrictions (the amount of memory and processing power available to the jitter buffer), and the network type (e.g. a lighter algorithm may suffice when the network conditions are well known).
We have focused our study on finding an algorithm that minimizes both memory and processing-power usage and targets a VoIP Media Gateway running on a DSP. At the same time, the algorithm has to offer good quality for “well-behaving networks” (i.e. well-managed and partially-managed IP networks as defined in ITU-T G.1050) and graceful degradation for unmanaged IP networks.
The previously studied algorithms perform delay adjustment either per packet or per talkspurt. Talkspurt-based algorithms compress or expand the silence periods between talkspurts to avoid dropping packets and inserting gaps. For voice applications, such an algorithm is the preferred solution because it eliminates the unpleasant effect of dropping packets or inserting gaps in the middle of a talkspurt.
Unfortunately, it is not always easy to detect the start of a talkspurt: some streams carry this information in the marker bit of the RTP header, others do not. The information may also be computed outside the jitter buffer as a packet-type classification (silence or voice), but not all voice applications can obtain it easily before inserting the packet into the jitter buffer.
The proposed algorithm has two flavors:
- An algorithm for streams with talkspurt information (we will refer to this algorithm as TAA – Talkspurt Adaptation Algorithm)
- An algorithm for streams without talkspurt information (we will refer to this algorithm as NTAA – Non-Talkspurt Adaptation Algorithm).
The jitter buffer uses one of the above algorithms depending on the availability of talkspurt information: if it can identify the start of a talkspurt, it uses TAA; otherwise it uses NTAA. The switch between the two is automatic.
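A minimal sketch of this selection logic is shown below; the names are illustrative, not from the original implementation (the RTP marker bit, when set by the sender, flags the first packet of a talkspurt):

```c
#include <stdbool.h>

typedef enum { ALG_TAA, ALG_NTAA } jb_algorithm_t;

/* Illustrative selection of the adaptation algorithm: use TAA when the
 * stream carries talkspurt information (RTP marker bit set by the sender
 * or an external voice/silence classifier available), NTAA otherwise. */
static jb_algorithm_t jb_select_algorithm(bool has_marker_info, bool has_vad_info)
{
    return (has_marker_info || has_vad_info) ? ALG_TAA : ALG_NTAA;
}
```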
The TAA algorithm
The algorithm combines a proactive approach with a reactive one. The proactive part continuously estimates the average delay and its variance using the algorithm proposed by Ramjee et al. in [3]. We selected this algorithm for its simplicity and its low memory and processing-power requirements, but other estimators may be used as well.
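For reference, a minimal sketch of this estimator follows; the 0.998002 weight and the four-variation safety margin follow [3], while the surrounding structure and names are ours:

```c
#include <math.h>

/* Sketch of the per-packet estimator from Ramjee et al. [3]:
 * exponential averages of the network delay and its variation. */
typedef struct {
    double d;   /* estimated average network delay   */
    double v;   /* estimated variation of that delay */
} delay_estimator_t;

#define RAMJEE_ALPHA 0.998002

/* n is the observed one-way delay of the current packet
 * (arrival time minus send time, in the same clock units). */
static void estimator_update(delay_estimator_t *e, double n)
{
    e->d = RAMJEE_ALPHA * e->d + (1.0 - RAMJEE_ALPHA) * n;
    e->v = RAMJEE_ALPHA * e->v + (1.0 - RAMJEE_ALPHA) * fabs(e->d - n);
}

/* Playout delay applied at the start of the next talkspurt:
 * average delay plus a safety margin of four variations, as in [3]. */
static double estimator_playout_delay(const delay_estimator_t *e)
{
    return e->d + 4.0 * e->v;
}
```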
The jitter buffer is configured to perform optimally under certain network conditions in terms of delay variation, packet reordering, and packet loss. The reactive part is activated when the jitter buffer encounters situations such as overflow or late packet arrival.
Thus, if the network conditions are stable and within the supported limits, the estimator is, in general, accurate. If the conditions are not met, the jitter buffer enters one of two states: late packet arrival or overflow.
Let Sn be the send time of packet n and Slast the send time of the last packet played. A late packet with Sn < Slast is dropped because a packet with newer information has already been played. However, a late packet with Sn > Slast can still be played at the cost of a delay increase: if the packet is within a reasonable range relative to the jitter buffer's current time (e.g. 1 second), it can be accepted and the delay increased.
At this point, two options are available:
- Drop the packet and keep the current delay. If the packet is an isolated one, this is a good solution.
- Keep the packet and increase the delay. If the delay is increased significantly, the buffer will start filling up and may reach an overflow situation, which then has to be carefully handled.
The TAA algorithm uses the second approach: it keeps the packet and handles overflow situations when they occur.
Let F(n) be an indicator that is 1 if packet n is a late packet that was accepted and 0 otherwise.

Let αn be the delay increase for accepting late packet n; the delay increase for received packet n is therefore F(n)·αn.

Let TDIL be the “Total Delay Increase” due to accepting late packets. For N received packets, TDIL is:

$$\text{TDIL} = \sum_{n=1}^{N} F(n)\,\alpha_n \quad \text{(EQ4)}$$
In an overflow situation, a packet must be dropped because there is not enough memory to accommodate the new one. We have chosen not to drop the current packet, but a packet whose removal allows a rapid delay decrease, to avoid entering overflow again at the next incoming packet.
Let bm be the delay decrease applied to solve overflow m.

Let us assume that over a period T, N packets were received and M overflows occurred. In this situation TDIL is:

$$\text{TDIL} = \sum_{n=1}^{N} F(n)\,\alpha_n - \sum_{m=1}^{M} b_m \quad \text{(EQ5)}$$
Shown below is the pseudo-code for the late packet arrival and overflow situations. In case of overflow, if TDIL is 0 (there was no delay increase caused by late packet arrivals in the recent past), the choice of the packet to be dropped depends on the position of the received packet inside the buffer.
If the delay variation is within the acceptable range, this approach positions the jitter buffer time better on the delay-variation distribution; if the delay is larger than the supported range, packets have to be dropped anyway.
The TDIL value reflects the delay increase due to accepting late frames; it is maintained to indicate how much the delay needs to be decreased when an overflow occurs. However, late packets accepted far in the past should not have the same influence on current overflows, so an aging mechanism for TDIL is employed (we chose to decrease TDIL by a factor after every N packets received without overflow).
Another important aspect is the type of the late packet: if the late packet is silence, a good decision is to drop it, since it does not contain relevant information anyway.
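A sketch of this reactive handling follows; the types, the constants (acceptance window, aging period and factor), and the helper names are illustrative assumptions, not the original implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants: the "reasonable range" for accepting a late
 * packet, and the TDIL aging period/factor described above. */
#define LATE_ACCEPT_WINDOW 1000  /* ms                                         */
#define TDIL_AGE_PERIOD    500   /* packets without overflow before aging      */
#define TDIL_AGE_FACTOR    2     /* TDIL is divided by this at each aging step */

typedef struct {
    uint32_t send_time;    /* Sn, derived from the RTP timestamp */
    bool     is_silence;   /* packet type, when known            */
} packet_t;

typedef struct {
    uint32_t current_time;      /* jitter buffer playout clock          */
    uint32_t last_played_send;  /* Slast                                */
    uint32_t tdil;              /* total delay increase for late frames */
    uint32_t pkts_since_overflow;
} jb_state_t;

/* Late packet arrival: returns true if the packet is accepted
 * (increasing the delay by alpha_n), false if it is dropped. */
static bool taa_on_late_packet(jb_state_t *jb, const packet_t *p,
                               uint32_t alpha_n)
{
    if (p->send_time < jb->last_played_send)
        return false;   /* newer information already played: drop        */
    if (p->is_silence)
        return false;   /* no relevant information: drop, keep the delay */
    if (jb->current_time - p->send_time > LATE_ACCEPT_WINDOW)
        return false;   /* outside the acceptance window: drop           */

    jb->tdil += alpha_n;   /* accept and record the delay increase (TDIL) */
    return true;
}

/* Overflow: a packet must be dropped to make room. If TDIL is non-zero,
 * drop a packet whose removal gives a rapid delay decrease b_m; otherwise
 * the choice depends on the received packet's position inside the buffer
 * (buffer-specific, elided here). */
static void taa_on_overflow(jb_state_t *jb, uint32_t b_m)
{
    jb->pkts_since_overflow = 0;
    if (jb->tdil > 0)
        jb->tdil = (jb->tdil > b_m) ? jb->tdil - b_m : 0;
}

/* Aging: late-packet delay increases from the distant past should not
 * drive current overflow decisions. */
static void taa_age_tdil(jb_state_t *jb)
{
    if (++jb->pkts_since_overflow >= TDIL_AGE_PERIOD) {
        jb->tdil /= TDIL_AGE_FACTOR;
        jb->pkts_since_overflow = 0;
    }
}
```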
The NTAA algorithm
The NTAA algorithm uses the same reactive part as TAA. However, the proactive part cannot be used because the stream carries no talkspurt information. The delay increases naturally, by inserting gaps into the stream when no packet is available for playing. However, there are situations when packets are delayed for a period of time (e.g. when congestion occurs).
During this period, the jitter buffer has no packet to play, but eventually the delayed packets are received (almost) all at once, filling the jitter buffer. If, after the congestion period, the network conditions become stable again, the jitter buffer is left holding a considerable number of packets, so all newly received packets are delayed even though the network delay is low. In such situations, it is better to reduce the delay, even at the cost of dropping some packets.
Let [T1, Tn] be the analysis time interval (the interval over which the jitter buffer runs). The algorithm splits this interval into equally sized sub-intervals of size Ts; let M = {I1, I2, …, I(Tn−T1)/Ts} be the set of these sub-intervals, and let Min(Ik) be the minimum buffering delay observed during interval Ik.

The algorithm computes the minimum buffering delay for each interval. Min(Ik) being above a certain threshold (MIN_THRESHOLD) indicates that the packets were kept in the buffer longer than necessary, because no “late packets” (packets whose buffering delay is under the threshold) were received during Ik. In this situation the algorithm decides to decrease the delay by one frame.
Figure 1 : Packet buffering delay
An example is depicted in Figure 1 above, where, after a period, the delay is decreased. The buffering delay can be considerably reduced for the second half, where a buffer of 200 samples is more than enough.
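A sketch of this per-interval decision follows; the interval length, the threshold value, and the names are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_THRESHOLD 40   /* ms: excess-delay threshold for Min(Ik)     */
#define INTERVAL_LEN  200  /* Ts, expressed here in packets per interval */

typedef struct {
    uint32_t interval_min;   /* Min(Ik) for the interval under analysis */
    uint32_t packets_seen;
} ntaa_state_t;

static void ntaa_init(ntaa_state_t *s)
{
    s->interval_min = UINT32_MAX;
    s->packets_seen = 0;
}

/* Called once per played packet with its observed buffering delay
 * (the time the packet spent in the buffer). Returns true when the
 * playout delay should be decreased by one frame. */
static bool ntaa_on_packet(ntaa_state_t *s, uint32_t buffering_delay)
{
    if (buffering_delay < s->interval_min)
        s->interval_min = buffering_delay;

    if (++s->packets_seen < INTERVAL_LEN)
        return false;

    /* End of interval Ik: if even the fastest packet waited longer than
     * MIN_THRESHOLD, no "late packets" were received and the delay can
     * safely be reduced by one frame. */
    bool decrease = (s->interval_min > MIN_THRESHOLD);
    ntaa_init(s);   /* start the next interval */
    return decrease;
}
```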


