This article addresses VoIP issues including voice quality, transit delay, echo, comfort noise, and foreign language speakers. It explains the G.711, G.726, G.729A, G.723.1, and G.722 voice compression standards.
By Michael F. Finneran
Part 2 takes a closer look at jitter, delay, and echo, and introduces voice quality measurements.
For an intro to this topic, see How VoIP works: protocols, codecs, and more.
In this chapter we will look at IP telephony issues from the standpoint of voice quality. We will describe the major quality issues that have been identified for packet voice, and the impact these various parameters will have on the user experience. Issues like delay, jitter, and packet loss are inherent in any packet voice system, and we will look at the additional challenge that is introduced when a wireless LAN forms part of the transmission path.
It is important to recognize that the wireless LAN is only part of the overall transmission path the voice packets will travel between the source and destination. The quality the user will experience is based on the overall performance of all of the elements within the path. Whether we are discussing packet loss or the delay factors, the performance of all of the elements in the path is additive. Whether the receive buffer, the local network, or the wide area network drops the packet, the result is the same: that packet does not arrive at the destination. Similarly, the delay introduced by the WLAN is added to the delay introduced by the wired LAN, the WAN, and the wired LAN at the far end.
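To make the cumulative effect of loss concrete, here is a minimal Python sketch. It assumes each segment drops packets independently, which real networks only approximate, and the per-segment loss rates are hypothetical figures chosen for illustration rather than measurements. Under that assumption the end-to-end loss is one minus the product of the per-segment delivery probabilities; for the small loss rates typical of voice networks, this is very close to the simple sum of the segment loss rates.

```python
from math import prod

def end_to_end_loss(segment_loss):
    """Probability that a packet is lost somewhere on the path.

    Assumes every segment drops packets independently of the others.
    """
    return 1.0 - prod(1.0 - p for p in segment_loss)

# Hypothetical per-segment loss rates (illustrative only, not measured values).
path = {
    "wlan": 0.01,
    "wired_lan": 0.001,
    "wan": 0.005,
    "far_end_lan": 0.001,
}

total = end_to_end_loss(path.values())
simple_sum = sum(path.values())
print(f"end-to-end loss: {total:.4%} (simple sum of segments: {simple_sum:.4%})")
```

The simple sum slightly overstates the true figure because a packet can only be lost once, but at these loss rates the difference is negligible.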
In this article we describe the major elements that impact the quality of packet voice. In particular we will focus on the known requirements for delay, jitter, and loss and identify the performance characteristics of the different network services the voice packet may encounter on the way to its destination. Fortunately, we now have a good base of experience gained from both local and wide area packet voice networks so we can say with a high degree of confidence how bad things can get before voice users will start to complain.
Quality Issues in Packet Telephony
The use of packet technology introduces three major quality issues into voice service: voice quality, transit delay, and echo control.
Voice Quality: The basic signal quality a user detects on a voice connection is a product of the voice coding technique that is used and the percentage of packets that fail to arrive at the receiver to be decoded. Packets can be lost in two ways in a VoIP network:
- Packet networks can drop packets due to errors or buffer overflows.
- The RTP receive jitter buffer can drop packets if they arrive with a delay greater than the buffer can accommodate. So delivering a packet late is equivalent to not delivering it at all.
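The two loss mechanisms above can be sketched with a simple model of a fixed jitter buffer. This is an illustrative simplification, not how any particular RTP stack is implemented: the function name, the sample delays, and the 40 msec buffer depth are all assumptions. Playout is scheduled a fixed interval after the first received packet, and any packet whose network delay exceeds that deadline is counted as lost even though it eventually arrived.

```python
def classify_packets(delays_ms, buffer_ms):
    """Return (played, dropped_in_network, dropped_late) packet counts.

    delays_ms: per-packet one-way network delay in msec, or None when the
    network dropped the packet. The playout deadline is the first received
    packet's delay plus the buffer depth; later arrivals miss their slot.
    """
    received = [d for d in delays_ms if d is not None]
    if not received:
        return 0, len(delays_ms), 0
    deadline = received[0] + buffer_ms
    late = sum(1 for d in received if d > deadline)
    lost = len(delays_ms) - len(received)
    return len(received) - late, lost, late

# Hypothetical delays: two packets dropped in the network, one badly delayed.
delays = [40, 45, None, 42, 95, 50, 41, None, 60, 44]
played, lost, late = classify_packets(delays, buffer_ms=40)
print(f"played={played} lost_in_network={lost} discarded_late={late}")
```

From the decoder's point of view the late packet and the two network drops are indistinguishable: all three leave a gap in the audio.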
The impact of lost packets depends on the technique we use to encode the voice. We will look at loss tolerance and the other voice coding issues in a moment.
Transit Delay: Transit delay is the total delay the voice signal experiences as it travels through the network; this is also referred to as mouth-to-ear delay. A number of factors in the local and wide area network contribute to transit delay. These include voice coding/compression, packet generation, channel contention (in a WLAN), network transport/buffering, and jitter removal. The important thing to know is that once the one-way delay exceeds 150 msec, it will begin to affect the cadence of the conversation. Distance, router buffering, and WLAN contention are all contributing factors in end-to-end transit delay. Transit delay has been one of the major performance complaints we have seen in packet telephony.
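A delay budget is a convenient way to see how these contributions add up against the 150 msec threshold. The sketch below uses the contributing factors named above, but every figure in it is a hypothetical placeholder; actual values depend on the codec, the packetization interval, the distance, and the network load.

```python
ONE_WAY_LIMIT_MS = 150  # one-way threshold cited in the text

# Hypothetical per-component delays in msec (illustrative only).
budget_ms = {
    "voice coding/compression": 15,
    "packet generation": 20,
    "WLAN channel contention": 10,
    "network transport/buffering": 60,
    "jitter removal": 40,
}

def mouth_to_ear_delay(components):
    """Sum the one-way delay contributed by every element in the path."""
    return sum(components.values())

total_ms = mouth_to_ear_delay(budget_ms)
margin_ms = ONE_WAY_LIMIT_MS - total_ms
print(f"one-way delay: {total_ms} msec, margin to limit: {margin_ms} msec")
```

With these placeholder numbers the budget is nearly exhausted, which illustrates why a few extra milliseconds of WLAN contention or router buffering can push a call past the point where users notice.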
The other timing issue is jitter, the variation in delay from packet to packet that is introduced by the dynamic buffering used in a packet network. Left untreated, jitter will render the voice unintelligible. As we noted in the last chapter, RTP addresses jitter removal, but the downside of the RTP process is that it adds to the overall transit delay.
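Jitter can also be measured. The RTP specification (RFC 3550) defines a running interarrival jitter estimate that receivers report in RTCP: for each pair of consecutive packets, take the difference between their arrival spacing and their sending spacing, then smooth the absolute value with a gain of 1/16. The sketch below applies that formula to timestamps expressed in milliseconds for readability (RTP itself works in timestamp units), and the sample timestamps are made up. Note that this estimates jitter; it does not remove it.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Running jitter estimate following the RFC 3550 formula.

    D = (R_i - R_{i-1}) - (S_i - S_{i-1})
    J = J + (|D| - J) / 16
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter

# Hypothetical stream: packets sent every 20 msec, received with varying delay.
send = [0, 20, 40, 60, 80]
recv = [50, 72, 89, 115, 130]
print(f"estimated jitter: {interarrival_jitter(send, recv):.2f} msec")
```

A stream whose packets all experience the same delay yields an estimate of zero, however large that delay is, which is why jitter and transit delay have to be tracked separately.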
Echo Control: All telephone circuits introduce echo. However, when the one-way transit delay exceeds 35 to 40 msec, the echo becomes noticeable and annoying. When the delay exceeds that parameter, equipment must be used to remove the echo. Virtually all packet voice networks will exceed 40 msec one-way delay, so echo control will be one element that must be incorporated in the system design.
Voice Quality
The human voice produces a signal that is analog by nature. When you speak, your voice creates a pattern of pressure vibrations in the air, which is essentially an analog signal (i.e., a signal that is continuously varying). Before that analog signal can be transported over a packet network that carries digital information, we need a codec to convert that analog voice signal into a digital representation (Note: Codec is a contraction of the two terms coder and decoder).
There are three primary issues to consider in the selection of a voice coding system:
- The digital transmission rate required.
- The delay introduced by the coding process.
- The loss tolerance, or the percentage of packets that can be lost before the voice quality degrades below the allowable quality threshold.
Any technique that converts voice into a digital representation introduces some degradation in the sound quality; in general, those signal degradations are indistinguishable to the human ear. However, if some of those bits are changed due to transmission errors, or if some of the bits are lost due to packet dropping, the quality of the recovered signal can be impacted. There are a number of voice coding techniques, and they vary with regard to efficiency, robustness, and encoding delay. The parameters for the major voice coding options are summarized in Table 8-1.

Table 8-1. Voice Coding Options.
An efficient coding system reduces the number of bits we need for each channel and thereby increases the number of simultaneous voice calls we can support on a given amount of network capacity. As a general rule, the more efficient the voice coding system, the longer the encoding delay and the greater the impact of packet loss.
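The capacity effect is easy to estimate. The sketch below compares G.711 at its nominal 64 kbps with G.729A at its nominal 8 kbps. It assumes a 20 msec packetization interval and 40 bytes of IPv4, UDP, and RTP headers per packet, ignores link-layer overhead and silence suppression, and counts one direction of the call only. The 1000 kbps link is a hypothetical figure, and none of these numbers are taken from Table 8-1.

```python
HEADER_BYTES = 40  # IPv4 (20) + UDP (8) + RTP (12); link-layer overhead ignored

def per_call_kbps(codec_kbps, packet_ms=20):
    """One-way bandwidth of a single call, including IP/UDP/RTP headers."""
    payload_bytes = codec_kbps * packet_ms / 8  # kbps * msec = bits per packet
    packets_per_sec = 1000 / packet_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_sec / 1000

def max_calls(link_kbps, codec_kbps, packet_ms=20):
    """Whole number of simultaneous calls that fit in link_kbps (one direction)."""
    return int(link_kbps // per_call_kbps(codec_kbps, packet_ms))

LINK_KBPS = 1000  # hypothetical capacity available for voice
for name, rate in [("G.711", 64), ("G.729A", 8)]:
    print(f"{name}: {per_call_kbps(rate):.0f} kbps per call, "
          f"{max_calls(LINK_KBPS, rate)} calls on {LINK_KBPS} kbps")
```

Because the packet headers are a fixed cost, an eightfold reduction in the codec rate yields well under an eightfold increase in call capacity under these assumptions.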