Facing the challenges of VoIP WLAN (VoWLAN) design: Part 1

Praphul Chandra and David Lide

February 21, 2007

The 802.11 standard has come to be somewhat of an "umbrella protocol" due to the numerous enhancements (e, g, h, k, etc.) that have been ratified (or are being discussed) to improve the base standard, which was ratified in 1997.

Given this situation, it is very difficult to determine what kind of 802.11 deployment is being discussed when someone uses the generic term WLAN or VoWLAN. (LAN here means strictly LAN. VLANs are not included in this definition. In fact, VLANs are one of the primary users of Layer-2 quality of service (QoS) mechanisms.)

5.2 VoWLAN
Voice over IP (VoIP) comes in many flavors. At a high level, we can distinguish between these flavors by considering where in the overall architecture voice transitions from the PSTN to the IP network. At one extreme is the traditional PSTN model, where one black phone calls another black one and the voice call is established entirely using the PSTN. There is no IP and hence no VoIP in this scenario.

At the other extreme is the end-to-end VoIP model, where an IP "phone" (which may be a soft-phone application running on a PC or an actual physical entity) calls another IP phone without the voice call ever transitioning to the PSTN. Somewhere between these two extremes lies the concept of gateways, which connect the IP world to the PSTN.

Note how the discussion of VoIP is restricted to the wired domain. Sure, it is possible to install a media gateway that carries calls from/to wireless cellular subscribers over an IP network, and it is also possible to use cordless phone technology in VoIP architectures, but the VoIP end-device itself is restricted to being a wired device.

In other words, IP phones (soft or hard) are always wired devices. This has limited the deployment of VoIP in "mobile" scenarios. This is where VoWLAN comes in. The wide-scale deployment of 802.11 networks means that it is now possible to implement a VoIP solution over a WLAN instead of a wired LAN like Ethernet, so VoIP can, for the first time, become a wireless solution.

On the face of it, running VoIP implementations over 802.11 instead of 802.3 (Ethernet) seems like a simple proposition. After all, one of the primary design criteria for the OSI layered architecture was to minimize the interdependence between layers. Arguably, since VoIP is implemented at Layer 3 and above, a change in the Layer-2 protocol should be trivial.

However, this is far from the case. At the outset, let us realize that the 802.11 standard was designed primarily for data communication. However, voice communication is inherently very different from data communication. Unlike data, voice traffic is characterized by small packets transmitted periodically and symmetrically in both directions.

Voice traffic also has its own constraints in that it is extremely sensitive to delay and jitter. Furthermore, the quality of a voice call also depends on the packet-loss characteristics: while small losses can be tolerated, large gaps (bursty packet losses) will cause serious degradation in voice quality. Using 802.11 for voice communication therefore poses some major challenges.

5.3 System Capacity and QoS
This section deals with system capacity and quality of service (QoS) issues in VoWLAN. We start with system capacity. Defining system capacity is a tricky issue. It is often described in terms of channel bandwidth (Mbps). However, this can be a misleading parameter.

Here, we just want to emphasize that, for VoWLAN systems, the simplest and most useful definition of system capacity would simply be the number of simultaneous voice calls that can exist in a BSS (basic service set). This is the definition we use.

Even though it may not be clear at first why the topics of system capacity and QoS are clubbed together, a little analysis will reveal that these two topics are inherently linked together. In VoIP, the term QoS is usually used to refer to the real-time requirements (low delay, low jitter and loss-characteristics, etc.) of voice, video and so forth. The basic approach to achieving QoS is to "mark" real-time packets so they get prioritized access to network resources like bandwidth.

This may or may not involve reserving resources in the network for real-time traffic. However, the basic philosophy is that, since network resources are limited, real-time traffic should have prioritized access to them. Note that if there are "enough" network resources available for all traffic, there is no need to prioritize real-time traffic.

Hence, the concepts of system capacity and QoS are inherently linked. If we have enough system capacity, there is no need for QoS mechanisms. This is, for example, the case when making VoIP calls within a LAN that uses 100 Mbps or Gigabit Ethernet. This is one of the reasons why QoS has traditionally been a Layer-3 (or above) issue in VoIP.

Another reason for treating QoS in the higher layers is because most VoIP implementations simply treat the IP network as a "cloud" without any information about the underlying link layer, since the VoIP endpoints do not know about what happens (for example, what Layer 2 technology is used) in the cloud.

This is not to say that VoIP implementations never use Layer-2 QoS. There are scenarios where VoIP endpoints are aware of the Layer-2 technology being used and Layer-2 bandwidth is at a premium.

In such scenarios (VoCable, for example), Layer-2 QoS (DOCSIS in VoCable) has been used in VoIP deployments. Since in VoIP over WLAN we also know the characteristics of the underlying link layer, QoS becomes relevant at Layer 2. The following subsections discuss why system capacity and QoS are important issues in VoWLAN.

5.3.1 Packet Sizes
Given that the bandwidth requirement of a VoIP stream can be minimized to about 10 kbps (e.g., through the use of a high-compression codec such as those discussed in Chapter 3), an 802.11b WLAN could, in principle, support hundreds of VoIP sessions. In reality, no more than a handful of sessions can be supported by an 802.11b WLAN due to various overheads.

Figure 5.1 Throughput Vs Packet Size

As Figure 5.1 above shows, the effective throughput in an 802.11b network has a large dependency on the payload size that is used. Even though this is not an issue for data applications (since they will most likely use large payload sizes), this does not bode well for VoWLAN where the packet size needs to be kept short to minimize end-to-end delay.

VoIP (and hence VoWLAN) uses packetization periods of the order of 10 to 40 ms, leading to payload sizes of the order of 100 to 300 bytes. As is clear from Figure 5.1, this limits the effective bandwidth available for VoWLAN to about 1 to 2 Mbps in a BSS.
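As a rough illustration of how the packetization period drives the size of a voice packet, the sketch below computes the codec bytes carried in one packet. The function name is hypothetical, and the two codec bit rates (8 kbps and 64 kbps, typical of G.729 and G.711) are assumptions chosen for illustration rather than values taken from the text. Protocol headers are not included.

```python
def voice_payload_bytes(codec_bps: int, packetization_ms: int) -> int:
    """Codec bytes produced in one packetization period (headers excluded)."""
    # bits per second * milliseconds per packet / (8 bits per byte * 1000 ms per s)
    return codec_bps * packetization_ms // 8000


if __name__ == "__main__":
    for codec_bps in (8_000, 64_000):      # assumed example codec rates
        for period_ms in (10, 20, 40):     # packetization periods in the quoted range
            size = voice_payload_bytes(codec_bps, period_ms)
            print(f"{codec_bps // 1000} kbps, {period_ms} ms -> {size} bytes")
```

Longer packetization periods give larger, more efficient packets, but every extra millisecond of packetization adds directly to end-to-end delay, which is why voice payloads stay small.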

Realize also from Figure 5.1 that using higher transmission rates helps improve system capacity, but this increase in system capacity is most significant at higher payload sizes and the gain at lower payloads is comparatively small. Applications like VoWLAN, which use small payloads, see only a small increase in system bandwidth since they lie in the bottom left corner of the graph.

To understand why the system capacity is a function of the payload size, realize that the 802.11 MAC does not take into account the transmission time (for which the station would use the media once it captures it) when competing for media access. Instead, it concentrates only on making the number of transmission opportunities fair among stations.

In other words, the MAC protocol ensures that, once a station gets access to the media and finishes its transmission, it must again compete with other stations to transmit its next packet.

However, the MAC does not take into account how long the station would stay on the media once it gets access to it. So, once a station gets access to the channel, it may transmit a packet with a payload of 10 bytes or a payload of 2300 bytes; this difference is not taken into account when stations compete for access to the media.

In effect, a station that transmits 2300-byte payloads on getting access to the media can pump much more data through the network than a station that transmits only 10 bytes of payload when it finally gains access to the media. Therefore, in VoWLAN systems, stations that use small payload sizes must spend a considerable amount of time backing off in the MAC protocol to avoid collisions, and this leads to limited system capacity.
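The effect described above can be sketched with a very simple model in which every channel access costs a fixed amount of time regardless of how much payload follows. The 600 µs per-access overhead below is a hypothetical round number chosen only for illustration, and the function name is likewise an assumption; the model ignores collisions and retransmissions.

```python
def effective_throughput_mbps(payload_bytes: int,
                              rate_mbps: float = 11.0,
                              overhead_us: float = 600.0) -> float:
    """Payload throughput when each access costs a fixed overhead (assumed)."""
    payload_bits = payload_bytes * 8
    # Bits divided by Mbps gives microseconds of airtime for the payload.
    airtime_us = overhead_us + payload_bits / rate_mbps
    # Bits per microsecond is numerically equal to Mbps.
    return payload_bits / airtime_us


if __name__ == "__main__":
    for size in (10, 100, 300, 1500, 2300):
        print(f"{size:5d} bytes -> {effective_throughput_mbps(size):.2f} Mbps")
```

Under these assumptions a station sending 2300-byte payloads achieves several Mbps, while one sending 10-byte payloads stays far below 1 Mbps, even though both win the channel equally often.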

5.3.2 Packetization Overheads
Another reason for the limited capacity of VoWLAN is the packet header overhead that is added as the short VoIP packets traverse the various layers of the standard protocol stack. The payload of a voice packet with a 10-ms packetization period, as generated by the voice codec, ranges from 10 to 80 bytes, depending on the codec used. This voice payload then passes down the stack via the RTP, UDP and IP layers.

These three layers add headers of a total size of 40 bytes. Next, the IP layer hands over this packet to the 802.11 MAC protocol, which adds a header of 34 bytes. Note that, at this stage, the total packet size (assuming a 30-byte voice payload) is 104 bytes, out of which only 30 bytes is the actual voice payload. That is an efficiency of less than 30%.
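The arithmetic above can be checked with a few lines of code. This is a minimal sketch using the header sizes quoted in the text (40 bytes for RTP, UDP and IP together, 34 bytes for the 802.11 MAC); the function and constant names are illustrative.

```python
RTP_UDP_IP_HEADER_BYTES = 40   # combined RTP + UDP + IP headers, as quoted above
MAC_HEADER_BYTES = 34          # 802.11 MAC header, as quoted above


def mac_frame_efficiency(voice_bytes: int) -> float:
    """Fraction of the MAC-level frame that is actual voice payload."""
    total = voice_bytes + RTP_UDP_IP_HEADER_BYTES + MAC_HEADER_BYTES
    return voice_bytes / total


if __name__ == "__main__":
    for voice_bytes in (10, 30, 80):
        print(f"{voice_bytes} bytes -> {mac_frame_efficiency(voice_bytes):.1%}")
```

For the 30-byte payload the frame is 104 bytes and the efficiency is just under 29%, matching the figure in the text.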

Next, when the packet is handed over to the 802.11b PHY layer, a PLCP header and a PLCP preamble are added to it. Even though the size of these together is only 15 bytes (short preamble) or 24 bytes (long preamble), the PHY overhead is significant because these fields are transmitted at only 1 or 2 Mbps.

Figure 5.2. PHY headers for 802.11b

At these rates it takes 96 µs (short preamble) or 192 µs (long preamble) just to transmit the PHY layer overheads. Assuming that the rest of the packet gets transmitted at the maximum 802.11b rate of 11 Mbps, this means that to transmit 22 µs (30 bytes) of voice payload, it takes a total time of 172 µs (short preamble) or 268 µs (long preamble). That is an efficiency of about 8 to 13%, which means we are already down from 500 VoWLAN sessions to about 50 VoWLAN sessions in a BSS.
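The airtime figures in this section can be reproduced as follows. The sketch assumes the usual 802.11b PLCP layout (long format: 144-bit preamble and 48-bit header at 1 Mbps; short format: 72-bit preamble at 1 Mbps and 48-bit header at 2 Mbps), which is consistent with the 24-byte and 15-byte totals quoted above. Function names are illustrative.

```python
def plcp_time_us(short_preamble: bool) -> float:
    """Airtime of the PLCP preamble plus PLCP header, in microseconds."""
    if short_preamble:
        return 72 / 1.0 + 48 / 2.0    # 72 bits at 1 Mbps + 48 bits at 2 Mbps
    return 144 / 1.0 + 48 / 1.0       # 144 + 48 bits, all at 1 Mbps


def frame_airtime_us(voice_bytes: int, short_preamble: bool,
                     rate_mbps: float = 11.0) -> float:
    """PLCP overhead plus MAC header, RTP/UDP/IP headers and voice payload."""
    body_bytes = voice_bytes + 40 + 34   # 40 = RTP/UDP/IP, 34 = 802.11 MAC
    return plcp_time_us(short_preamble) + body_bytes * 8 / rate_mbps


if __name__ == "__main__":
    voice_us = 30 * 8 / 11.0             # airtime of the voice payload alone
    for short in (True, False):
        total = frame_airtime_us(30, short)
        label = "short" if short else "long"
        print(f"{label} preamble: {total:.0f} us, efficiency {voice_us / total:.1%}")
```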

5.3.3 DCF Overheads
In order to protect against nodes hogging the channel once they get access to it, the 802.11 MAC distributed coordination function (DCF) requires that a station must wait between consecutive packet transmissions.

This waiting period allows other stations to compete for channel access if needed and thus ensures that a station does not hog the channel once it gets access to it. However, this waiting period also means additional overheads. Let us calculate the time it takes to transmit a voice packet using DCF (Figure 5.3 below).

Figure 5.3 DCF Timing

From Figure 5.3, the total time it takes to transmit a voice packet can be calculated as:

Pkt_TxTime = DIFS + BO + PHY_TxTime + MAC_TxTime + Payload_TxTime + SIFS + ACK_TxTime.

For 802.11b, even in the best-case scenario (short preamble, maximum transmission rate, an aggressive BO time and a 30-byte payload packet), we have:

DIFS = 50 µs
BO = Slot Time * CWavg = 20 * 31/2 = 310 µs, assuming CWavg = (CWmin - 1)/2
PHY_TxTime = 96 µs, assuming short preamble
MAC_TxTime = 34 * 8/11 = 25 µs, assuming the maximum transmission rate (11 Mbps)
Payload_TxTime = 70 * 8/11 = 51 µs, where "payload" includes the RTP, UDP and IP headers
SIFS = 10 µs
ACK_TxTime = PHY_TxTime + (34 * 8/11) + (14 * 8/11) = 131 µs, since the 14-byte ACK also comes with 802.11 MAC and PHY headers
Therefore Pkt_TxTime = 673 µs

Continuing our calculations of the number of simultaneous voice calls in a BSS from section 5.3.2, from a VoWLAN perspective this means that to transmit 22 µs (30 bytes) of voice payload, it takes a total time of 673 µs, which reduces the efficiency to 3%.
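The DCF calculation can be sketched in code using the same terms and values as the formula above. The final step, `max_calls`, is not from the text: it is a hypothetical back-of-the-envelope extension that assumes each call sends one packet in each direction per packetization period, keeps the 30-byte payload fixed, and ignores collisions and retransmissions.

```python
SLOT_US, DIFS_US, SIFS_US = 20.0, 50.0, 10.0
CW_MIN = 32                 # chosen so that (CW_MIN - 1) / 2 = 31/2, as in the text
PHY_SHORT_US = 96.0         # PLCP preamble + header, short preamble
MAC_HEADER_BYTES, ACK_BYTES, RTP_UDP_IP_BYTES = 34, 14, 40


def tx_us(num_bytes: int, rate_mbps: float) -> float:
    """Airtime of num_bytes at rate_mbps, in microseconds."""
    return num_bytes * 8 / rate_mbps


def pkt_tx_time_us(voice_bytes: int = 30, rate_mbps: float = 11.0) -> float:
    """DIFS + BO + PHY + MAC + payload + SIFS + ACK, following the formula above."""
    backoff = SLOT_US * (CW_MIN - 1) / 2
    mac = tx_us(MAC_HEADER_BYTES, rate_mbps)
    payload = tx_us(voice_bytes + RTP_UDP_IP_BYTES, rate_mbps)
    ack = PHY_SHORT_US + tx_us(MAC_HEADER_BYTES, rate_mbps) + tx_us(ACK_BYTES, rate_mbps)
    return DIFS_US + backoff + PHY_SHORT_US + mac + payload + SIFS_US + ack


def max_calls(pkt_time_us: float, packetization_ms: float) -> int:
    """Hypothetical upper bound: two packets per call per packetization period."""
    return int((packetization_ms * 1000) // (2 * pkt_time_us))


if __name__ == "__main__":
    total = pkt_tx_time_us()
    print(f"Pkt_TxTime = {total:.0f} us")
    print(f"efficiency = {tx_us(30, 11.0) / total:.1%}")
    print(f"calls at 10 ms packetization <= {max_calls(total, 10)}")
```

Even this optimistic bound lands in the single digits for a 10-ms packetization period, which is in line with the earlier observation that only a handful of sessions fit in an 802.11b BSS.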

Note the effect of ACKing each packet. The 802.11 MAC requires each data packet to be explicitly acknowledged (ACKed) to cope with operating in the (hostile) wireless environment. ACKing each packet also reduces the system capacity significantly.

5.3.4 Transmission Rate
Note that the calculation in section 5.3.3 is a best-case estimate given the assumptions we made. For example, we assumed the transmission rate to be 11 Mbps, but we know that the transmission rate used is often a function of channel conditions (which are in turn a function of the distance between the communicating stations).

Transmitting at lower data rates means that the transmission time for each packet increases and the medium is occupied for more time, thus reducing system capacity even further for VoWLAN.

Figure 5.4. System capacity versus range

Figure 5.4 above uses a very simple radio channel model to illustrate the effect of transmission rate on system capacity in VoWLAN. We know that the strength of the signal decreases as the distance between the communication stations increases.

In Figure 5.4, with the AP at the center of the figure and assuming a constant noise floor, the received signal strength (and hence the SNR) decreases as we move away from the AP. Therefore, the "optimum" transmission rate decreases as we move away from the AP.

With a decrease in the transmission rate, the number of voice calls that can be supported also decreases as we move away from the AP. Note that Figure 5.4 is not drawn to scale and is for illustration purposes only.

Transmitting at higher data rates means that the transmission time for each packet decreases and the medium is freed up for use for more packets, thus increasing system capacity.
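To make the dependence on transmission rate concrete, the sketch below reuses the per-packet timing formula from section 5.3.3 and evaluates it at the four 802.11b rates. Two simplifications are assumed here and are not from the text: the long preamble (192 µs) is used so that all four rates can be compared on the same footing, and the ACK is taken to be sent at the same rate as the data frame. The function name is illustrative.

```python
LONG_PREAMBLE_US = 192.0    # PLCP preamble + header, long format (assumed here)
DIFS_US, SIFS_US, SLOT_US = 50.0, 10.0, 20.0
CW_MIN = 32                 # average backoff of (32 - 1) / 2 slots, as in section 5.3.3


def pkt_tx_time_us(rate_mbps: float, voice_bytes: int = 30) -> float:
    """Per-packet channel time at a given data rate, in microseconds."""
    def tx(num_bytes: int) -> float:
        return num_bytes * 8 / rate_mbps

    backoff = SLOT_US * (CW_MIN - 1) / 2
    data = LONG_PREAMBLE_US + tx(34) + tx(voice_bytes + 40)   # MAC + RTP/UDP/IP + voice
    ack = LONG_PREAMBLE_US + tx(34) + tx(14)                  # ACK, same form as 5.3.3
    return DIFS_US + backoff + data + SIFS_US + ack


if __name__ == "__main__":
    for rate in (11.0, 5.5, 2.0, 1.0):
        print(f"{rate:4.1f} Mbps -> {pkt_tx_time_us(rate):.0f} us per packet")
```

Because the fixed overheads (DIFS, backoff, preambles) do not shrink with the data rate, dropping from 11 Mbps to 1 Mbps roughly doubles the channel time per voice packet in this model, rather than multiplying it by eleven.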

However, transmitting at higher data rates means using more complex modulation schemes, which are more susceptible to channel noise. Therefore, using higher data rates in adverse channel conditions (high channel noise, i.e., lower SNR) can actually lead to a higher BER (i.e., higher packet loss), which may require more retransmissions and thus effectively reduce system capacity.

The goal, therefore, is to dynamically adjust the transmission rate. The concept of rate adaptation is to select the appropriate transmission rate based on channel conditions and performance. The rate-adaptation algorithm is not specified in the 802.11 standards and this is expected to be one of the product differentiators among various vendors.

It is important to realize that a rate-adaptation algorithm optimized for data may not yield the best results for real-time applications like voice. The primary reason for this is that voice is extremely sensitive to delay and jitter. We shall see how this affects the rate-adaptation algorithms.

In order to decide which rate is optimal at any specific moment, the rate-adaptation algorithm needs information about the current link conditions. Since it is difficult to get this information directly, most algorithms use some form of statistics-based feedback.

The statistic most often used in such feedback schemes is the user-level throughput. This means that these algorithms aim to maximize the application-layer throughput.

To achieve this, typical 802.11 rate-adaptation algorithms are "aggressive" in attempting to switch to a higher PHY data rate, the underlying theory being that if packet error rate increases at higher data rates, the 802.11 algorithm will cope with such drop-outs by using frame retransmissions.

This approach works fine for data communication since the extra (and variable) delay introduced by this retransmission-dependent approach is acceptable to data applications. However, this increase in (average) packet delay and jitter (due to variations in the number of retransmissions) can cause serious degradation for voice communication.

Consequently, rate-adaptation algorithms optimized for data communication often perform poorly for VoWLAN. (Here, VoWLAN refers to WLAN using the infrastructure BSS.)

Realize also that there will be times in an 802.11 network where temporary network conditions will prevent the successful transmission of a packet under any rate adaptation. For example, the STA may have moved into an RF blind spot, or near a jamming device such as a microwave oven.

In these cases, a voice-friendly rate-adaptation algorithm would want to give up on the current voice packet instead of delaying the entire voice packet stream trying to get the current voice packet through.
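One way to picture this "give up rather than stall" behavior is a retry loop with a per-packet delay budget. The sketch below is purely illustrative and does not represent any vendor's algorithm or anything specified by 802.11: the function name, the callback signatures, the fallback rate list and the delay budget are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def send_with_deadline(try_transmit: Callable[[float], bool],
                       fallback_rates_mbps: Sequence[float],
                       attempt_cost_us: Callable[[float], float],
                       deadline_us: float) -> Optional[float]:
    """Try successively lower rates; drop the packet once the budget is spent.

    Returns the rate that succeeded, or None if the packet was abandoned.
    """
    elapsed_us = 0.0
    for rate in fallback_rates_mbps:
        cost = attempt_cost_us(rate)
        if elapsed_us + cost > deadline_us:
            return None              # too late to be useful: give up on this packet
        elapsed_us += cost
        if try_transmit(rate):
            return rate
    return None                      # every rate failed within the budget


if __name__ == "__main__":
    rates = (11.0, 5.5, 2.0, 1.0)
    cost = lambda rate: 8000.0 / rate            # hypothetical airtime model
    blind_spot = lambda rate: False              # nothing gets through
    print(send_with_deadline(blind_spot, rates, cost, deadline_us=5000.0))
```

A data-oriented sender would keep retrying until the frame got through; here the packet is abandoned as soon as another attempt would exceed the delay budget, so the following voice packets are not held up behind it.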

Since VoIP packet-loss concealment algorithms can hide the loss of one or two packets but cannot mask a large drop-out, there is no point in taking pains to deliver a voice packet if it is so late that the receiver jitter buffer has underflowed. Again, the rate-adaptation algorithms are expected to be another product differentiator among vendors.

Next in Part 2: Inherent Fairness Among All Nodes

Used with the permission of the publisher, Newnes/Elsevier, this two-part series is based on material from "Wi-Fi Telephony: Challenges and Solutions for Voice over WLANs," by Praphul Chandra and David Lide.

Praphul Chandra is currently a senior research scientist at HP Labs, India, which focuses on "technological innovation for emerging countries." David Lide is currently a senior member of the Technical Staff at Texas Instruments, Inc., and has worked on various aspects of Voice over IP for the past eight years.
