Solving the WLAN VoIP Challenge: Part 2
In Part 1 of this article, we laid out the throughput and delay problems that can occur in a wireless LAN (WLAN) design. Now, in Part 2, we'll further the discussion by looking at the impact of throughput and delay on delivering voice connections over a WLAN link.
We'll start by looking at delay and throughput in a VoIP-enabled WLAN design. As pointed out in Part 1, the discussions that follow will primarily consider the transmission of a 1500-byte IP packet containing a single 1460-byte TCP segment that generates a TCP acknowledgement. Once we understand the analysis technique, we will apply it to much shorter voice data packets. Also note that the results assume an ideal situation: the maximum transmission rate and no collisions. The performance numbers below are meant to be a bound that represents a best-case scenario.
802.11b Example
As mentioned above, let's take a closer look at the delay and throughput performance an 802.11b WLAN environment will achieve when operating with an 11-Mbit/s transmission rate and a slot time of 20 μs.
In 802.11b, the 20-μs slot time is specified so that it is large enough to allow for receive-to-transmit radio turnaround, MAC processing, and clear-channel assessment (CCA) detect. Slot time is only used when randomizing back-off delays for transmission (it is not used for any timing alignment purposes). Please note that actual 802.11 LAN behavior is quite complex because retransmissions (due to contention or noise) can both increase back-off delays and reduce transmission rates from 11 Mbit/s to 5.5, 2, or even 1 Mbit/s.
To facilitate understanding, we'll use a simple MAC layer activity model. Figure 3 shows an example of the delay analysis technique we are using. When any IP packet is sent over the wireless LAN link it experiences seven typical delays before another IP packet (for example, with TCP data or a TCP ACK) can be transmitted. We are considering a best-case analysis and assume that RTS and CTS are not needed (they will be discussed later).

In the proposed 802.11b example, the following delays are expected:
- The initial delay is the distributed inter-frame space (DIFS) delay (2 x Slot time + SIFS). This interframe space ensures that a previous transmission has completed, and it is safe to access the medium again (this gives time for the previous 802.11 MAC layer acknowledgement to return). This delay is specified as 50μs for 802.11b.
- After the DIFS there is a random back-off wait to access the medium. The window of the number of slot choices to randomly select for a back-off wait is initially seven slots or 140 μs of delay (assuming no retransmissions are occurring). The reason for the back-off wait is to introduce fairness and allow sharing of the wireless medium (it prevents one device from repeatedly transmitting and starving others). In ideal circumstances this results in an average wait of 4 slots or 80 μs. In reality, when contention exists between several wireless LAN devices the window increases and 802.11b throughput could drop substantially, and in turn potentially cause the TCP retransmission rate to increase.
- Transmission begins with a preamble of synchronization bits required for each 802.11b frame. The preamble bits are transmitted at a rate of 1 bit per μs (1 Mbit/s) to ensure backward compatibility with older 802.11 devices. The low bit rate allows slower-speed devices to remain synchronized even though they can't understand the higher-speed transmissions. The preamble may be either 96 or 192 μs (bits), depending on whether the short or long preamble is used. Typically devices start out with the long preamble (the usual default) and can negotiate to the short preamble mode if all devices on the wireless LAN agree.
- Next is the data transmission time (or delay). For example, an 802.11b 1536-byte frame will take (1536 x 8 / 11,000,000) approximately 1117 μs for transmission. The transmission time will be higher if slower speed devices are associated with the 802.11b access point. In that case, the 802.11b MAC headers will be transmitted at the slower device speed to maintain synchronization among all the devices on the wireless LAN (the high layer protocol payload is always sent at the higher speed). We will not consider the case of slower devices being present during higher speed transmissions. However, once this analysis technique is understood the reader can easily analyze this case.
- SIFS is a small gap between the data frame and its acknowledgement. This delay is specified as 10 μs for 802.11b. This interval is the shortest interval and gives 802.11 ACKs the highest priority access to the channel. This delay also allows the radio to change modes between transmit and receive.
- Another preamble of synchronization bits, either 96 or 192 μs, is sent before transmitting the MAC acknowledgement.
- The final delay is the 802.11b ACK frame itself. It is 14 bytes long, which takes 10.2 μs to transmit, but we will round this up to 11 μs.
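The seven delays listed above can be added up in a few lines of Python. This is only a sketch of the best-case model: the function and constant names are ours, and rounding each frame's transmission time to the nearest microsecond is an assumption chosen to match the figures quoted in this article. The 36-byte 802.11 MAC overhead is the one used throughout the analysis.

```python
# Best-case 802.11b timing values from the list above (all in microseconds).
SLOT_US = 20
SIFS_US = 10
DIFS_US = 2 * SLOT_US + SIFS_US   # 50
AVG_BACKOFF_US = 4 * SLOT_US      # 80, the average wait with no contention
MAC_ACK_US = 11                   # 14-byte ACK at 11 Mbit/s, rounded up
RATE_MBPS = 11


def frame_exchange_us(frame_bytes, preamble_us=192):
    """Sum the seven delays for one frame and its MAC acknowledgement."""
    # bits divided by Mbit/s gives microseconds; rounding is an assumption
    data_us = round(frame_bytes * 8 / RATE_MBPS)
    return (DIFS_US + AVG_BACKOFF_US + preamble_us + data_us
            + SIFS_US + preamble_us + MAC_ACK_US)


# 1500-byte IP packet plus 36 bytes of 802.11 MAC overhead
print(frame_exchange_us(1536))
# TCP ACK: 40 bytes of TCP/IP headers plus 36 bytes of 802.11 MAC overhead
print(frame_exchange_us(76))
```

Passing `preamble_us=96` gives the short-preamble case for the same frame.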
Similar delays occur for the returning TCP ACK segment. The major difference is the size of the TCP segment payload (the TCP ACK can be a lot smaller and often consists of only IP and TCP headers). Because of the short distances involved, propagation delays are extremely small (approximately 1 ns per foot) and can be safely ignored.
To simplify our analysis we'll assume a best-case, high-speed only, no propagation delay and no contention situation with a long 802.11b preamble. For clarity, this analysis ignores the sophistication of the TCP window mechanisms. To avoid constraining throughput, TCP uses both sliding and receiver advertised windows that allow multiple outstanding unacknowledged frames.
In practice, TCP acknowledgements can apply to multiple TCP segments (the IETF standards call for one ACK for every two TCP segments). Our initial analysis will assume that every TCP segment is acknowledged which magnifies the impact of TCP acknowledgements (we'll correct for this later by considering a typical steady state TCP file transfer condition of acknowledging every other packet). In this case the delays experienced for the TCP segment and its associated TCP ACK are shown in the transmission steps in Table 2.

By adding up the appropriate delays of the transmission steps above, we find that a 1500-byte IP packet containing a TCP segment and its ACK experiences a round-trip delay, over an 802.11b wireless LAN portion of a network, of 2.242 ms (or 2242 μs). This value is the best 802.11b wireless LAN delay expected by our model for individual 1460-byte TCP segments (the 1460 bytes do not include the 20-byte TCP header or the 20-byte IP header).
Note that actual wireless behavior is very dynamic, and in some situations substantial additional delays associated with control functions such as RTS, CTS, association, and negotiation activity can be encountered. Also keep in mind that the numbers developed in this paper are simply designed for best-case bound analysis and for use as reasonableness checks.
A delay of at least 2.242 ms is part of the overall end-to-end delay experienced by the TCP protocol. Dividing one second by 2.242 ms (for a 1500-byte IP packet) yields a TCP segment rate of approximately 446 segments per second. Each TCP segment has 1460 bytes of payload. Thus, 1460 x 8 x 446 yields a throughput of approximately 5.21 Mbit/s for the 802.11b wireless LAN component of the network path. This is a reasonable 802.11b upper bound on end-to-end TCP throughput with the long 802.11b preamble. Using a short 96-bit preamble (i.e., substituting 96 μs for 192 μs in steps 3, 6, 10, and 13 in Table 1) we get an even better throughput of approximately 6.28 Mbit/s.
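The arithmetic in the paragraph above can be checked with a short Python sketch. The function name is ours, and rounding the segment rate to a whole number of segments per second before multiplying is an assumption that mirrors how the figures above were derived.

```python
def tcp_throughput_mbps(round_trip_us, payload_bytes=1460):
    """Best-case throughput when one TCP segment is delivered per round trip."""
    segments_per_s = round(1_000_000 / round_trip_us)
    return segments_per_s * payload_bytes * 8 / 1_000_000


long_preamble = tcp_throughput_mbps(2242)
# Four preambles per round trip each shrink from 192 us to 96 us.
short_preamble = tcp_throughput_mbps(2242 - 4 * 96)

print(round(long_preamble, 2), round(short_preamble, 2))
```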
This analysis technique can also consider the typical steady-state TCP condition of acknowledging every other packet when long file transfers are involved. Again using the long preamble, this can be done by adding together a 2.242-ms delay for the first packet and its TCP ACK with a 1.652-ms delay for the second packet. Therefore, we transmit two TCP segments every 3.894 ms (an average of one every 1.947 ms), which gives us a packet rate of approximately 514 segments per second. This yields a throughput of 6.0 Mbit/s.
In fact, as TCP acknowledges more and more segments with a single ACK, throughput will increase, because more "air time" is available for data transfer when fewer reverse-flowing TCP acknowledgements are needed. Of course, substantial additional performance can also be obtained by using the short preamble.
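To see how the bound moves as one TCP ACK covers more segments, the sketch below generalizes the two-segment calculation. It is illustrative only: the names are ours, and the 590-μs TCP ACK exchange is simply the 2242-μs round trip minus the 1652-μs data segment exchange (long preamble).

```python
DATA_US = 1652      # one data segment exchange, long preamble
TCP_ACK_US = 590    # one TCP ACK exchange, long preamble (2242 - 1652)


def throughput_mbps(segments_per_ack, payload_bytes=1460):
    """Best-case 802.11b throughput when one TCP ACK covers several segments."""
    cycle_us = segments_per_ack * DATA_US + TCP_ACK_US
    avg_us_per_segment = cycle_us / segments_per_ack
    return payload_bytes * 8 / avg_us_per_segment  # bits per us equals Mbit/s


for n in (1, 2, 4, 8):
    print(n, round(throughput_mbps(n), 2))
```

The gain flattens out quickly, because the fixed per-frame overhead of each data segment remains no matter how few TCP ACKs are sent.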
Obtaining Flow Bounds
In the analysis above, UDP traffic flow bounds can be obtained by simply replacing the TCP segment with the largest UDP packet payload. UDP headers are only 8 bytes, instead of 20 bytes like TCP headers. That leaves a payload of 1472 bytes for the largest 802.11 frame size.
We have the same frame size of 1536 bytes, and the time for each frame carrying a UDP payload will be 1652 μs. UDP does not require acknowledgements. For our best-case assessment, we assume a string of IP packets encapsulating UDP data is arriving at the access point. A recalculation of the delays involved yields a potential throughput rate of 7.12 Mbit/s (1/.001652 x 1472 x 8). The numbers also show that as the payload sizes decrease, the throughput drops, since the share of time spent on 802.11b overhead increases.
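The effect of payload size on the UDP bound can be sketched as follows. The function name and the sample payload sizes other than 1472 bytes are ours, and the model is the same best-case, long-preamble one used above; because it does not round the packet rate, it lands within about 0.01 Mbit/s of the 7.12 Mbit/s figure.

```python
def udp_throughput_mbps(payload_bytes, preamble_us=192):
    """Best-case 802.11b UDP throughput for a given payload size."""
    frame_bytes = payload_bytes + 8 + 20 + 36   # UDP + IP + 802.11 MAC overhead
    data_us = round(frame_bytes * 8 / 11)       # transmission time at 11 Mbit/s
    # DIFS + average back-off + preamble + data + SIFS + preamble + MAC ACK
    exchange_us = 50 + 80 + preamble_us + data_us + 10 + preamble_us + 11
    return payload_bytes * 8 / exchange_us      # bits per us equals Mbit/s


for payload in (1472, 512, 128):
    print(payload, round(udp_throughput_mbps(payload), 2))
```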
VoIP packets are carried in real-time transport protocol (RTP) packets, which are carried by the UDP protocol (VoIP doesn't use TCP because it requires too much processing for real-time applications). UDP has no control over how long it takes for packets to arrive at the destination or the order in which they get there. Both of these are very important to overall voice quality (how well you can understand what the other person is saying) and conversation quality (how easy it is to carry out a conversation).
The use of 12-byte RTP headers solves these problems by enabling the receiver to put the packets back into the correct order and not wait too long for packets that are either lost or are taking too long to arrive (every single voice packet is not needed, but a continuous ordered flow of many voice packets is what is important).
Consider 30-byte VoIP packets, each carried with 76 bytes of headers (20 bytes of IP, 8 bytes of UDP, 12 bytes of RTP, and 36 bytes of 802.11 MAC). Each frame takes 612 μs of delays to transfer (the 106-byte, or 848-bit, voice packet with all its headers itself takes approximately 77 μs to transmit at 11 Mbit/s). The calculations give a VoIP packet rate of 1634 packets per second and yield a throughput of only 392,160 bit/s. This could support approximately six uncompressed 64-kbit/s pulse code modulation (PCM) voice streams under ideal conditions.
Because of the short payload, a substantial increase in performance can be obtained by using a short preamble. Using the short preamble reduces each VoIP packet delay to 420 μs, which increases the VoIP packet rate to 2381 packets per second and the throughput to 571,440 bit/s. This can allow eight uncompressed 64-kbit/s PCM voice streams under ideal conditions.
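The step from per-packet delay to voice stream count is simple enough to capture in a small helper. This is a sketch with names of our choosing; it counts only whole 64-kbit/s streams and, like the analysis above, counts only the 30-byte voice payload as useful throughput.

```python
def voip_capacity(packet_delay_us, payload_bytes=30, stream_bps=64_000):
    """Return (packets per second, payload bit/s, whole voice streams)."""
    packets_per_s = round(1_000_000 / packet_delay_us)
    throughput_bps = packets_per_s * payload_bytes * 8
    return packets_per_s, throughput_bps, throughput_bps // stream_bps


print(voip_capacity(612))   # 802.11b, long preamble
print(voip_capacity(420))   # 802.11b, short preamble
```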
802.11a LAN Example
The same technique used for analyzing 802.11b delay impact can be employed to develop best-case delays and calculate maximum throughput with 802.11a and 802.11g wireless LANs. 802.11a and 802.11g are faster than 802.11b for three reasons: the 54-Mbit/s transmission rate, the tighter timing relationships between frames, and a transmission encoding that does not require long preambles for synchronization.
For 802.11a, the DIFS is 34 μs (2 x Slot time + SIFS). Slot time is 9 μs and the SIFS is 16 μs. Just like 802.11b, the behavior is quite complex because retransmissions can both increase back off delays and reduce transmission rates far below 54 Mbit/s (possible rates are 54, 48, 36, 24, 18, 12, 9 or 6 Mbit/s).
Similar to 802.11b, after the DIFS there is a random back-off wait to access the wireless medium to ensure fairness. The window of the number of slot choices to randomly select for a back-off wait is the same initial seven slots (assuming no collisions are occurring) for all 802.11 standards. However since slot time is shorter in 802.11a, the average wait of 4 slots results in only 36 μs of delay. In reality, when contention exists between several WLAN devices (or the device and the access point) the back-off window of choices can grow as large as 1023 slots, causing 802.11a to substantially reduce throughput.
Both 802.11a and 802.11g divide data up into a series of symbols for transmission. Both use much larger symbols than 802.11b. Each symbol encodes 216 bits of data. At a rate of 54 Mbit/s, each symbol takes 4 μs to transmit. Thus, 216 bits of data are sent every 4 μs.
The OFDM encoding used by 802.11a and 802.11g also adds six bits to the end of each transmitted frame, so a frame carrying a 1500-byte IP packet is 1,536 bytes + 6 bits long. This gives a total bit string of 12,294 bits that can be encoded in 57 symbols (12,294 / 216 = 56.9166) with a total transmission delay of 57 x 4 μs = 228 μs. The 608-bit TCP ACK (40 bytes for TCP and IP headers + 36-byte 802.11 header = 608 bits) plus 6 bits added to the end of the frame only requires three symbols and is transmitted in 12 μs. The 802.11a ACK also requires just one symbol. Instead of a long preamble, a 20-μs header to synchronize the receiver precedes each frame transmission. The series of data symbols, representing the MAC frame, follows the 20-μs header.
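The symbol arithmetic above amounts to rounding up to a whole number of 216-bit symbols. A minimal sketch, with names of our choosing:

```python
import math

BITS_PER_SYMBOL = 216   # at 54 Mbit/s
SYMBOL_US = 4
TAIL_BITS = 6           # bits added to the end of each frame


def ofdm_data_us(frame_bytes):
    """Transmission time of the data symbols for one frame at 54 Mbit/s."""
    symbols = math.ceil((frame_bytes * 8 + TAIL_BITS) / BITS_PER_SYMBOL)
    return symbols * SYMBOL_US


print(ofdm_data_us(1536))   # frame carrying a 1500-byte IP packet
print(ofdm_data_us(76))     # TCP ACK frame
print(ofdm_data_us(14))     # 802.11a MAC ACK
```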
By adding up the appropriate delays we find that a 1500-byte TCP packet and its ACK experience a round-trip delay, over an 802.11a wireless LAN portion of a network, of 500 μs (this is 358 μs + 142 μs). This value is the best 802.11a LAN delay expected. It is part of the overall end-to-end delays experienced by the TCP protocol.
Note: dividing 1 second by 500 μs yields a 1500-byte IP packet rate of approximately 2000 TCP segments per second. Each TCP segment has 1460 bytes of payload. Thus, 1460 x 8 x 2000 yields a throughput of approximately 23.36 Mbit/s for the 802.11a LAN component of the network path. This is the 802.11a LAN limit of end-to-end performance with TCP.
Again, the analysis technique can also consider the typical steady-state TCP condition of acknowledging every other packet when long file transfers are involved. This can be accomplished by considering two TCP segments and one TCP ACK. Simply add together a 358-μs delay for the first packet and a 500-μs delay for second packet and the acknowledgement.
Therefore, we transmit two TCP segments every 858 μs (an average of one every 429 μs), which gives us a packet rate of approximately 2331 TCP segments per second. This yields a throughput of 27.23 Mbit/s. Just like 802.11b, as TCP acknowledges more and more segments with a single acknowledgement, throughput will increase, because more "air time" is available for data transfer when fewer reverse-flowing TCP acknowledgements are needed.
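The 802.11a TCP figures can be rebuilt from the individual delays. The sketch below uses names of our choosing and assumes the same seven-delay sequence as the 802.11b example, with the 20-μs header in place of each preamble and a one-symbol (4-μs) MAC ACK.

```python
DIFS_US = 34          # 2 x 9-us slot + 16-us SIFS
AVG_BACKOFF_US = 36   # average of 4 slots at 9 us each
SIFS_US = 16
HEADER_US = 20        # synchronization header before each frame
MAC_ACK_US = 4        # one OFDM symbol


def exchange_us(data_us):
    """Delay for one 802.11a frame plus its MAC acknowledgement."""
    return (DIFS_US + AVG_BACKOFF_US + HEADER_US + data_us
            + SIFS_US + HEADER_US + MAC_ACK_US)


SEGMENT_US = exchange_us(228)   # 1536-byte frame, 57 symbols
TCP_ACK_US = exchange_us(12)    # 76-byte TCP ACK frame, 3 symbols


def tcp_rate(segments_per_ack, payload_bytes=1460):
    """Return (segments per second, Mbit/s) for a given ACK ratio."""
    avg_us = (segments_per_ack * SEGMENT_US + TCP_ACK_US) / segments_per_ack
    segments_per_s = round(1_000_000 / avg_us)
    return segments_per_s, segments_per_s * payload_bytes * 8 / 1_000_000


print(SEGMENT_US, TCP_ACK_US)
print(tcp_rate(1), tcp_rate(2))
```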
For UDP traffic flows, we have the same frame size of 1536 bytes (with a payload of 1472 bytes) and the time for each frame carrying a UDP payload will be 358 μs. UDP does not require acknowledgements.
For our best-case assessment, we assume a string of IP packets encapsulating UDP data are arriving at the access point. A recalculation of the delays involved yields a rate of 2793 UDP packets per second and a throughput of 32.89 Mbit/s. The numbers also show that as the payload sizes decrease the throughput drops since 802.11a overhead increases.
Consider VoIP packets with 30 bytes of payload and 76 bytes of headers (IP, UDP, RTP, and 802.11 MAC). Each frame takes 146 μs of delays to transfer. The calculations give a VoIP packet rate of 6849 packets per second and yield a throughput of only 1.64 Mbit/s. This could support approximately 25 uncompressed 64-kbit/s PCM voice streams in an ideal situation.
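The UDP and VoIP cases for 802.11a differ only in payload and header sizes, so one helper covers both. This is a sketch under the same best-case assumptions; the function name is ours, and only the payload bytes are counted as throughput.

```python
import math


def dot11a_flow(payload_bytes, header_bytes):
    """Return (frame delay in us, packets per second, payload bit/s)."""
    bits = (payload_bytes + header_bytes) * 8 + 6      # 6 bits end each frame
    data_us = math.ceil(bits / 216) * 4                # 216-bit, 4-us symbols
    # DIFS + back-off + header + data + SIFS + header + one-symbol MAC ACK
    exchange_us = 34 + 36 + 20 + data_us + 16 + 20 + 4
    packets_per_s = round(1_000_000 / exchange_us)
    return exchange_us, packets_per_s, packets_per_s * payload_bytes * 8


# Largest UDP payload: 8 (UDP) + 20 (IP) + 36 (MAC) header bytes
print(dot11a_flow(1472, 64))
# VoIP: 12 (RTP) + 8 (UDP) + 20 (IP) + 36 (MAC) header bytes
print(dot11a_flow(30, 76))
```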
802.11b/g Mixed-Mode Design
Whenever an 802.11b station associates with an 802.11g access point, the MAC layer uses RTS and CTS frames to control cross talk in mixed 802.11b and 802.11g environments. This allows 802.11g to pre-reserve the radio medium by using 802.11b-compatible reservation techniques when operating at high rates. In addition, the slot time reverts to the 802.11b standard of 20 μs, and 802.11b preambles (96 or 192 bits) are used before transmission of any CTS or RTS frames. This extra activity adds substantial delays but is necessary to control the activity of a mixed environment of 802.11b and 802.11g devices.
The purpose of RTS and CTS frames is to tell other WLAN devices that you are going to transmit, so that they remain quiet for the duration of the transmission and the bandwidth can be considered reserved for that transmission. Sending two frames, RTS and CTS, takes more time than sending a single CTS frame. Thus, the minimal reservation activity consists of a single CTS frame that locks out from the medium any other stations that can hear the CTS frame (and its duration field). CTS is a 14-byte frame and RTS is a 20-byte frame. Both are transmitted at a rate that all stations can understand, which is 11 Mbit/s. In addition, to keep all the stations synchronized, the 802.11b synchronization preamble bits are sent at 1 Mbit/s prior to either the CTS or RTS.
Fully reserving the wireless medium can only be guaranteed by using a two-frame exchange of RTS and CTS, which addresses the hidden node problem. The standard allows either CTS by itself or the full RTS/CTS exchange. Depending on the physical layout of the LAN, everyone may be able to hear the CTS frame and, in that case, the use of both RTS and CTS is not necessary. The RTS and CTS frames tell all nodes on the wireless LAN to set the NAV counter for the length of time they must remain quiet. In addition, back-off delays are calculated using the 20-μs slot for all devices. This provides fairness by preventing the 802.11g devices from getting preferential access to the wireless medium (over 802.11b devices). Full reservation is in common use today, so we'll focus on the case where both RTS and CTS are used. Therefore, using both RTS and CTS, the delays for the 54-Mbit/s frames would be:

With full reservation, the same sequence of 15 delays also occurs for transmitting the TCP ACK. The total delay, for both the TCP segment and its ACK, will be 1,492 μs (854 μs for the TCP segment + 638 μs for the TCP ACK). When an ACK is returned for each TCP segment, the segment rate is 670 segments per second, yielding a throughput of 7.83 Mbit/s.
When an ACK is returned for every other TCP segment, the segment rate is 853 segments per second, which yields a data rate of 9.96 Mbit/s. For the UDP flow, the packet rate is 1171 packets per second, yielding a throughput of 13.79 Mbit/s.
For the 30-byte VoIP packets, each packet delay is 642 μs, which gives a rate of 1,558 VoIP packets per second and a throughput of 373,920 bit/s (supporting approximately five uncompressed 64-kbit/s PCM voice streams). We can improve VoIP capacity by using the short 96-μs preamble. This saves us 192 μs in total delay. In this case, each VoIP packet can be sent in 450 μs. This gives us a rate of 2,222 VoIP packets per second, which yields a throughput of 533,280 bit/s (supporting approximately eight uncompressed 64-kbit/s PCM voice streams, since 533,280 / 64,000 is about 8.3).
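The individual mixed-mode delays are not listed in the text, so the breakdown below is a reconstruction rather than the authors' table. It assumes a 6-μs 802.11g signal extension after each OFDM frame, a 10-μs SIFS, and an 11-μs CTS (the same size as the 802.11b ACK); under those assumptions the 15 delays add up to the 854, 638, 642, and 450 μs figures quoted above. All names are ours.

```python
import math

SLOT_US, SIFS_US = 20, 10
DIFS_US = 2 * SLOT_US + SIFS_US    # 50
AVG_BACKOFF_US = 4 * SLOT_US       # 80
RTS_US, CTS_US = 15, 11            # 20- and 14-byte frames at 11 Mbit/s, rounded up
OFDM_HEADER_US = 20
OFDM_ACK_US = 4                    # one symbol
SIGNAL_EXT_US = 6                  # ASSUMPTION: not stated in the text


def ofdm_us(frame_bytes):
    """Data symbol time at 54 Mbit/s (216-bit symbols of 4 us, 6 tail bits)."""
    return math.ceil((frame_bytes * 8 + 6) / 216) * 4


def full_reservation_us(frame_bytes, preamble_us=192):
    """Reconstructed 15-step delay for one frame with an RTS/CTS exchange."""
    return (DIFS_US + AVG_BACKOFF_US
            + preamble_us + RTS_US + SIFS_US
            + preamble_us + CTS_US + SIFS_US
            + OFDM_HEADER_US + ofdm_us(frame_bytes) + SIGNAL_EXT_US + SIFS_US
            + OFDM_HEADER_US + OFDM_ACK_US + SIGNAL_EXT_US)


print(full_reservation_us(1536), full_reservation_us(76))
print(full_reservation_us(106), full_reservation_us(106, preamble_us=96))
```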
Reservation with a Long Preamble
If minimal reservation with the long preamble is used, the delay sequence does not include an RTS frame. This CTS-only technique is expected shortly in the marketplace. The benefit is that it eliminates a 192-μs synchronization delay, a 20-byte RTS frame (15-μs delay), and another SIFS interval (10-μs delay). This reduces delay by 217 μs in each direction, for the TCP segment and for the TCP ACK, which is a total delay savings of 434 μs for both directions. Thus, total delay for the TCP segment and its ACK is reduced to 1058 μs (637 μs + 421 μs).
When an ACK is returned for each TCP segment, the segment rate is 945 segments per second, yielding a throughput of 11.04 Mbit/s. When an ACK is returned for every other TCP segment, the segment rate is 1180 segments per second, yielding a throughput of 13.78 Mbit/s.
For the UDP flow, the packet rate is 1,570 packets per second, or 18.49 Mbit/s. For the 30-byte VoIP packets, each packet delay is 425 μs, which gives a rate of 2,353 VoIP packets per second and a throughput of 564,720 bit/s. This translates into approximately eight uncompressed 64-kbit/s PCM voice streams.
Again, we can improve VoIP capacity by using a short 96-μs preamble. In this case, each VoIP packet can be sent in 329 μs. This gives us a rate of 3,040 VoIP packets per second, which yields a throughput of 729,600 bit/s. This throughput allows the mixed-mode 802.11b/g design to support approximately 11 uncompressed 64-kbit/s PCM voice streams.
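The CTS-only figures follow from subtracting one preamble, the RTS frame, and one SIFS from each full-reservation delay. A short sketch, with names of our choosing, that reproduces the delays and voice stream counts in this section:

```python
RTS_US = 15    # 20-byte RTS at 11 Mbit/s
SIFS_US = 10


def cts_only_us(full_reservation_us, preamble_us):
    """Drop the RTS preamble, the RTS frame, and one SIFS from a full exchange."""
    return full_reservation_us - (preamble_us + RTS_US + SIFS_US)


def voice_streams(packet_delay_us, payload_bytes=30, stream_bps=64_000):
    """Whole 64-kbit/s voice streams supported at a given per-packet delay."""
    packets_per_s = round(1_000_000 / packet_delay_us)
    return packets_per_s * payload_bytes * 8 // stream_bps


segment_us = cts_only_us(854, 192)       # TCP segment, long preamble
tcp_ack_us = cts_only_us(638, 192)       # TCP ACK, long preamble
voip_long_us = cts_only_us(642, 192)     # VoIP packet, long preamble
voip_short_us = cts_only_us(450, 96)     # VoIP packet, short preamble

print(segment_us + tcp_ack_us)
print(voice_streams(voip_long_us), voice_streams(voip_short_us))
```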
Wrap Up
There's no doubt that designers of 802.11 equipment must fight tough delay and throughput issues in order to support the delivery of real-time voice and video services. But, despite these challenges, this article has shown that in a best-case scenario, designers can achieve the throughput and delay required to support multiple G.711 channels on various WLAN links. Because real-world environments are not ideal, your performance mileage will vary. But even with a 50-percent WLAN utilization (which substantially reduces contention and retransmissions but cuts the throughput numbers in half), quite a few voice channels can be supported, especially with more efficient codecs.
Editor's Note: To view Part 1 of this article, click here.
About the Authors
John Waclawsky is a technical leader in Cisco Systems' Mobile Wireless Group. In the past, he has also served as technical committee chair of the Mobile Wireless Internet Forum (MWIF). John holds a master's degree in Computer and Information Sciences from the University of Pennsylvania as well as master's and Ph.D. degrees in Computer Science from the University of Maryland. He can be reached at jgw@cisco.com.
Jim Gunn is a communication consultant, market researcher, and associate at market research firm Forward Concepts. Jim has a BSEE and MSEE from Oklahoma State and Ph.D. in electrical engineering from Southern Methodist University. He can be reached at jimgunn@ieee.org.

