Speed packet throughput with compressed UDP
A typical User Datagram Protocol/Internet Protocol (UDP/IP) header consists of 20 bytes of IP header and eight bytes of UDP header. The UDP header contains four fields: source and destination port numbers, length and checksum. The IP header holds 12 fields, including source and destination IP addresses and Type of Service (ToS) bits.
For packets that belong to the same packet stream, the values of most header fields remain constant. UDP/IP header compression relies on this fact: fields that do not change between packets do not need to be transmitted in full in every packet. As a result, the transmitting end point can replace the standard header, which has 16 fields and consumes 28 bytes, with a simple header of just three fields requiring as few as five bytes.
The compressed header contains a Context IDentification number (CID), a 2-byte field that uniquely identifies the flow; a 1-byte generation field; and a 2-byte IP identification field. Together these form a 5-byte compressed header. The receiving end point decompresses the header by replacing the compressed header with the original header fields.
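As a minimal sketch of the 5-byte layout described above, the three fields can be packed and unpacked with Python's `struct` module. The field order (CID, generation, IP identification), the network byte order and the function names are assumptions made for illustration; the article does not specify the on-wire encoding.

```python
import struct

# Assumed layout: 2-byte CID, 1-byte generation, 2-byte IP identification,
# all in network (big-endian) byte order. "!" disables padding, so the
# packed size is exactly 5 bytes.
COMPRESSED_FMT = "!HBH"
COMPRESSED_LEN = struct.calcsize(COMPRESSED_FMT)


def pack_compressed_header(cid, generation, ip_id):
    """Build the compressed header (hypothetical helper)."""
    return struct.pack(COMPRESSED_FMT, cid, generation, ip_id)


def unpack_compressed_header(data):
    """Return (cid, generation, ip_id) from the first 5 bytes of data."""
    return struct.unpack(COMPRESSED_FMT, data[:COMPRESSED_LEN])
```

For example, `pack_compressed_header(7, 1, 0x1234)` yields a 5-byte string that `unpack_compressed_header` turns back into the same three values.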
The IETF RFC 2507 entitled IP Header Compression specifies details for compressing header fields to achieve the following goals:
- Improve interactive response time
- Allow using small packets for bulk data with good line efficiency
- Allow using small packets for delay sensitive, low-data-rate traffic
- Decrease header overhead
- Reduce packet loss rate over lossy links
This article describes how this header compression is implemented in a network processor (NP) and reports results from the completed implementation. The chip is used in multiservice access network applications.
In network processors with an on-chip control processor, that processor serves as the host. Alternatively, the NP can be controlled by an external host processor such as the IBM PowerPC 750F. When packets are received on any of the device's ports, the classifier block in the device determines which packet flow each received packet belongs to. Once the flow is determined, the traffic manager engine can access the state variables associated with this flow and make routing, queuing, scheduling and protocol data unit (PDU) modification decisions. The architecture lends itself well to UDP/IP header compression and decompression.
When packets are received, the classifier hands over the packet to the traffic manager block indicating to which flow the PDU belongs. The traffic manager block then performs the compression and decompression based on information received from the classifier.
Modifications to the PDU header are carried out by the stream editor, which is part of the traffic manager (see Figure 1).
Compression and decompression are executed in the fast path without assistance from the host. Only at the beginning, when a packet arrives with a new set of header parameters for which no flow has been established, does the host assist by setting up the table entries the NP needs to perform header compression and decompression. Once the entries are made, the NP performs the actual compression and decompression as part of fast path processing.
UDP header compression
Mapping of UDP/IP packet headers to a CID is performed in the classifier using a lookup tree. If this is a new flow (that is, the UDP/IP packet header lookup failed), the packet is sent out as is. In addition, a copy of the packet is sent to the control processor for flow setup.
The host processor then creates an entry in the lookup table and assigns a compressed header to this combination of header parameters. Subsequent packets received with the same header parameter combination are matched by the lookup table entry in the fast path. Header compression is achieved by replacing those parameters with the compressed header. Rewriting of the packet to compressed format or full-header format is performed in the stream editor.
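The lookup-miss path described above can be sketched in a few lines. This is an illustrative model only: a Python dictionary stands in for the classifier's lookup tree, and the names `FlowKey`, `Classifier`, `lookup` and `host_setup` are hypothetical, not part of the NP's programming interface.

```python
from typing import Dict, List, NamedTuple, Optional


class FlowKey(NamedTuple):
    """Header parameters that identify a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    tos: int


class Classifier:
    def __init__(self) -> None:
        self.table: Dict[FlowKey, int] = {}   # stands in for the lookup tree
        self.to_host: List[FlowKey] = []      # copies queued for the slow path
        self.next_cid = 0

    def lookup(self, key: FlowKey) -> Optional[int]:
        """Fast path: return the CID, or None for a new flow.

        On a miss the caller forwards the packet unchanged, and a copy
        is queued for the host to set up the flow.
        """
        cid = self.table.get(key)
        if cid is None:
            self.to_host.append(key)
        return cid

    def host_setup(self) -> None:
        """Slow path: create one table entry per new flow."""
        while self.to_host:
            key = self.to_host.pop(0)
            if key not in self.table:         # avoid a second CID for one flow
                self.table[key] = self.next_cid
                self.next_cid += 1
```

The first packet of a flow misses and goes out uncompressed; once `host_setup` has run, later packets with the same parameters resolve to a CID entirely in the fast path.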
Initially, the network processor performing the compression is required to transmit a PDU with a full header, which includes all the uncompressed header parameters as well as the CID assigned to this specific flow. This PDU helps the network processor at the receiving end record the decompression parameters specific to the given CID.
When subsequent PDUs that belong to this same flow are delivered with a compressed header, the network processor at the receiving end can replace the compressed header with the correct set of header parameters as part of the decompression process.
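The receiving side's use of the full-header PDU can be modeled as a small context table keyed by CID. The sketch below is an assumption-laden illustration: the `Decompressor` class and its method names are hypothetical, headers are plain dictionaries, and length and checksum fields are left out on the assumption that they are recomputed from the received frame. The generation comparison follows the role RFC 2507 gives that field, which is to detect a context that no longer matches the compressor's.

```python
from typing import Dict, Optional, Tuple


class Decompressor:
    def __init__(self) -> None:
        # CID -> (generation, static header fields recorded from a full header)
        self.contexts: Dict[int, Tuple[int, dict]] = {}

    def on_full_header(self, cid: int, generation: int, header: dict) -> None:
        """Record the uncompressed header parameters for this CID."""
        self.contexts[cid] = (generation, dict(header))

    def on_compressed_header(self, cid: int, generation: int,
                             ip_id: int) -> Optional[dict]:
        """Rebuild the header, or return None if no valid context exists."""
        ctx = self.contexts.get(cid)
        if ctx is None or ctx[0] != generation:
            return None                      # unknown CID or stale context
        header = dict(ctx[1])                # copy the stored static fields
        header["ip_id"] = ip_id              # the only per-packet field carried
        return header
```

A compressed header whose CID is unknown, or whose generation differs from the stored one, cannot be expanded safely, which is why full headers must reach the receiver first.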
Full headers are sent at specified intervals with decreasing frequency under a process called refresh check (see Figure 2). These headers keep the compressor and decompressor synchronized. The state engine determines whether it is time to send a full-header PDU.
The control processor performs the following tasks to aid compression:
- Updates the lookup trees to set up flow identification
- Sets up the state engine parameters that trigger full-header packets
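One way to picture a refresh check with decreasing frequency is a counter whose period doubles after each full header, up to a cap. The sketch below assumes that doubling policy, in the spirit of the compression slow-start described in RFC 2507; the article does not state the actual schedule used by the state engine, and the class name, starting period and cap of 256 are illustrative choices.

```python
class RefreshCheck:
    """Decide per packet whether to send a full or a compressed header."""

    def __init__(self, max_period: int = 256) -> None:
        self.period = 1            # compressed headers allowed between full ones
        self.max_period = max_period
        self.since_full = None     # None until the first full header is sent

    def next_is_full(self) -> bool:
        if self.since_full is None:
            self.since_full = 0    # a new flow always starts with a full header
            return True
        if self.since_full >= self.period:
            # Time to refresh: send a full header and widen the interval.
            self.period = min(self.period * 2, self.max_period)
            self.since_full = 0
            return True
        self.since_full += 1
        return False


def schedule(n: int, max_period: int = 256) -> str:
    """Return a string of 'F' (full) and 'C' (compressed) for n packets."""
    rc = RefreshCheck(max_period)
    return "".join("F" if rc.next_is_full() else "C" for _ in range(n))
```

With these assumptions, `schedule(11)` gives `FCFCCFCCCCF`: full headers are dense right after flow setup, when the receiver is most likely to lack context, and thin out as the flow continues.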
UDP header decompression
As previously noted, decompression of the header is also performed in the fast path without assistance from the host (see Figure 3). However, when headers with unknown header parameters are encountered, the host creates the required lookup tree entries to assist the device with the decompression process.
Using a lookup tree, the classifier checks whether the flow exists. If this is a new flow (that is, the CID lookup failed), the packet is sent to the control processor. The control processor updates the lookup tables and the stream editor (SED) parameter memory.
The packets are then returned for re-insertion into the same packet stream. The implementation ensures that successive PDUs of the same flow received at the host do not result in multiple CIDs being assigned to that flow. It also ensures that all received PDUs are re-inserted into the packet stream in correct first in/first out order.
Whenever full-header packets are received, the headers are checked using a Functional Programming Language (FPL) tree table to make sure that no field assumed to be constant has changed. The SED rewrites the packet to uncompressed format.
The control processor performs the following tasks to aid decompression:
- Updates the lookup trees.
- Updates the SED parameter memory, which holds the uncompressed header.
- Holds the packets until the entries are completed.
- Re-inserts the packets into the stream once the new decompression entry is made, so that the required header decompression can be carried out.
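The hold-and-reinsert behavior in the list above can be sketched with a single FIFO queue. This is a simplified model under stated assumptions: the `DecompressSlowPath` class and its methods are hypothetical, a dictionary stands in for the lookup tree and SED parameter memory, and one shared queue is used so that arrival order is preserved across all held packets.

```python
from collections import deque


class DecompressSlowPath:
    def __init__(self) -> None:
        self.entries = {}          # CID -> stored uncompressed header
        self.held = deque()        # (cid, packet) in arrival order

    def on_unknown_cid(self, cid, packet) -> None:
        """Hold a packet whose CID lookup failed in the fast path."""
        self.held.append((cid, packet))

    def install_entry(self, cid, header) -> None:
        """Create the decompression entry once; repeats are ignored."""
        self.entries.setdefault(cid, header)

    def reinsert(self):
        """Release held packets in arrival order.

        Stops at the first packet whose entry is still missing, so a
        later packet is never re-inserted ahead of an earlier one.
        """
        released = []
        while self.held and self.held[0][0] in self.entries:
            released.append(self.held.popleft())
        return released
```

Stopping at the first unresolved packet trades some latency for strict first in/first out ordering; a per-flow queue would be an alternative if ordering only matters within a flow.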
The implementation of UDP/IP compression involves approximately 1,000 lines of fast path code and 1,200 lines of slow path (host) code. This code configures the classifier and traffic manager blocks described earlier.
The processor could maintain 2-Gbps throughput while providing compression and decompression services simultaneously. A sample implementation demonstrated that the device could easily handle 10,000 unique flows. These flows were created by varying at least one parameter of the five-tuple (source and destination IP addresses, source and destination UDP ports and the ToS field) used to identify individual flows.
The uncompressed header carries 28 bytes of UDP/IP header information. This is reduced to five bytes when the compressed header replaces the static parameters, a reduction of more than 80 percent in header size. The actual reduction in total bandwidth depends on the size of the payload relative to the header: smaller payloads yield bigger reductions because the headers occupy relatively more of each packet.
With a 20-byte payload, 939 Mbps of uncompressed traffic is compressed to 435 Mbps, and 435 Mbps of compressed traffic is decompressed back to 939 Mbps, indicating the extent of the bandwidth savings that can be realized.
UDP/IP header compression can serve as a powerful technique to improve throughput and reduce packet loss and delay by cutting header size by more than 80 percent. When the payload is small, the reduced header overhead in this implementation roughly doubles throughput capacity, substantially improving transmission efficiency.
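The dependence on payload size follows from simple arithmetic, sketched below. The function names are illustrative, and the calculation counts only the UDP/IP header and payload bytes; it ignores link-layer framing and full-header refreshes, so it will not reproduce measured line rates exactly.

```python
UNCOMPRESSED_HEADER = 28   # bytes: 20 IP + 8 UDP
COMPRESSED_HEADER = 5      # bytes: CID + generation + IP identification


def header_reduction() -> float:
    """Fraction of header bytes saved by compression."""
    return 1 - COMPRESSED_HEADER / UNCOMPRESSED_HEADER


def packet_reduction(payload_bytes: int) -> float:
    """Fraction of total packet bytes saved for a given payload size."""
    before = UNCOMPRESSED_HEADER + payload_bytes
    after = COMPRESSED_HEADER + payload_bytes
    return 1 - after / before


if __name__ == "__main__":
    print(f"header only: {header_reduction():.1%}")
    for payload in (20, 100, 1000):
        print(f"{payload:>5}-byte payload: {packet_reduction(payload):.1%}")
```

The header alone shrinks by 23/28, about 82 percent. For a 20-byte payload the packet goes from 48 to 25 bytes, a saving of about 48 percent, while for a 1,000-byte payload the saving falls to roughly 2 percent.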
About the Authors
Sundar Vedantham is a senior systems engineer at the Agere Systems Telecom & Enterprise Networking Division. Sundar can be reached at: firstname.lastname@example.org
Pravin Pathak is a senior systems engineer at the Agere Systems Telecom & Enterprise Networking Division. Pravin can be reached at: email@example.com
Lauren Yang is a Member of Technical Staff at the Agere Systems Telecom & Enterprise Networking Division. Lauren can be reached at: firstname.lastname@example.org