Multicore networking in Linux user space with no performance overhead

Dronamraju Subramanyam, John Rekesh and Srini Addepalli, Freescale Semiconductor

February 26, 2012

Data plane processing on the network
Data plane processing in different network devices tends to use similar types of operations. Multicore SoCs accelerate and substantially improve data plane performance by providing mechanisms that address these common data path processing elements. Typical data plane processing involves steps from ingress to egress, as illustrated in Figure 4 below.


Figure 4. In the typical network, data plane processing involves execution of multiple steps from ingress to egress

Packet ingress involves parsing the packet, classifying it, and activating the right application module to handle it. This is now facilitated in hardware, for example by a parse/classify/distribute unit. Packet (protocol) integrity checks may also be conducted at this stage.

The next step is core-based packet processing, which locates the context or flow associated with the packet within the data plane. Much of the policy-related processing performed by application modules need not happen per packet; in many cases only the first packet (or first few packets) of a flow needs that treatment.

When a flow context is not found in the data plane, the packet is sent to the control plane for policy lookup and enforcement. If policy allows, the control plane creates a flow context within the data plane. Further packets of the flow are matched against this context and are processed fully within the data plane.

A flow is typically defined by an N-tuple of fields extracted from the packet. A hash table lookup keyed on these fields is the most common way to find a flow's context. Both the extraction of the necessary fields and the required hash computation can be offloaded to the hardware parsing unit of an SoC.
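A minimal software sketch of such a flow lookup is shown below. The names, the 5-tuple key, the table size, and the FNV-1a hash are all illustrative assumptions; on a multicore SoC the field extraction and hash computation would come from the hardware parse unit, and the miss path would hand the packet to the control plane, which installs the context via an insert like the one here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative 5-tuple flow key.  Callers must zero the whole struct
 * (memset) before filling it, so padding bytes compare equal in memcmp. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

#define FLOW_BUCKETS 4096           /* power of two so the hash can be masked */

struct flow_ctx {
    struct flow_key key;
    int             in_use;
    /* per-flow state: actions, counters, timers, ... */
};

static struct flow_ctx flow_table[FLOW_BUCKETS];

/* FNV-1a over the tuple bytes -- in hardware this hash is precomputed. */
static uint32_t flow_hash(const struct flow_key *k)
{
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof(*k); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Linear probing for simplicity; production tables chain or use cuckoo
 * hashing.  A NULL return means "miss: punt the packet to the control
 * plane for policy lookup". */
struct flow_ctx *flow_lookup(const struct flow_key *k)
{
    uint32_t idx = flow_hash(k) & (FLOW_BUCKETS - 1);
    for (unsigned probe = 0; probe < FLOW_BUCKETS; probe++) {
        struct flow_ctx *c = &flow_table[(idx + probe) & (FLOW_BUCKETS - 1)];
        if (!c->in_use)
            return NULL;            /* empty slot terminates the probe */
        if (memcmp(&c->key, k, sizeof(*k)) == 0)
            return c;               /* hit: stay on the fast path */
    }
    return NULL;
}

/* Called from the control plane after policy permits the flow. */
struct flow_ctx *flow_insert(const struct flow_key *k)
{
    uint32_t idx = flow_hash(k) & (FLOW_BUCKETS - 1);
    for (unsigned probe = 0; probe < FLOW_BUCKETS; probe++) {
        struct flow_ctx *c = &flow_table[(idx + probe) & (FLOW_BUCKETS - 1)];
        if (!c->in_use) {
            c->key = *k;
            c->in_use = 1;
            return c;
        }
    }
    return NULL;                    /* table full */
}
```

Subsequent packets of the flow then hit in `flow_lookup` and never leave the data plane.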

Within data plane processing stages there can be multiple application modules that need to process the packet in sequence. Each of these modules in the data plane may have its own control plane module that handles application specific flows.

An efficient communication mechanism between data plane and control plane modules is therefore required. This is essentially a core-to-core communication mechanism, facilitated by the hardware.

Each application module (that involves standard protocols) may implement some standard processing algorithms. Many of these algorithms, methods and even protocols are common enough to be implemented in look-aside hardware accelerators. An application module can then make use of these accelerators during appropriate stages of its own processing, by directing packets to those engines, and collecting responses.

One thing common to all data plane processing is the handling of statistics. Counters such as byte and packet counts, as well as application-specific counters, often need to be kept per flow and also at higher abstraction levels as applications require. A large number of counters can therefore be expected in higher-end devices.

Since multi-core synchronized access to shared counters is costly, a multicore SoC can also provide a statistics acceleration mechanism – one that makes incrementing millions of counters very efficient.
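The software technique the hardware generalizes is per-core counter slices: each core increments only its own slot unsynchronized, and a reader sums the slices on demand. A minimal sketch (names, core count, and cache-line size are assumptions):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CORES 8

/* Each core owns one slice, so the hot path needs no locks and no atomic
 * read-modify-write.  Slices are padded to a (assumed 64-byte) cache line
 * to avoid false sharing between cores. */
struct percore_counter {
    struct {
        uint64_t v;
        char     pad[64 - sizeof(uint64_t)];
    } slice[MAX_CORES];
};

/* Fast path: called only by the core identified by `core`. */
static inline void counter_add(struct percore_counter *c, int core, uint64_t n)
{
    c->slice[core].v += n;          /* core-local, unsynchronized */
}

/* Slow path: sums all slices; the result is an approximate snapshot,
 * which is acceptable for statistics reporting. */
static inline uint64_t counter_read(const struct percore_counter *c)
{
    uint64_t sum = 0;
    for (int i = 0; i < MAX_CORES; i++)
        sum += c->slice[i].v;
    return sum;
}
```

A statistics accelerator serves the same purpose without dedicating cache lines per core per counter, which matters when counters number in the millions.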

Once a packet is processed through all necessary application modules in the data plane, the packet is sent out to the egress interface. Typical processing here requires scheduling and shaping (or rate limiting). Since standard QoS algorithms are generally used, these functions can also be offloaded to hardware units, so that the application modules need only enqueue packets to egress processing units.
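A token bucket is the classic building block behind such shaping and rate limiting, and is the kind of check an egress QoS unit performs in hardware. The sketch below is illustrative; field names and tick/byte units are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Token-bucket shaper: tokens accumulate at `rate` bytes per tick up to
 * a depth of `burst` bytes; a packet conforms if enough tokens exist. */
struct token_bucket {
    uint64_t tokens;                /* current tokens, in bytes */
    uint64_t burst;                 /* bucket depth, in bytes */
    uint64_t rate;                  /* refill rate, bytes per tick */
    uint64_t last_tick;             /* tick of the last refill */
};

/* Returns 1 if the packet may be sent now, 0 if it must wait (or, for a
 * policer, be dropped or marked). */
int tb_conform(struct token_bucket *tb, uint64_t now_tick, uint32_t pkt_len)
{
    uint64_t refill = (now_tick - tb->last_tick) * tb->rate;
    tb->tokens += refill;
    if (tb->tokens > tb->burst)
        tb->tokens = tb->burst;     /* cap at bucket depth */
    tb->last_tick = now_tick;
    if (tb->tokens < pkt_len)
        return 0;                   /* non-conforming */
    tb->tokens -= pkt_len;
    return 1;                       /* conforming: enqueue for egress */
}
```

With the check offloaded, the application module's job reduces to enqueuing the packet on the right egress queue and letting the hardware scheduler apply this logic per queue.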
