Making packet processing more efficient with a network-optimized multicore design: Part 1
With the advent of the latest generation of multi-core processors, it has become feasible, from both a performance and a power-consumption standpoint, to build complete packet processing applications on general purpose processors rather than dedicated ASIC and ASSP SoCs.
Architects and developers in the industry are now considering these processors as an attractive choice for implementing a wide range of networking applications, as performance levels that could previously be obtained only with network processors (NPUs) or ASICs can now also be achieved with multi-core architecture processors, but without incurring the disadvantages of the former.
Why multicore?
Ideally, a single core processor should be powerful enough to handle all the application processing. However, a single core cannot keep up with the constant demand for ever-increasing computing performance.
The impact of improving the core internal architecture or moving to the latest manufacturing process is limited. Higher clock frequencies also result in considerably higher energy consumption and a further widening of the processor-memory frequency gap.
Multi-core processors break this impasse: by scaling the number of cores rather than the clock frequency, they provide the performance and power efficiency needed to build complete packet processing applications on general purpose processors.
Control plane versus data plane: core partitioning
In such multicore-based networking applications, the data plane, also called the forwarding plane or the fast path, handles the bulk of the incoming traffic entering the current network node.
This traffic is processed according to the rules identified during the classification stage and is then sent back to the network. The packet processing pipeline typically includes stages such as parsing, classification, policing, forwarding, editing, queuing and scheduling.
In terms of computing, the data plane is synonymous with real-time packet processing. The real-time constraints stem from the fact that the amount of processing applied per packet must fit into the packet budget, which is a direct consequence of the input packet rate.
In other words, each stage along the pipeline must finish its processing of the current packet before the next packet in the input stream arrives; if this timing constraint is not met, the processor must start dropping packets to reduce the effective input rate to one it can sustain.
Due to the tight packet budget, the processing applied per packet needs to be straightforward and deterministic. The number of different branches that can be pursued during execution should be minimized, so that the processing is quasi-identical for each input packet. The algorithm should be optimized for the critical path, identified as the path taken by the majority of the incoming packets.
In contrast with the data plane, the control plane is responsible for handling the overhead packets used to relay control information between network nodes. Control plane packets destined for the current node are extracted from the input stream and consumed locally, as opposed to the bulk of the traffic, which is returned to the network.
The reception of such packets is a rare event compared with the reception of user packets (which is why control plane packets are also called exception packets), so their processing does not have to be real time. Compared to the fast path, the processing applied is complex, a consequence of the inherent complexity built into the control plane protocol stacks, hence the reference to this path as the slow path.
As the processing requirements associated with the two network planes are so different, it is recommended practice that the cores dedicated to data plane processing be separate from those handling the control plane processing. As the application layer requires the same type of non-real-time processing as the control plane, it usually shares the same cores with the latter.
When the same cores handle both the data plane and the control plane/application layer processing, a negative impact on both may be observed. If the control plane is given higher priority than the data plane, the handling of input packets is delayed, leading to lengthy packet queues as the network interfaces keep them well supplied with packets received from the network, which eventually ends in congestion and packet discards.
If instead the data plane gets higher priority than the control plane, then the delay incurred in handling the hardware events (e.g. link up/down) or the control plane indications (e.g. route add/delete) results in analyzing them when they are already obsolete (the link that was previously reported down might be up by now).
This behavior usually has an impact on the overall quality of the system (packets are still discarded even though a route for them is pending addition) and results in a non-deterministic system with hidden stability flaws.

