Multicore networking in Linux user space with no performance overhead
In this Product How-To Design article, the Freescale authors discuss multicore network SoCs and how to leverage them efficiently for data path processing, the limitations of current software programming models, and how to use the VortiQa zero-overhead user space software framework in designs based on the QorIQ processor family.
System-on-chip architectures incorporating multiple general purpose CPU cores along with specialized accelerators have become increasingly common in the networking and communications industry.
These multi-core SoCs are used in network equipment such as layer 2/3 switches and routers, load balancing devices, wireless base stations, and security appliances. Network equipment vendors have traditionally used ASICs or network processors for datapath processing but are migrating to multi-core SoCs.
Multi-core SoCs offer high performance and scalability, and include multiple general purpose cores and acceleration engines with in-chip distribution of workloads. However, exploiting their capabilities requires intimate knowledge of SoC hardware and deep software expertise.
In this article we discuss multi-core SoC capabilities and how to leverage them efficiently for data path processing, the limitations of current software programming models, and finally a zero-overhead user space software framework.
Multicore SoC Hardware Elements
As shown in Figure 1 below, a multicore SoC has multiple general purpose cores that run application software, along with hardware units that assist with data path acceleration. Incoming packets are usually directed toward the general purpose cores, where application processing takes place.
Application cores use hardware accelerator engines to offload standard processing functions. Implementing networking applications on multi-core SoCs requires the SoC to meet certain basic requirements:
1. Partitioning: The SoC must provide the flexibility to partition the available general purpose cores to run multiple application modules, or even different applications.
2. Parsing, classification and distribution: Once partitioned, there must be flexibility and intelligence in the hardware to parse and classify incoming packets, and then direct them to appropriate partitions and/or cores.
3. Queuing and scheduling: Once parsing is complete, the hardware must have a mechanism to direct each incoming packet to the desired processing unit or core. This requires a queuing and scheduling unit within the hardware.
4. Look-aside processing: The queuing and scheduling unit must manage the flow of packets between cores and acceleration engines. Cryptography, pattern matching, compression/decompression, de-duplication, timer management, and protocol processing (IPsec, SSL, PDCP, etc.) are some standard examples of acceleration units in multicore SoCs.
5. Egress processing: The queuing and scheduling unit must direct packets to their destination interfaces at a very high rate. Here, QoS algorithms for shaping and congestion avoidance are required in hardware so that these standard tasks are offloaded from the application cores.
6. Buffer management: Packet buffers need to be allocated by hardware, and often freed by hardware as packets leave the SoC. Therefore hardware packet buffer pool managers are a necessity.
7. Interfaces to cores: The multi-core SoC architecture needs to present a unified interface to the cores for working with the packet processing units.
8. Semi-autonomous processing: Semi-autonomous processing of flows, without intervention from the cores, is desirable to offload some processing tasks from the cores. A few multi-core SoCs provide programmable micro engines that enable ingress acceleration on incoming packets, performing functions such as IP reassembly, TCP LRO or IPsec before packets are handed to the cores.
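To make requirements 2, 3 and 6 more concrete, the following is a minimal software sketch of what the hardware does on ingress: take a buffer from a pool, classify the packet by its flow, and enqueue it for one core. It is a toy model only and not the API of any SoC or framework. The names FlowKey, BufferPool, classify and Distributor are hypothetical, and the CRC32 hash stands in for whatever hash function a real parse and classify unit implements.

```python
import zlib
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical 5-tuple identifying a flow (illustration only)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


class BufferPool:
    """Software stand-in for a hardware buffer pool manager (requirement 6)."""

    def __init__(self, count, size):
        self._free = deque(bytearray(size) for _ in range(count))

    def acquire(self):
        # Returns None when the pool is exhausted.
        return self._free.popleft() if self._free else None

    def release(self, buf):
        self._free.append(buf)

    def available(self):
        return len(self._free)


def classify(key, num_cores):
    """Map a flow to a core index (requirement 2).

    Packets of the same flow always map to the same core, which keeps
    per-flow ordering. CRC32 is an assumed stand-in for the hardware hash.
    """
    data = f"{key.src_ip}|{key.dst_ip}|{key.src_port}|{key.dst_port}|{key.proto}"
    return zlib.crc32(data.encode()) % num_cores


class Distributor:
    """One FIFO queue per core, fed by the classifier (requirement 3)."""

    def __init__(self, num_cores, pool):
        self.queues = [deque() for _ in range(num_cores)]
        self.pool = pool

    def ingress(self, key, payload):
        buf = self.pool.acquire()
        if buf is None:
            return None  # no free buffer: the packet is dropped
        n = min(len(payload), len(buf))
        buf[:n] = payload[:n]  # copy at most one buffer's worth of data
        core = classify(key, len(self.queues))
        self.queues[core].append((key, buf, n))
        return core

    def dequeue(self, core):
        # Called by a core to fetch its next packet; None if the queue is empty.
        q = self.queues[core]
        return q.popleft() if q else None
```

In this sketch a core would call dequeue() with its own index, process the packet, and then hand the buffer back with release(), which mirrors the hardware freeing buffers as packets leave the SoC. Scheduling policy, look-aside accelerators and egress QoS are deliberately left out.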