
Making packet processing more efficient with network-optimized multicore designs: Part 3

This “Product How-To” article focuses on how to use a certain product in an embedded system and is written by a company representative.

To address some of the packet processing issues described in Part 1 and Part 2 of this series, it has become feasible, from both a performance and a power consumption standpoint, to build complete packet processing applications on general purpose multi-core Intel architecture processors.

Architects and developers in the industry now consider these processors an attractive choice for implementing a wide range of networking applications: performance levels that could previously be obtained only with network processors (NPUs) or ASICs can now also be achieved with multi-core Intel architecture processors, without incurring the disadvantages of the former.

Implementing packet processing applications on the Intel multi-core processors enables the reuse of the extensive code base already developed for the Intel architecture processors including BIOS, all the major Operating Systems, libraries and applications. The same is true for the mature software development tools already in place for the Intel architecture processors.

Developing packet processing applications on Intel architecture also allows the reuse of another important asset: the skills and knowledge base that developers have already built up on Intel architecture processors.

Unlike with NPUs, software engineers are not required to learn a special-purpose programming model and tool set for packet processing; instead, they can continue to use the architecture and tools with which they are comfortable and most productive.

Unlike NPUs, which often use special-purpose instruction sets, Intel architecture processors have a general purpose architecture and instruction set, which is the key advantage behind their programmability.

As a result, the multi-core Intel architecture processors allow the implementation of highly programmable control and data planes, with fewer constraints than NPUs.

The multi-core Intel architecture processors offer scalability through software for control and data plane processing, which is essential for networking applications built on protocols and standards that are continuously evolving.

Unlike NPUs, the multi-core Intel architecture processors do not rely on expensive resources that are difficult to scale up, such as on-chip multi-port memory, CAM memory, or large external SRAM.

Using Intel Dual-/Quad-core CPUs for packet processing
Ideally, a single-core processor would be powerful enough to handle all the application processing. However, a single core cannot keep up with the constant demand for ever-increasing computing performance.

The impact of improving the core's internal architecture or moving to the latest manufacturing process is limited. Higher clock frequencies also result in considerably higher energy consumption and further widen the processor-memory frequency gap.

The way forward for continuing to deliver more energy-efficient computing power is to exploit the advantages of parallel processing. In fact, Intel multi-core chips deliver significantly more performance while consuming less energy. This approach is highly effective for applications such as packet processing.
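
One reason packet processing lends itself so well to parallel execution is that traffic naturally divides into largely independent flows. The sketch below illustrates one common distribution technique, assumed here purely for illustration: hashing each packet's 5-tuple to pick a worker core, so that all packets of a flow stay on the same core and per-flow state needs no cross-core locking. The struct flow_key fields, the hash choice, and the worker count are assumptions made for this example, not details from this article.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_WORKER_CORES 4   /* assumed number of data-plane worker cores */

/* Hypothetical 5-tuple extracted from a packet header. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;
};

/* FNV-1a over a byte range; any flow-stable hash would do. */
static uint32_t fnv1a(const void *data, size_t len, uint32_t h)
{
    const uint8_t *p = (const uint8_t *)data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash the fields individually so struct padding never affects the result. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u;
    h = fnv1a(&k->src_ip,   sizeof k->src_ip,   h);
    h = fnv1a(&k->dst_ip,   sizeof k->dst_ip,   h);
    h = fnv1a(&k->src_port, sizeof k->src_port, h);
    h = fnv1a(&k->dst_port, sizeof k->dst_port, h);
    h = fnv1a(&k->protocol, sizeof k->protocol, h);
    return h;
}

/* All packets of a flow land on the same worker core, so per-flow state
 * needs no cross-core locking. */
static unsigned pick_worker(const struct flow_key *k)
{
    return flow_hash(k) % NUM_WORKER_CORES;
}

int main(void)
{
    struct flow_key k = { 0x0a000001u, 0x0a000002u, 1234, 80, 6 };
    printf("flow dispatched to worker core %u\n", pick_worker(&k));
    return 0;
}

Many NICs apply essentially the same idea in hardware through receive-side scaling (RSS), spreading incoming packets across per-core receive queues.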

The latest multi-core Intel processors are represented by the Intel Core i7 processor family, which uses Intel Hyper-Threading Technology to combine the advantages of multi-processing with those of multi-threading. Table 2 below summarizes the main features of the latest Intel multi-core processors.

Table 2. Comparison between selected multi-core Intel processors

The multi-core Intel architecture processors are designed to serve as vehicles for the development of complete networking applications. The full processing required by the control plane as well as the data plane can be implemented on the same multi-core Intel architecture chip, although in terms of computing the two network planes have completely opposite sets of requirements.

It is standard practice to have the cores allocated to the control plane and application layer run under the control of an operating system, as these tasks carry no real-time constraints with regard to packet processing. In fact, the complexity of the processing to be applied and the need to reuse the existing code base make interaction with the OS a prerequisite.
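
As a rough Linux-oriented illustration of this core partitioning, the sketch below pins a hypothetical data-plane thread to a dedicated core where it busy-polls for packets, while the control plane and application code remain on the other cores under the normal OS scheduler. The core number and the poll_and_process() routine are placeholders invented for this example, not details from the article.

/* Build with -pthread; requires glibc's non-portable affinity call. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define DATA_PLANE_CORE 1      /* assumed core reserved for the data plane */

static void poll_and_process(void)
{
    /* placeholder for the real packet receive/process/transmit loop */
}

static void *data_plane_loop(void *arg)
{
    (void)arg;
    for (;;)
        poll_and_process();    /* busy-poll: no blocking, no scheduling jitter */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    cpu_set_t mask;

    pthread_create(&tid, NULL, data_plane_loop, NULL);

    /* Pin the data-plane thread to its dedicated core. */
    CPU_ZERO(&mask);
    CPU_SET(DATA_PLANE_CORE, &mask);
    pthread_setaffinity_np(tid, sizeof(mask), &mask);

    /* The control plane / application layer would continue here,
     * scheduled normally by the OS on the remaining cores. */
    pthread_join(tid, NULL);
    return 0;
}

On a real system the data-plane cores would typically also be shielded from general-purpose scheduling (for example with the Linux isolcpus boot parameter) so that the polling loops are not disturbed.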

Historically, the Intel architecture cores have benefited from excellent support from all the major OS vendors, and their programmability and computing performance have made them an excellent choice for the development of complex applications.

Using Intel QuickAssist Technology
To facilitate the integration of the various specialized accelerator solutions currently offered by the industry (the topic of Part 2 in this series) with Intel architecture processors, Intel has introduced Intel QuickAssist Technology.

It is a design framework that standardizes both hardware and software interfaces to enable portability from one hardware platform to another while requiring minimal modifications to the application software.

On the hardware side, this framework specifies the interfaces for connecting accelerators as IP blocks on the same die as the Intel architecture cores, as well as for connecting external accelerators to Intel architecture processors through the PCI Express, Front Side Bus, or Intel QuickPath Interconnect interfaces.

From the software perspective, well-defined APIs abstract the underlying hardware implementation of the accelerators, including the method of connectivity to the Intel architecture processors, so that upgrading to a new platform does not impact the software application.
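
The sketch below illustrates only the abstraction idea; the struct accel_ops interface and every function in it are hypothetical names invented for this example and are not the actual Intel QuickAssist Technology API. The point is that the application calls a stable, well-defined interface, while the backend behind it, whether an on-die block, a PCI Express card, or a software fallback, can change without touching the application code.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Abstract accelerator interface: the application codes against these
 * operations and never touches the underlying hardware directly. */
struct accel_ops {
    int (*encrypt)(const uint8_t *in, uint8_t *out, size_t len);
};

/* One possible backend: a trivial software fallback standing in for an
 * on-die accelerator, a PCI Express card, or a QPI/FSB-attached device. */
static int sw_encrypt(const uint8_t *in, uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ 0x5a;     /* placeholder transform, not real crypto */
    return 0;
}

static const struct accel_ops sw_backend = { .encrypt = sw_encrypt };

/* Platform selection happens in one place; application code is unchanged
 * when a new backend is plugged in here. */
static const struct accel_ops *platform_accel_ops(void)
{
    return &sw_backend;
}

int main(void)
{
    const uint8_t msg[4] = { 'd', 'a', 't', 'a' };
    uint8_t out[4];
    platform_accel_ops()->encrypt(msg, out, sizeof(msg));
    printf("encrypted %zu bytes through the abstract interface\n", sizeof(msg));
    return 0;
}

Swapping in a hardware-backed implementation of the same structure would leave the calling code untouched, which is exactly the portability property described above.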

The first Intel architecture chip that is Intel QuickAssist Technology enabled is the Intel EP80579 Integrated Processor, which provides accelerated on-chip support for encryption/decryption, as well as other packet and voice processing tasks.

To read Part 1, go to Pipelining versus clustering cores for networking.
To read Part 2, go to Minimizing and hiding latency.

Cristian F. Dumitrescu is a Senior Software Engineer with the Embedded and Communications Group at Intel. He is the author of Design Patterns for Packet Processing Applications on Multicore Intel Architecture Processors from which this article is derived. He has worked extensively in the past with network and communications processors from AMCC, C-Port, Freescale and Intel. He is currently focusing on delivering packet processing performance on multi-core Intel Architecture processors.
