Developers must balance NPU programmability and performance issues

Developers in the newly emerging market for embedded networking who were also working in the decade just after the introduction of the microprocessor in the early 1970s are developing a really bad case of “déjà vu” right now.

In many respects, the struggles of the architects of the new network processors, and of the engineers who must program and design with them, repeat the turmoil, confusion and indecision that typified the new markets of the 1970s, into which those new things called microprocessors and microcontrollers were being introduced.

Faced with a totally new approach to implementing their electromechanical control designs, developers had to answer innumerable questions about processor architecture, languages and tools, and about how to balance sufficiently high performance on the one hand against flexibility, extensibility and cost-effective design on the other.

In the embedded networking environment of 2002, developers face the same issues, but with several added levels of complexity. First, unlike the early days of embedded design, in which product development times were a year or more and the lifetime of a product could be measured in decades, those times have now been reduced to months and a few years, respectively. Second, the performance range that these embedded products must cover runs from a few hundred thousand kilobits per second at the network edge to 1, 2 and 10 gigabits per second at the core. Third, there are numerous protocols that must be administered and managed, many of which are still in a state of change.

Finally, there is the sheer complexity of the job that these NPUs must perform down in the heart of the switches, routers, bridges and gateways. To bring some sort of coherence to the diverse processing chores, architects and standards bodies have partitioned this application space into six somewhat overlapping planes, or areas of processing specificity. Each of these is far beyond the complexity of many existing applications, and together they present a daunting challenge.

At the core is the data plane, where the main job of the processor is to pull packets in and push them out of the switch or router as fast as possible. Around this are at least five other processing planes for adaptation, switching, control, applications, and management.

The adaptation plane is concerned with network interfaces to various client types at the network edge involving transcoding, translation, firewalling and filtering.

The switching plane is involved in making the connections between the various media streams being transported and the various logical ports to which they should be delivered. The control plane manages the routing of the traffic that the data plane is pushing through the system: allocating resources, setting up and tearing down connections, and deciding what level of quality of service to provide and which protocols to initiate.

The application plane is concerned with implementation of new services and the coexistence of these services in the same processor environment. Finally, there is the management plane, which configures all of the other planes to create the right mix of services and handles fault, performance, security and accounting management as well as service creation.

The decisions facing the developers in this admittedly much more complex environment are the same: Do I choose an ASIC or ASSP solution specific to the application to get the best performance? Or do I choose a more programmable solution to reduce development time and to spread out the investment in hardware and software tools and education over a number of different designs? What is the most appropriate balance between programmability and performance? What is the best programming model? What are the hardware and software tools available? Can I adapt existing tools or, if new tools are required, what is the “time to functionality” for the designer assigned?

In the series of articles in this report, a number of authors from Intel Corp.'s Network Processing Division analyze the requirements of network processing, the general architectural requirements, the degree of programmability, the programming model and the types of support tools that will be needed.

Out of the extensive analysis of this new marketplace for processor architectures, Intel came to the conclusion, said Johnathan Corgan, senior product manager in Intel's network processor division, that a software programmable solution to the requirements was the most viable approach. Out of that emerged the new IXP2800 general-purpose network processor designed for high-speed packet processing.

Each processing element (PE), or microengine, is multi-threaded with non-pre-emptive context switching. The microengines are programmable through a set of tools that allow developers to write code in C, which is then compiled into the machine code that actually runs on them. Combined with these on the chip is a StrongARM-derived XScale core that handles the management and control plane functions using normal programming techniques.
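To make the idea of non-pre-emptive multi-threading concrete, the following is a minimal sketch in Python, not Intel's toolchain or microengine code. It assumes a toy model in which each thread voluntarily gives up the engine whenever it issues a memory read, and a round-robin scheduler runs whichever thread is ready. The names `packet_thread`, `run_engine` and `MEM_LATENCY`, and the three-cycle latency, are hypothetical and chosen only for illustration.

```python
from collections import deque

MEM_LATENCY = 3  # hypothetical: cycles before a memory read completes


def packet_thread(name, packets, log):
    """One thread context; it runs until it chooses to yield."""
    for pkt in packets:
        # Issue a memory read (say, a table lookup) and give up the
        # engine voluntarily. Nothing pre-empts the thread elsewhere.
        yield MEM_LATENCY
        log.append((name, pkt))  # resumes here once the read is done


def run_engine(threads):
    """Round-robin, non-pre-emptive scheduler for one simulated engine.

    Returns (total_cycles, busy_cycles).
    """
    ready = deque(threads)
    waiting = []  # (wake_cycle, thread) pairs blocked on memory
    cycle = busy = 0
    while ready or waiting:
        # Move threads whose memory read has completed back to ready.
        still_waiting = []
        for wake, thread in waiting:
            if wake <= cycle:
                ready.append(thread)
            else:
                still_waiting.append((wake, thread))
        waiting = still_waiting
        if ready:
            thread = ready.popleft()
            try:
                latency = next(thread)  # run until the thread yields
                waiting.append((cycle + latency, thread))
            except StopIteration:
                pass  # thread has processed all of its packets
            busy += 1
        cycle += 1  # with no ready thread, the engine idles this cycle
    return cycle, busy
```

In this toy model, a single thread handling two packets keeps the engine busy for only 3 of 7 cycles, while four such threads keep it busy for all 12, which is the latency-hiding effect that multiple thread contexts are meant to provide.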

The IXP2800 offers a variety of integrated hardware features that address the problems of resource sharing and communication within a multi-processor device. Each microengine contains hardware acceleration features such as local memory and a CAM for data caching. The internal and external memory units support logical and arithmetic atomic operations, as well as hardware queue management. The processor also has features for low-latency inter-process communication between microengines and data coherency features within its memory controllers.
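As a rough illustration of how a CAM can serve as a data cache, here is a minimal sketch in Python. It is a toy model, not the IXP2800 programming interface: the class name `TinyCam`, the default entry count and the least-recently-used replacement policy are all assumptions made for illustration. A lookup on a tag reports whether it hit and which slot holds the tag, so that a thread could keep the matching data in the same slot of local memory.

```python
from collections import OrderedDict


class TinyCam:
    """Toy content-addressable memory mapping a tag to a slot number."""

    def __init__(self, entries=16):  # entry count is an assumption
        self.free = list(range(entries))  # slots not yet assigned
        self.slots = OrderedDict()        # tag -> slot, oldest first

    def lookup(self, tag):
        """Return (hit, slot).

        On a miss, the tag takes a free slot if one exists; otherwise
        the least recently used tag is evicted and its slot is reused.
        """
        if tag in self.slots:
            self.slots.move_to_end(tag)  # mark as most recently used
            return True, self.slots[tag]
        if self.free:
            slot = self.free.pop(0)
        else:
            _, slot = self.slots.popitem(last=False)  # evict the oldest
        self.slots[tag] = slot
        return False, slot
```

A caller would treat a miss as the signal to fetch the data from external memory into the returned slot, and a hit as permission to use the copy already held there.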

According to Corgan, Intel has also made allowances for the fact that at the present stage in network processor evolution, some functions remain the domain of special-purpose, plane-specific co-processors and ASICs. To that end, he says, the IXP2800 supports the industry-standard SPI-4.2 protocol and the Network Processing Forum's Look-Aside Interface Specification (LA-1) using the QDR SRAM standard.

Accelerating algorithms

Rather than a software programmable core around which specialized functions are added or a dedicated solution focused on one segment, said Corgan, “we believe that hardware and technology resources should be devoted to accelerating the software algorithms to do all of the necessary compute functions in this environment.” This provides scalability across performance ranges, allowing developers to use the same architecture in a number of targeted applications, which lowers overall component cost and spreads development costs over a wider range. It also provides, he said, scalability in the planar direction, allowing integration of some functions in other planes into the NPU.

“While initial versions of the company's new IXP2800 family are focused primarily on accelerating the data plane and leaving other aspects to either ASICs or dedicated function solutions,” he said, “we believe a well defined and thought out architecture will be able to incorporate most of that functionality onto a single chip as process technology improves and as the functional partitions stabilize.”

Many network processor vendors and their software providers generally agree with many of the points raised by the Intel contributors in their analysis of the network processing market and its requirements. Where they differ is in implementation, in the degree and balance of general purpose versus special purpose and ASIC, and in the degree of programmability and configurability.

According to Jon Kenton, telecom marketing manager at the Motorola Computer Group (Tempe, Ariz.), his company's family of network processors emphasizes a general purpose approach as well, but not to the same degree or in the same way as Intel's. In its C-Port family, the company uses multiple standard processor engines based on MIPS and PowerPC.

Perhaps reflecting its origins in microcontrollers, the company supports these specialized engines with on-chip peripheral functions appropriate to the segment of the network processing market it is targeting with a specific member of the family. “We are trying to stick as closely as possible to most of the standard development tools used in the embedded industry,” he said. The company has developed a proprietary API and targeted compilers that hide most of the parallelism inherent in network processing architectures, allowing developers to continue to do code development in the traditional sequential programming mode.

At PMC-Sierra Inc. (Santa Clara, Calif.), which has an architecture based on multiple instantiations of a proprietary licensed version of the MIPS architecture, the approach taken reflects that of network companies such as Cisco. “Everyone forgets that before ASIC-based NPUs the way networking guys solved these problems was with general purpose CPUs with millions of lines of proprietary code and network optimized OSes,” said Tom Riordan, vice president and manager of the MIPS Processor Division. “Someone like Cisco is going to do as much as possible to get everything out of that software investment by sticking with standard architectures, such as MIPS, and turning to external ASICs and dedicated ASSPs only when they have to.”

There is common agreement that just as important to the success of a network processor as its performance and hardware features is the range and flexibility of the tools available to developers. The debate, said David Stepner, CEO at Teja Technologies, Inc. (San Jose, Calif.), is going to be about the kind of tools and languages that will become common and useful in this environment. “Network companies are stuck in a hard place,” he said. “On the one hand they want as much specificity as possible in the processor, but also as much programmability as they can get.”

According to Michael Selissen, technical marketing engineer in the Network Processor Division, Intel Corp. (Tempe, Ariz.), the choice of the software support will be as important as the specific architecture. “To realize its potential benefits, designers need to choose a network processor based not only on whether it meets initial product requirements, but whether it has the headroom to allow the application to grow,” he said. “Some networking and telecommunication manufacturers will clearly gain from using fixed-function components, such as classifiers and traffic managers, found in some network processor architectures. Others, however, have developed their own algorithms for classification, table searching and traffic scheduling. For these manufacturers, a general-purpose processor allows them to maintain the investment in their proprietary technology as they evolve to new designs.”

Larry Huston, principal software architect, Network Processor Division, Intel Corp. (Tempe, Ariz.), believes the embedded developer should also take care in assessing the programming models that an architecture supports. “One of the most attractive things about software programmable network processors as an alternative to ASICs for designing networking equipment is that they allow additional features to be added and bugs to be fixed through software changes instead of hardware modifications,” he said. “But to compete with ASICs, network processors need to process data at high rates, and meet the accompanying requirements for fast I/O and memory operations.”

Many network processors address these challenges by using techniques for hiding memory latency, special hardware units to offload common functions, and a high degree of parallelism, said Huston, which means that no one programming model may be adequate. “When choosing a programming model, it is important to consider the pros and cons of each programming model and match these with the characteristics of the application. In our experience we have found that different portions of an application have different characteristics and that mixing the programming models may give the best results.”

The developer must also consider how efficiently a programming model can be implemented on the network processor that has been selected, he said, because without sufficient hardware support, a given programming model may be too expensive to implement.

Rather than get into issues of language, programming models, or parallel versus serial, Alex Henderson, CTO, Fast-Chip Inc. (Sunnyvale, Calif.), and his fellow designers started with the standard sets of features, requirements and inter-relationships that all engineers who have worked in communications know by heart. From these they created a “tabular” format that lets developers “program” the company's coprocessor architecture using plain English and the network terminology they already know, with no more additional knowledge than it takes to work with a standard spreadsheet or database.

Axel Tillman, chairman & CEO, Novilit, Inc. (Marlborough, Mass.), does not think the debate and uncertainty about ASIC, special purpose dedicated hardware, or fully programmable network processors will end soon. “It is going to be a tough problem for some time to find the right balance between the specificity you need for high performance and the programmability you need to reduce costs and extend development over a wider range of performance points,” he said, “and will not disappear until a common methodology appropriate to network processing emerges.”

The approach he thinks has the most potential is the “specification language” methodology, in which the language constructs implicitly reflect the application environment, allowing a developer either to optimize an existing standard architecture or to create the code from which a specialized processor or ASIC can be generated.
