Designing a QDR-IV SRAM-based statistics counter IP for network traffic management
Network routers maintain statistics counters for performance monitoring, traffic management, network tracing, and security. Counters track the number of arrivals and departures of each packet type and count specific events, such as when the network drops a packet. A single packet arrival can trigger updates to several different statistics counters; however, the number of statistics counters in a network device and the rate at which they can be updated are often limited by memory technology.
Managing statistics counters requires high-performance memories that can accommodate multiple read-modify-write operations. This article describes a unique implementation of statistics counters using an IP-based approach that can be interfaced with a network processor unit (NPU) on one side and Xilinx's QDR-IV memory controller on the other. The QDR-IV Statistics Counter IP is a soft IP that, along with QDR-IV SRAM, provides efficient statistics counters for network traffic management and other counter applications.
QDR-IV SRAM Overview
QDR-IV SRAM has two bidirectional data ports, A and B, which together can perform two data WRITEs, two data READs, or a combination of one READ and one WRITE per clock cycle. This gives the user added flexibility in applications where read/write ratios are not necessarily balanced. Each port transfers data on both clock edges (DDR operation) and is burst-oriented, with a burst length of two words (each word x18 or x36) per clock cycle. The address bus is shared by both ports and also runs at double data rate, with the rising edge carrying the address for port A and the falling edge the address for port B. Depending on the manufacturer, the QDR-IV SRAM may also support embedded error-correcting code (ECC) to virtually eliminate soft errors and improve the reliability of the memory array.
QDR-IV SRAMs are available in two flavors: QDR-IV High Performance (HP) and QDR-IV Xtreme Performance (XP). The HP device operates at a maximum frequency of 667 MHz, while the XP device operates up to 1066 MHz. QDR-IV XP achieves its higher performance by dividing the memory space into eight banks, selected by the three least-significant address bits. The access rule is that the two accesses in a given cycle must target different banks; from cycle to cycle, any bank can be accessed. System designers who plan their architecture to allocate bank addresses accordingly can utilize the full random transaction rate (RTR) performance of XP devices, reducing overall system cost while boosting performance.
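The banking rule can be expressed as a small model. The sketch below is illustrative (the function names and the check are assumptions for this article, not vendor logic):

```python
# Sketch of the QDR-IV XP banking rule: the three least-significant address
# bits select one of 8 banks, and the two accesses issued in the same clock
# cycle (one per port) must hit different banks.

NUM_BANKS = 8  # QDR-IV XP divides the memory space into 8 banks


def bank_of(addr: int) -> int:
    """Bank is selected by the 3 LSBs of the address."""
    return addr & (NUM_BANKS - 1)


def same_cycle_ok(addr_a: int, addr_b: int) -> bool:
    """Port A and port B accesses in one cycle must target different banks."""
    return bank_of(addr_a) != bank_of(addr_b)
```

For example, addresses 0x10 and 0x18 both fall in bank 0 and therefore cannot be accessed in the same cycle, while 0x10 and 0x11 can.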
Statistics Counter IP
The QDR-IV Statistics Counter is a soft IP that, along with QDR-IV SRAM, provides statistics counters for network traffic management and other counter applications. The IP implements read-modify-write logic with support for a system management access port, and it can be interfaced with an NPU on one side and a QDR-IV memory controller on the other. The statistics counter supports line-card rates of 400 Gbps and beyond; performance is limited only by the FPGA and QDR-IV device used.
Statistics Counter IP Operation
Figure 1 below shows an example implementation using QDR-IV and the Statistics Counter IP. Statistics (STATS) update requests are sent from a typical NPU at a rate of 800M counter-pair updates per second. Each STATS request contains a command token for either an ingress or an egress packet and maintains two counters (packet count and byte count) in a single 72-bit word. At every one-second interval, the entire counter cache is read back into lifetime counters maintained in system memory (usually DRAM); this read-back access from the NPU is termed a Processor (PROCS) update request. The PCIe interface is used to transfer the counter cache data to update the lifetime counters. The block diagram below shows the setup with the STATS IP and QDR-IV memory interfaced with the Xilinx memory controller, PCIe bus, and NPU.
Figure 1: Complete Infrastructure with STATS IP, NPU and Memory (Source: Cypress)
The STATS IP is designed to work with both HP and XP QDR-IV memories. The mode of operation is controlled through a single parameter at the top-level interface of the IP design. Two counters (packet and byte) are implemented in a single 72-bit word per flow address, and four million counters are supported in one 144-Mb QDR-IV SRAM. The number of IP instances required in a design equals the number of QDR-IV SRAMs used.
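The two-counters-per-word layout can be sketched as follows. The 36/36 split between the packet and byte fields is an assumption for illustration; the article does not specify the actual field widths:

```python
# Sketch of packing the two per-flow counters (packet count and byte count)
# into a single 72-bit word, as the IP stores them in QDR-IV SRAM.
# The 36/36 field split is assumed; real designs pick widths per application.

WORD_BITS = 72
PKT_BITS = 36                    # assumed width of the packet-count field
BYTE_BITS = WORD_BITS - PKT_BITS  # remaining bits hold the byte count


def pack(pkt_count: int, byte_count: int) -> int:
    """Combine the two counters into one 72-bit word."""
    assert pkt_count < (1 << PKT_BITS) and byte_count < (1 << BYTE_BITS)
    return (pkt_count << BYTE_BITS) | byte_count


def unpack(word: int):
    """Split a 72-bit word back into (packet count, byte count)."""
    return word >> BYTE_BITS, word & ((1 << BYTE_BITS) - 1)


def stats_update(word: int, pkt_len: int) -> int:
    """One STATS request: read-modify-write adds 1 packet and pkt_len bytes."""
    pkts, nbytes = unpack(word)
    return pack(pkts + 1, nbytes + pkt_len)
```

With 72 bits per word and two counters per word, a 144-Mb device holds 144 Mb / 72 = 2,097,152 words, i.e., the four million counters cited above.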
As shown in the block diagram, the NPU pushes the STATS and PROCS requests into the IP through four 25-Gbps links. The IP operates at one-fourth of the frequency at which the memory is accessed and uses four parallel data paths, called channels, to match the memory bandwidth. In both HP and XP modes of operation, Port A of the memory interface is used as a read port and Port B as a write port. Each STATS request performs a read-modify-write on the counter data stored in the unique memory location associated with that request.
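The one-fourth clock ratio and four-channel bandwidth matching can be checked with simple arithmetic; the sketch below assumes the XP device's 1066-MHz clock quoted earlier (HP mode scales the same way):

```python
# Back-of-the-envelope check of the clock-domain matching: the IP fabric
# runs at one-fourth of the memory clock, so four parallel channels are
# needed for the fabric to present one update per memory clock cycle.
# The 1066-MHz figure is the QDR-IV XP maximum clock; assumed here.

MEM_CLK_MHZ = 1066
FABRIC_CLK_MHZ = MEM_CLK_MHZ / 4   # IP runs at one-fourth the memory rate
CHANNELS = 4                       # parallel data paths in the IP

# aggregate rate the four channels present, in updates per microsecond;
# it equals the memory cycle rate, so no memory bandwidth is wasted
fabric_updates_per_us = FABRIC_CLK_MHZ * CHANNELS
```

With Port A reading and Port B writing, one full read-modify-write completes per memory clock cycle, which comfortably covers the 800M counter-pair updates per second arriving from the NPU.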
The read and write requests are staged to match the combined read latency of the QDR-IV memory and the memory controller. The staging logic also serves as a local cache that accumulates and services update requests during this latency window. In HP mode, there is no restriction on the STATS/PROCS update addresses pushed through each of the four channels: addresses can occur in any order, and no address type needs to be assigned to a particular channel. In XP mode, however, because of the memory's banking structure and the restrictions that come with it, Channels 0 and 1 are assigned to the odd address locations where ingress flow data is stored, and Channels 2 and 3 are assigned to the even address locations where egress flow data is stored. This arrangement prevents the banking-restriction stalls that could otherwise occur in XP mode.
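The staging-cache idea can be modeled in a few lines. The sketch below is a behavioral illustration with an assumed four-cycle latency window and a dict-based memory model, not the actual RTL:

```python
# Behavioral sketch of the staging cache: while a read for an address is in
# flight (during the combined memory + controller latency), further STATS
# updates to the same address are merged locally, so a single write-back
# folds in every increment received inside the latency window.

LATENCY = 4  # assumed read latency in cycles (memory + controller)


class StagingCache:
    def __init__(self):
        # addr -> [accumulated packet count, accumulated byte count, age]
        self.pending = {}

    def update(self, addr, pkt_len):
        """Accept one STATS update; return the address to read, or None."""
        if addr in self.pending:
            acc = self.pending[addr]
            acc[0] += 1            # merge into the in-flight entry
            acc[1] += pkt_len
            return None
        self.pending[addr] = [1, pkt_len, 0]
        return addr                # a new memory read must be issued

    def tick(self, memory):
        """Advance one cycle; write back entries whose read has returned."""
        for addr in list(self.pending):
            acc = self.pending[addr]
            acc[2] += 1
            if acc[2] >= LATENCY:  # read data available: write back
                pkts, nbytes = memory.get(addr, (0, 0))
                memory[addr] = (pkts + acc[0], nbytes + acc[1])
                del self.pending[addr]
```

Merging in-flight updates this way avoids issuing a second read for a location whose data has not yet returned, which is what makes back-to-back updates to the same counter safe.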
A one-second read-back request from the processor is common to both modes of operation. The entire memory must be read back at one-second intervals, with the PROCS requests spread throughout the interval rather than issued as a continuous burst. Each PROCS read also resets the memory location it reads.
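Spreading the read-back can be sketched as a simple schedule; the numbers below assume one 144-Mb device holding 72-bit counter words, and the even-spacing policy is an assumption for illustration:

```python
# Sketch of spreading the one-second PROCS read-back: instead of a burst,
# one read is issued every (1 s / number_of_words), interleaving the
# read-back evenly with the ongoing STATS traffic.

WORDS = 144 * 2**20 // 72              # 2,097,152 flow addresses per device
INTERVAL_NS = 1_000_000_000 / WORDS    # ~477 ns between consecutive reads


def procs_issue_time_ns(addr: int) -> float:
    """Nanosecond offset within the 1-s window at which addr is read back."""
    return addr * INTERVAL_NS
```

At roughly one PROCS read every 477 ns, the read-back consumes only a small fraction of the memory's transaction rate, leaving the rest for STATS updates.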