Network routers maintain statistics counters for performance monitoring, traffic management, network tracing, and security. Counters track the number of arrivals and departures for each packet type, and they count specific events such as dropped packets. A single packet arrival can trigger updates to several different statistics counters; however, the number of statistics counters in a network device and their update rate are often limited by memory technology.
Managing statistics counters requires high-performance memories that can sustain multiple read-modify-write operations. This article describes an implementation of statistics counters using an IP-based approach that interfaces with a network processor unit (NPU) on one side and Xilinx's QDR-IV memory controller on the other. The QDR-IV Statistics Counter IP is a soft IP that, along with QDR-IV SRAM, provides efficient statistics counters for network traffic management and other counter applications.
QDR-IV SRAM Overview
QDR-IV SRAM has two bidirectional data ports, A and B, which together can perform two data WRITEs, two data READs, or one READ and one WRITE per clock cycle. This gives the user added flexibility in applications where read/write ratios are not necessarily balanced. Each port transfers data on both clock edges (DDR operation) and is burst-oriented with a burst length of two words (each word x18 or x36) per clock cycle. The address bus is shared and also operates at double data rate, with the rising and falling edges providing the addresses for ports A and B, respectively. Depending upon the manufacturer, the QDR-IV SRAM may also support embedded error-correcting code (ECC) to virtually eliminate soft errors and improve the reliability of the memory array.
QDR-IV SRAMs are available in two flavors: QDR-IV High Performance (HP) and QDR-IV Xtreme Performance (XP). The HP device operates at a maximum frequency of 667 MHz, while the XP device operates at up to 1066 MHz. QDR-IV XP achieves its higher performance by dividing the memory space into eight banks, selected by the 3 LSBs of the address. The access rule is that the two accesses issued in a given cycle must target different banks; from cycle to cycle, any bank can be accessed. System designers who plan their architecture to allocate bank addresses accordingly can exploit the full random transaction rate (RTR) of XP devices, reducing overall system cost while boosting performance.
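The XP banking rule can be captured in a short behavioral sketch (not vendor code; the constant and helper names are illustrative):

```python
# QDR-IV XP divides its address space into 8 banks selected by the
# 3 least significant address bits. Two accesses issued in the same
# cycle (one per port) must target different banks.

NUM_BANKS = 8  # XP bank count; bank index = 3 LSBs of the address

def bank_of(addr: int) -> int:
    """Bank index carried in the 3 least significant address bits."""
    return addr & (NUM_BANKS - 1)

def same_cycle_ok(addr_a: int, addr_b: int) -> bool:
    """Port A and Port B accesses in one cycle must hit different banks."""
    return bank_of(addr_a) != bank_of(addr_b)

# Addresses 0x100 and 0x101 differ in the bank bits, so they may be
# issued together; 0x100 and 0x108 map to the same bank and may not.
```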
Statistics Counter IP
The QDR-IV Statistics Counter is a soft IP that, along with QDR-IV SRAM, provides statistics counters for network traffic management and other counter applications. The IP implements read-modify-write logic with support for a system management access port. It can be interfaced with an NPU on one side and a QDR-IV memory controller on the other. The statistics counter supports line-card rates of 400 Gbps and beyond; performance is limited only by the FPGA and QDR-IV device used.
Statistics Counter IP Operation
Figure 1 below shows an example implementation using QDR-IV and the Statistics Counter IP. Statistics (STATS) update requests are sent from a typical NPU at a rate of 800M counter-pair updates per second. Each STATS request contains command tokens for either an ingress or an egress packet and maintains two counters (packet count and byte count) in a single 72-bit word. At every one-second interval, the entire counter cache is flushed into lifetime counters maintained in system memory (usually DRAM). This read-back access from the NPU is termed a Processor (PROCS) update request. The PCIe interface transfers the counter cache data to update the lifetime counters. The block diagram below shows the setup with the STATS IP and QDR-IV memory interfaced with the Xilinx memory controller, PCIe bus, and NPU.
Figure 1: Complete Infrastructure with STATS IP, NPU and Memory (Source: Cypress)
The STATS IP is designed to work with both HP and XP QDR-IV memories; the mode of operation is controlled by a single parameter at the top-level interface of the IP design. Two counters (packet and byte) are implemented in a single 72-bit word per flow address, and four million counters are supported in one 144-Mb QDR-IV SRAM. The number of IP instances required in a design equals the number of QDR-IV SRAMs used.
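The counter packing and the four-million-counter figure can be sketched as follows. The 36/36 bit split between the two fields is an illustrative assumption; the article does not specify the actual partition between packet and byte counts:

```python
# Sketch: two counters (packet count and byte count) packed into one
# 72-bit word per flow address. The even 36/36 split below is an
# assumption for illustration; the real partition is design-specific.

WORD_BITS = 72
PKT_BITS = 36                      # assumed split, for illustration
BYTE_BITS = WORD_BITS - PKT_BITS

def pack(pkts: int, byts: int) -> int:
    """Pack a (packet, byte) counter pair into one 72-bit word."""
    return (pkts << BYTE_BITS) | (byts & ((1 << BYTE_BITS) - 1))

def unpack(word: int) -> tuple[int, int]:
    """Recover the (packet, byte) counter pair from a 72-bit word."""
    return word >> BYTE_BITS, word & ((1 << BYTE_BITS) - 1)

# Capacity check: a 144-Mb QDR-IV SRAM holds 144 * 2**20 / 72 = 2M
# 72-bit words, i.e. 2M flow addresses x 2 counters = 4M counters.
words = (144 * 2**20) // WORD_BITS
counters = words * 2
```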
As shown in the block diagram, the NPU pushes STATS and PROCS requests into the IP through 4x25-Gbps links. The IP operates at one-fourth the frequency at which the memory is accessed and uses four parallel data paths, called channels, to match the memory bandwidth. In both HP and XP modes of operation, Port A serves as the read port and Port B as the write port at the memory interface. Each STATS request performs a read-modify-write on the counter data stored in the unique memory location associated with that request.
The read and write requests are staged to match the read latency of the QDR-IV memory plus the memory controller latency. The staging logic also serves as a local cache that accumulates update requests arriving during this latency window. In HP mode, there is no restriction on the STATS/PROCS update addresses pushed through each of the four channels: addresses can occur in any order, and no particular address type needs to be assigned to a channel. In XP mode, however, the banking structure of the memory imposes access restrictions, so channels 0 and 1 are assigned to odd address locations, where ingress flow data is stored, and channels 2 and 3 to even address locations, where egress flow data is stored. This arrangement prevents the banking-restriction stalls that can otherwise occur in XP mode.
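The XP-mode channel assignment above can be expressed as a simple parity rule (a sketch; the helper name is illustrative):

```python
# Sketch of the XP-mode channel assignment: odd flow addresses
# (ingress data) are steered to channels 0/1, even flow addresses
# (egress data) to channels 2/3, so that requests sharing a bank
# never collide on ports A and B in the same memory cycle.

def xp_channels(addr: int) -> tuple[int, int]:
    """Return the pair of application channels eligible for this address."""
    if addr & 1:           # odd address -> ingress flow data
        return (0, 1)
    return (2, 3)          # even address -> egress flow data
```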
A one-second read-back request from the processor is common to both modes of operation. The entire memory must be read back once every second, with PROCS requests spread throughout the one-second interval rather than issued as a continuous burst. Each PROCS read also resets the corresponding memory location.
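The pacing this implies can be estimated with simple arithmetic (a sketch assuming a single 144-Mb device, i.e. 2M counter words):

```python
# Sketch: the full counter memory is read back (and cleared) once per
# second, with PROCS requests spread evenly across the interval
# rather than issued as a burst.

WORDS = 2 * 2**20       # 72-bit counter words in one 144-Mb SRAM
INTERVAL_S = 1.0        # full read-back period in seconds

# Even spreading yields one PROCS request roughly every
# INTERVAL_S / WORDS seconds (about half a microsecond here).
procs_period_s = INTERVAL_S / WORDS
```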
The STATS IP architecture block diagram in Figure 2 shows three sub-components: the Same-Address-Compare-Pipe (SACOMP) block, the per-channel Request-Mux-Demux (REQ_MXDMX_CHn) block, and four instances of the A-B-Channel-Pair-Counter-Logic (ABCH_CTRL_CHn) block, one per QDR-IV application channel.
Figure 2: STATS-IP Architecture (Source: Cypress)
The SACOMP block contains two pipe stages: the first compares and compresses identical addresses across all four channels in the same clock cycle (SACOMP_ChN-to-All), and the second performs a back-to-back (burst-of-2) comparison for the same address on a single channel (SACOMP_B2BChN). When two or more channels carry the same address at a given instant, the data from all matching channels is accumulated into the channel with the highest priority, and the lower-priority channels carrying that address are invalidated. A single STATS request thus covers all channels with the same address, preventing any data-coherency problems. The comparison and accumulation algorithm ensures fast evaluation for all possible cases. Similarly, if a single channel carries back-to-back STATS update requests for the same memory location, the later request is invalidated and its data is accumulated into the earlier one. This ensures that no two accesses to the same address occur within the read-latency window of the QDR-IV memory device.
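The cross-channel compare-and-accumulate step can be modeled behaviorally as below (a sketch, not the RTL; the request fields and the lowest-channel-wins priority order are illustrative assumptions):

```python
# Behavioral sketch of SACOMP's same-address compression: when two or
# more channels present the same address in one cycle, the highest-
# priority channel (lowest index here, by assumption) absorbs the
# packet and byte deltas of the others, which are invalidated.

from dataclasses import dataclass

@dataclass
class Req:
    valid: bool
    addr: int
    pkts: int
    byts: int

def sacomp(reqs: list[Req]) -> list[Req]:
    """Merge same-address requests across the four channels in place."""
    for i, hi in enumerate(reqs):
        if not hi.valid:
            continue
        for lo in reqs[i + 1:]:
            if lo.valid and lo.addr == hi.addr:
                hi.pkts += lo.pkts   # accumulate into the winner
                hi.byts += lo.byts
                lo.valid = False     # invalidate the lower-priority copy
    return reqs
```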
The Request-Mux-Demux (REQ_MXDMX_CHn) block, shown in Figure 2, receives PROCS and STATS update requests for the corresponding channel N. Because PROCS update requests arrive at a periodic interval, the block selects the pending PROCS request for the next clock service and stalls the STATS request through a 'request-ready' back-pressure signal. After servicing each one-second update request, it stalls the PROCS request channel for a configured number of clocks (10 by default) to ensure that back-to-back one-second updates are not serviced. The stall signal is directed to the NPU, ensuring that no new requests are generated until the back-pressure signal is deasserted. This mechanism allows both STATS and PROCS requests to be processed without choking the design.
The final stage, the A-B-Channel-Pair-Counter Logic (ABCH_CTRL_CHn), implements the actual read-modify-write for every STATS request and returns the latest data for every PROCS request. This stage comprises read-delay-pipe logic, a control mux, a write pipe, and QDR-IV controller interface logic. The read-delay pipe accounts for both memory and controller latency. A feedback mechanism from adjacent channels, and from the control mux of the same channel, eliminates all possibilities of data-coherency problems. Both STATS and PROCS updates flow through the pipe. The control mux (CTRL_MUX) block differentiates between PROCS and STATS requests, either redirecting the current data out as a PROCS read or pushing it into the controller interface block for further processing. The controller interface block converts the write and read requests into controller-specific commands on ports A and B.
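The two request types this stage serves can be summarized behaviorally (a sketch under the assumption that `mem` stands in for QDR-IV accessed through the memory controller; function names are illustrative, and the delay pipe and feedback paths are omitted):

```python
# Sketch of the per-channel service logic: a STATS request performs a
# read-modify-write on the counter pair at its address; a PROCS
# request returns the current pair and clears the location, matching
# the one-second read-back behavior described above.

def stats_update(mem: dict, addr: int, pkt_delta: int, byte_delta: int) -> None:
    pkts, byts = mem.get(addr, (0, 0))                  # read (Port A)
    mem[addr] = (pkts + pkt_delta, byts + byte_delta)   # modify + write (Port B)

def procs_read(mem: dict, addr: int) -> tuple[int, int]:
    value = mem.get(addr, (0, 0))   # latest counter pair for read-back
    mem[addr] = (0, 0)              # clear the location after read-back
    return value
```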
QDR-IV Interface Operation and Application Channel Mapping
The STATS-stream quad-channel and one-second (PROCS) update quad-channel interfaces serve as application channels. The QDR-IV controller implements a 4:1/1:4 channel mux/demux function with a dedicated four-channel port interface defined for QDR-IV ports A and B. The QDR-IV controller is assumed to mux and demux the channels in a fixed sequence (ch0, ch1, ch2, ch3) between the application side and the QDR-IV device side, which operates at a 4x clock rate. The assumed QDR-IV controller channel sequence and the recommended application channel mapping for the QDR-IV HP-based and QDR-IV XP-based STATS counter solutions are illustrated in Figure 3.
Figure 3: QDR-IV interface 4:1/1:4 mux/demux and application channel mapping (Source: Cypress)
In the QDR-IV HP-based STATS implementation, Ports A and B are populated with requests independent of their addresses, following the order Ch0-Ch1-Ch2-Ch3; there is no banking requirement in HP mode, and the requests on Ports A and B may contain addresses belonging to the same location. In a QDR-IV XP-based STATS implementation, however, Ports A and B are populated with requests in the order Odd-Even-Odd-Even, so that no two addresses belonging to the same bank appear on Ports A and B in the same clock cycle.
The QDR-IV Statistics Counter IP, along with QDR-IV SRAM, provides an efficient approach to statistics counters for network traffic management and other counter applications. Learn more about QDR-IV SRAM at http://www.cypress.com/search/all/QDR-iv.
Avi Avanindra is Director of Systems Engineering for the Memory Products Division at Cypress, where he helps customers develop memory solutions for embedded systems. A graduate of Iowa State University with a degree in Electrical and Computer Engineering, he previously worked at Cisco Systems designing ASICs for switches and routers.
Devardhi “Dev” Mandya is a Senior Systems Engineer with the Systems Engineering Team (Memory Products Division) at Cypress. He has close to 5 years of industry experience and has been with Cypress for the past 3 years. Dev holds a Master of Science degree in Electrical Engineering from the University of Southern California, Los Angeles.