100GbE and greater density line cards can help provide the network bandwidth demanded by video traffic and mobile backhaul applications. Here is the latest status of 40GbE/100GbE standards and the system interconnects required for line cards with densities of 100 Gbps and greater.
It has been widely reported that networking bandwidth more than doubles every couple of years. Bandwidth drivers include the movement of video files, increased data rates in commercial and residential use, social networking, and greater data rates among mobile users.
The major markets driving greater bandwidth include data centers and mobile backhaul. Instead of using standalone servers, modern data centers are predominantly shifting to racks of servers and, in some cases, to blade servers. Typically in a 1U (1.75 inches tall) or 2U (3.5 inches) form factor, these servers are mounted in a large rack. Each server may have one connection for Ethernet, one for Fibre Channel (FC), and possibly one for InfiniBand or another HPC (high-performance computing) interconnect. An Ethernet switch at the top of the rack is used for connections among the servers and to the aggregation switch. The connection rates are shifting to 10 Gbps, driving the market for 10 GbE, which we project to exceed four million switch ports in 2010.
An increasing number of cellular subscribers and the greater data bandwidth per subscriber are driving higher bandwidth in the mobile backhaul. Cellular data is backhauled from base stations and remote controllers using various technologies. These technologies include T1 lines, microwave transmission, Ethernet over SONET, Ethernet over OTN (optical transport network), and PONs (passive optical networks).
The increasing bandwidth drives greater port density and faster data through the network. For data centers, original equipment manufacturers are developing 48 x 10-GbE port switches, which provide bandwidth of 480 Gbps. In the telecom infrastructure, OEMs are deploying 48 x 10-GbE line cards to provide 480 Gbps bandwidth. These systems aggregate the traffic over high rate links, which can be 40GbE or 100GbE ports. Data-center applications typically use 40 GbE, while telecom applications favor 100GbE. This article provides an overview of the 40/100GbE standards and discusses the typical interconnects in a line card or system for 40/100GbE.
In June 2010, the IEEE approved 802.3ba as the Ethernet specification for operation at 40 Gbps and 100 Gbps. This specification defines the two data rates mapped over fiber or copper media. Figure 1 shows the 40/100GbE layer model and the names of the different standards.
At 40 Gbps, 802.3ba defines application-specific physical medium attachment and physical medium dependent PHYs (physical layers). It defines 40 GBase-KR4 for backplanes, 40 GBase-SR4 for short reach over MMF (multimode fiber), 40 GBase-CR4 for direct attach copper, and 40 GBase-LR4 for long reach over single-mode fiber (SMF). The specification includes an auto negotiation (AN) layer to automatically configure the data rate. Similarly, 802.3ba defines specific physical medium attachment and physical medium dependent PHYs for 100 Gbps. These PHYs encompass copper direct attach, short reach fiber, and long reach fiber. For transmission over copper media, 802.3ba specifies optional FEC (forward error correction). For transmission over the physical media, the specification defines the wavelengths, data rates, number of serial lanes, and physical coding. The 100-GBase-SR10 and 40-GBase-SR4 PMDs (physical medium dependent sublayers) use 850-nm wavelengths carrying 10.3125 Gbps over 10 fibers and four fibers, respectively, in each direction. IEEE 802.3ba specifies a distance of 100 m over OM3 fiber for the SR PHYs.
For 100 GBase-LR4 and -ER4, the standard specifies dense wavelength division multiplexing (DWDM) over a pair of single-mode fibers. The center frequencies are spaced 800 GHz apart and use lasers optimized for long distance transmission. The specification defines four wavelengths, at approximately 1,295 nm, 1,300 nm, 1,305 nm, and 1,310 nm, each of which conveys an effective data rate of 25 Gbps. Consistent with the 10 GBase-LR and -ER standards, 100 GBase-LR4 and -ER4 specify a reach of 10 km and 40 km, respectively.
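To see how an 800-GHz frequency grid translates into the wavelengths quoted for the LR4/ER4 PHYs, the following sketch converts four center frequencies into nanometers. The starting frequency of 229.0 THz is our assumption, taken from the lane assignments we understand 802.3ba to use; it is not stated in this article, and the function name is purely illustrative.

```python
C = 299_792_458  # speed of light in m/s

# Assumption: the lowest centre frequency is 229.0 THz and the lanes
# sit on an 800-GHz grid above it (values not given in the article).
BASE_THZ = 229.0
SPACING_THZ = 0.8

def lane_wavelengths_nm(lanes=4):
    """Return the lane centre wavelengths in nm, shortest first."""
    freqs_thz = [BASE_THZ + SPACING_THZ * i for i in reversed(range(lanes))]
    return [C / (f * 1e12) * 1e9 for f in freqs_thz]

wavelengths = lane_wavelengths_nm()
for nm in wavelengths:
    print(f"{nm:.2f} nm")
```

Under these assumptions the four lanes land within about 1 nm of the rounded figures given above, with neighboring lanes roughly 4.5 nm apart.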
For all copper cable types covered by the initial standard, 802.3ba specifies twinax cables. These physical-layer specifications target a distance of at least 7 m over copper cables. The specification defines a serial link data rate of 10.3125 Gbps using 10 lanes for 100 Gbps and four lanes for 40 Gbps. Thus, all but two of the 100-Gbps PHYs define a 10-lane interface for the physical coding. The exceptions are the long reach 100-GBase-LR4/ER4 PHYs, which define a 4-lane physical interface that corresponds to the four wavelengths for transmission.
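The lane counts and the 10.3125-Gbps lane rate can be checked with a little arithmetic. This sketch assumes 64b/66b line coding (64 payload bits carried in every 66 transmitted bits), which is how we understand 802.3ba to encode each lane, and a 25.78125-Gbps line rate per LR4 wavelength; neither figure is spelled out in the text above, and the table of PHYs is only a sample.

```python
from fractions import Fraction

# Assumption: 64b/66b coding, so 64 of every 66 line bits are payload.
ENCODING = Fraction(64, 66)

def payload_gbps(lanes, lane_rate_gbps):
    """Aggregate payload rate for `lanes` lanes at the given line rate."""
    return lanes * Fraction(lane_rate_gbps) * ENCODING

# (lane count, line rate per lane in Gbps). The LR4 line rate is assumed.
PHYS = {
    "40GBase-SR4": (4, "10.3125"),
    "100GBase-SR10": (10, "10.3125"),
    "100GBase-LR4": (4, "25.78125"),
}

for name, (lanes, rate) in PHYS.items():
    print(name, float(payload_gbps(lanes, rate)), "Gbps")
```

Exact fractions are used instead of floats so that the results come out as whole numbers of gigabits per second.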
As Figure 1 shows, to connect between the MAC (media access control) and the physical layer, the 40GbE and 100GbE standards use XLGMII and CGMII, respectively. Because this interface is typically integrated in a chip, it's a logical interface that is defined without the full electrical specification. Each interface is defined as a nominally 64-bit-wide data bus, clocked at 625 MHz for 40GbE (XLGMII) and 1.5625 GHz for 100GbE (CGMII). The data bus is segmented into eight byte-wide lanes, each of which has an associated control signal. These specifications are typically provided for intellectual property (IP) vendors developing cores for SoC (system-on-chip) designs. For easier implementation, vendors may scale these interfaces to wider bus widths at slower clock rates.
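The bus widths and clock rates above multiply out to the nominal data rates, and the same arithmetic shows how a wider internal bus lowers the clock. The following is a minimal sketch; the 256-bit and 512-bit widths are hypothetical examples, not values taken from the standard or from any vendor.

```python
def bus_rate_gbps(width_bits, clock_mhz):
    """Data rate of a parallel bus that moves width_bits per clock cycle."""
    return width_bits * clock_mhz / 1000

def clock_for_width_mhz(rate_gbps, width_bits):
    """Clock needed to carry rate_gbps over a bus that is width_bits wide."""
    return rate_gbps * 1000 / width_bits

print(bus_rate_gbps(64, 625))         # XLGMII as described above
print(bus_rate_gbps(64, 1562.5))      # CGMII as described above
print(clock_for_width_mhz(40, 256))   # hypothetical 256-bit 40GbE datapath
print(clock_for_width_mhz(100, 512))  # hypothetical 512-bit 100GbE datapath
```

A 512-bit datapath, for instance, brings the 100GbE clock down to a little under 200 MHz, which is far easier to close timing on than 1.5625 GHz.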
40/100GbE line card
Figure 2 shows a 40/100GbE conceptual line card, which connects to the optical media for the networks, and the backplane for the system. The main components of the line card are the optical module, PHY, MAC, packet processor, memory, traffic management, and fabric interface. The actual number of components will depend on the specific ASICs selected in the system design. Although all components are shown in line with the data flow, actual designs could have look-aside functions such as a TCAM (ternary CAM) for classification rules.
As the figure shows, several chip-to-chip interconnects are used on the line card. Most of these are standard interconnects developed and enhanced by various standards bodies. One exception can be the backplane data interface, which is often proprietary to the OEM. The figure shows the backplane using separate buses for the data plane and control plane. The control plane runs at a significantly slower rate than the data plane and is often PCI Express or Ethernet. In the following sections, we'll examine the major data-plane interconnects in more detail.
Optical modules
The optical transceiver is typically packaged in a module and mounted at the end of the line card. Optical module requirements created by a group of OEMs and vendors are published as multisource agreements (MSAs). MSAs specify parameters such as package dimensions, thermals, transmission distance, wavelength, and electrical interfaces. The leading optical modules for 40GbE and 100GbE are QSFP+ and CFP (C form-factor pluggable), respectively. These modules are also used in other applications, including InfiniBand and Fibre Channel.
The SFF Committee's SFF-8436 defines the QSFP+ (Quad SFP+) module for Ethernet, InfiniBand, and Fibre Channel applications. It can be used either as 4 x 10-GbE ports, for greater density on 10-GbE line cards, or as a single 40GbE port. The module has been specified to support MMF, SMF, direct attach copper, and active cables. Although the QSFP+ specification calls out SMF for long reach, the module is currently deployed only for short reach applications.
The SFF-8436 specification defines a 38-contact electrical connector, a management interface to configure and monitor the module, and the mechanical specifications for the module and the cage that accepts it. The power consumption of the module ranges from 1.5 W to 3.5 W depending on the data rate. The QSFP+ module is slightly larger than the SFP+ module but smaller than the XFP module; the latter two are 10-Gbps modules.
In the first quarter of 2009, Avago, Finisar, Opnext, and Sumitomo formally launched the CFP MSA for 40-Gbps and 100-Gbps applications. The CFP may be used for 40/100GbE or for 120-Gbps InfiniBand. The MSA defines the mechanical and electrical specifications, as well as the MDIO (management data input/output) diagnostics interface. CFP is a 148-pin pluggable module that is about twice the size of a Xenpak module and significantly larger than QSFP+ modules. At 120 mm x 86 mm, the CFP module is about the same length and twice the width of a Xenpak module.
CFP modules come in three versions: flat top, with an integrated heat sink on top, or with rails on the host card to accommodate a heat sink on top. The MSA members built in this flexibility so that vendors can develop CFP modules for different applications, including short reach without a heat sink or long reach with a heat sink.
Physical layer and framing
As Figure 2 shows, the optical module is connected to a PHY and a MAC. The optical modules provide a 10-Gbps parallel physical interface (PPI) to connect with the components on the line card. Specifically, the optical module interconnects are referred to as XLPPI for 40GbE and CPPI for 100GbE. Note that the CFP module requires 10:4 multiplexing chips inside the module to support 100 GBase-LR4/ER4, which use a 25-Gbps electrical interface to the optical components.
The 802.3ba standard, however, defines XLAUI for 40GbE and CAUI for 100GbE. These are intended to support trace lengths up to 10 inches with one connector. XLAUI and CAUI transfer data at 10.3125 GBaud per lane and are four lanes and 10 lanes wide, respectively. Thus, a PHY is required to retime the optical module's nPPI to meet Ethernet specifications. Specifically, this PHY retimes XLPPI to XLAUI or CPPI to CAUI.
The XLAUI and CAUI interfaces connect to the 40GbE and 100GbE MAC, respectively. The MAC features include frame-check-sequence functions, bit-error rate monitors, and statistics collection. The MAC and Physical Coding Sublayer (PCS) functions for 40GbE and 100GbE require data-striping technology, which spreads the high-speed serial data stream across multiple lanes. This multilane distribution (MLD) function uses 20 virtual lanes for 100GbE and four virtual lanes for 40GbE; it also includes deskewing as well as link status monitoring and reporting. The 20 virtual lanes are a result of the 100GbE physical medium attachment (to the optics), which uses 10 lanes for 100-GBase-SR10 and four lanes for 100-GBase-LR4/ER4. Because 20 is the least common multiple of four and 10, the virtual lanes divide evenly across either number of physical lanes. The 20 virtual lanes may therefore be mapped to physical interface widths of four, 10, or 20 channels, while the four virtual lanes of 40GbE may be mapped to a physical interface width of four channels.
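The multilane distribution described above can be sketched in a few lines. This is a simplified model rather than the PCS as specified: it treats the data stream as a list of numbered blocks, deals them round robin onto virtual lanes, and groups consecutive virtual lanes onto each physical lane. The consecutive grouping and the function names are our assumptions; the sketch also omits the alignment markers and deskew logic that a real receiver needs.

```python
def distribute(blocks, virtual_lanes):
    """Deal blocks round robin onto the given number of virtual lanes."""
    lanes = [[] for _ in range(virtual_lanes)]
    for i, block in enumerate(blocks):
        lanes[i % virtual_lanes].append(block)
    return lanes

def map_to_physical(virtual_lanes, physical_lanes):
    """Assign an equal share of virtual lanes to each physical lane."""
    if virtual_lanes % physical_lanes:
        raise ValueError("virtual lanes must divide evenly across physical lanes")
    per_lane = virtual_lanes // physical_lanes
    return {p: list(range(p * per_lane, (p + 1) * per_lane))
            for p in range(physical_lanes)}

# 100GbE: 20 virtual lanes over 10 lanes (SR10) or four lanes (LR4/ER4)
print(map_to_physical(20, 10))
print(map_to_physical(20, 4))
# 40GbE: four virtual lanes over four physical lanes
print(map_to_physical(4, 4))
```

With 20 virtual lanes, each SR10 lane carries two virtual lanes and each LR4/ER4 lane carries five; a width such as eight physical lanes is rejected because 20 does not divide evenly by eight.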
Packet processing and the backplane
The MAC interfaces to a packet processor, which may be an ASIC NPU (network processing unit), an FPGA such as Altera's Stratix or Xilinx's Virtex, or a merchant NPU such as EZchip's NP-4 or Xelerated's HX326. With Cisco as a key implementer, Interlaken has become a popular interconnect for interfacing the MAC to a packet processor. In the first quarter of 2007, several companies came together to form the Interlaken Alliance, which is responsible for the specification and manages interoperability testing between the various implementations.
Scaling from 10 to 100 Gbps, Interlaken is a SerDes-based interface, with each lane operating at data rates up to 6.25 Gbps. For better efficiency than XAUI and other interconnects, it uses 64b/67b coding. Interlaken stripes packets as eight-byte words across the number of lanes used in the application. Each lane uses a scrambler to randomize data for recovery at the receiver. If any lane fails, Interlaken will stripe across the remaining lanes. To provide redundancy without a reduction in performance, system designers can provision more lanes than necessary.
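As a rough sizing exercise, the sketch below estimates how many 6.25-Gbps Interlaken lanes a given payload rate needs after 64b/67b coding, and shows round-robin striping of words continuing over the remaining lanes when one fails. It ignores the bandwidth consumed by control words, so real designs will provision more lanes than this lower bound; the function names are illustrative only.

```python
import math

def lanes_needed(payload_gbps, lane_gbps=6.25):
    """Lower bound on lanes: only the 64b/67b coding overhead is counted."""
    usable_gbps = lane_gbps * 64 / 67
    return math.ceil(payload_gbps / usable_gbps)

def stripe(words, active_lanes):
    """Deal eight-byte words round robin across the lanes still in service."""
    out = {lane: [] for lane in active_lanes}
    for i, word in enumerate(words):
        out[active_lanes[i % len(active_lanes)]].append(word)
    return out

print(lanes_needed(100))  # lanes for 100 Gbps of payload
print(lanes_needed(40))   # lanes for 40 Gbps of payload

words = list(range(8))
print(stripe(words, [0, 1, 2, 3]))  # all four lanes healthy
print(stripe(words, [0, 1, 3]))     # lane 2 failed; stripe over the rest
```

With one lane out of service the same words are still delivered, but each surviving lane carries more of them, which is why spare lanes are needed to ride through a failure without losing throughput.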
Interlaken divides a packet into smaller bursts, delineated by a header. The bursts are striped as words, round robin, across the SerDes lanes. Each burst contains a control word, which is used to indicate the start and end of a packet, for alignment, and to indicate idle messages. It also includes the burst-control word, which can be used for flow-control messages, and a 24-bit cyclic redundancy check. Interlaken uses Xon/Xoff flow control with either in-band or out-of-band messages. In-band flow control is communicated through a 16-bit field in the control word, supporting from 256 channels (like SPI-4.2) to 64,000 channels.
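The burst segmentation can be illustrated with a simplified model. The 256-byte maximum burst size used here is a hypothetical configuration value, and the dictionary fields stand in for the start-of-packet and end-of-packet indications carried in the control words; the sketch does not reproduce the actual control-word layout or any minimum-burst rules.

```python
WORD_BYTES = 8  # Interlaken stripes data as eight-byte words

def split_into_bursts(packet_len, burst_max=256):
    """Split a packet of packet_len bytes into bursts of at most burst_max."""
    bursts = []
    offset = 0
    while offset < packet_len:
        size = min(burst_max, packet_len - offset)
        bursts.append({
            "sop": offset == 0,                  # first burst of the packet
            "eop": offset + size == packet_len,  # last burst of the packet
            "bytes": size,
            "words": -(-size // WORD_BYTES),     # ceiling division
        })
        offset += size
    return bursts

for burst in split_into_bursts(600):
    print(burst)

# Channels addressable by an 8-bit and a 16-bit channel field
print(2 ** 8, 2 ** 16)
```

A 600-byte packet becomes two full bursts and one short final burst. The last line shows where the 256-channel figure comes from and that a 16-bit field addresses 65,536 channels, which the text rounds to 64,000.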
The packet processor may use Interlaken to interface to the traffic manager and fabric components. Fabrics for 40GbE or 100GbE are available from Broadcom, Fulcrum, and Marvell. The fabric components can use a 10-Gbps serial backplane interconnect as defined by 40 GBase-KR4, or OEMs may use a customized interface. The 40 GBase-KR4 interface uses four pairs at 10 Gbps in each direction to support a 40-Gbps backplane.
The other major interface on the line card is between the packet processor or traffic manager and memory. At 100 Gbps, the packet processor or traffic manager must buffer packets, writing each one into memory and reading it back out, and therefore needs a memory bandwidth of 200 Gbps plus overhead. Most NPU vendors prefer to use commodity memory, which is currently DDR3 at transfer rates of up to 1,600 MT/s. OEMs using ASICs may alternatively use RLDRAM (reduced-latency dynamic random access memory) for greater bandwidth efficiency. Another way to improve the ratio of bandwidth to chip pin count is to use serial memory. Vendors such as MoSys are developing serial memory interfaces, which promise a two to five times reduction in pins for 100GbE applications. A serial memory interface can provide greater scalability and bandwidth efficiency.
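To put the memory requirement in perspective, the following back-of-the-envelope sketch estimates how many DDR3 interfaces the buffering bandwidth implies. The 64-bit interface width and the 70% bus efficiency are our assumptions for illustration; actual efficiency depends heavily on the access pattern and the memory controller.

```python
import math

def buffer_bandwidth_gbps(line_rate_gbps, overhead=0.0):
    """Each packet is written once and read once, so double the line rate."""
    return 2 * line_rate_gbps * (1 + overhead)

def ddr3_interfaces(required_gbps, mt_per_s=1600, bus_bits=64, efficiency=0.7):
    """Interfaces needed, assuming a bus width and a sustained efficiency."""
    per_interface_gbps = mt_per_s * bus_bits / 1000 * efficiency
    return math.ceil(required_gbps / per_interface_gbps)

required = buffer_bandwidth_gbps(100)
print(required)                                   # Gbps before overhead
print(ddr3_interfaces(required))                  # at the assumed 70% efficiency
print(ddr3_interfaces(required, efficiency=1.0))  # theoretical peak
```

Even at theoretical peak, two 64-bit DDR3-1600 interfaces are needed, and the pin count that implies is what makes serial memory attractive.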
Prospects for the future
Data-center growth and increasing Internet traffic are driving greater bandwidth from the end user through the network. With 10GbE ports rapidly increasing, there is a need to aggregate this traffic over 40GbE and 100GbE. Developing systems for 40GbE and 100GbE requires several different interconnects, which for the most part are in place. These interconnects allow vendors to develop compatible devices and OEMs to develop standards-compliant systems. Although the early systems may use a large number of components, OEMs and service providers can start deployments and field trials.
We expect the second generation of 100GbE components to enable smaller and lower cost systems. Specifically, we expect the CFP module to be reduced to half of its current size. Additionally, the components on the line card should increase in integration, such as MAC and NPU integration. The chip-to-chip interconnects may also take the next step and move to 25 Gbps. All of these advances will help drive greater adoption of 40GbE and 100GbE as well as reduce system level cost and power.
Jag Bolaria is a senior analyst at The Linley Group. Before joining The Linley Group, Jag was the director of network systems and validation for Intel's Ethernet components. He began his career as an R&D engineer with Standard Telecom Labs (STL) and has a BS in electronics from the University of Salford in the U.K.