Making the shift to optical interconnect with PCIe Gen3
The highly anticipated 8 Gbps PCI Express 3.0 (PCIe Gen3) is making its debut, doubling the effective bandwidth of the previous generation and promising the same ubiquity and economies of scale as earlier generations of PCIe.
With this introduction, PCIe takes its place among the high-performance protocol superpowers, maximizing gigabits per dollar, per port and per watt. But how many gigabits per second is enough? Can copper sustain another bit-rate doubling, or have we reached the end of copper links as we know them?
This article will cover the benefits of PCIe Gen3 as an optical interconnect. Along the way, we’ll look at the copper dilemma, optical fibers, pertinent advances in optical technology such as Light Peak, the cost/power tradeoffs involved, and where designers need to focus their attention.
The challenges of sending high-symbol-rate data across copper channels are well understood. Frequency-dependent conductor and dissipation losses, link and circuit discontinuities, and material variation are some of the key factors that diminish the ability to decode data effectively.
When bandwidth-limited, signals undergo non-uniform alteration based on frequency, resulting in distortion. Using the concept of pulse response, both the distortion and the way equalization counteracts it can be seen in Figure 1 below.
Figure 1: Pulse response
Transmitter and receiver equalization operate on the basic principle of creating pre- and post-cursor distortion such that an inverse channel match (at the time of data sampling) is created. The smaller the used portion of the channel, the easier it is to create the inverse. This is because typically the band of operation can be confined to some linear region.
As the data bandwidth stretches to occupy more channel bandwidth, transmitter and receiver complexity grows as more of the channel distortion must be replicated and inverted.
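As a rough illustration of the inverse-channel idea, the sketch below convolves a hypothetical two-sample pulse response (a main cursor plus one post-cursor) with a short FIR equalizer built from a truncated series inverse. The tap values and helper names are assumptions chosen for illustration; they are not taken from any PCIe channel or from Figure 1.

```python
def convolve(a, b):
    """Discrete convolution of two short sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def truncated_inverse(post_cursor, n_taps):
    """FIR taps approximating the inverse of the channel [1, post_cursor].

    The exact inverse is the infinite series 1, -p, p^2, ...; keeping only
    n_taps terms leaves one residual sample of magnitude |p| ** n_taps.
    """
    return [(-post_cursor) ** k for k in range(n_taps)]


def worst_isi(pulse):
    """Largest sample outside the main cursor (assumed to be sample 0)."""
    return max(abs(v) for v in pulse[1:])


channel = [1.0, 0.5]  # hypothetical: main cursor and one post-cursor
for n_taps in (1, 2, 3):
    equalizer = truncated_inverse(channel[1], n_taps)
    equalized = convolve(channel, equalizer)
    print(f"{n_taps} tap(s): worst residual ISI = {worst_isi(equalized)}")
```

With these assumed values the residual falls from 0.5 to 0.25 to 0.125 as taps are added, which is the tradeoff described above: a closer match to the inverse channel costs more equalizer complexity.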
As shown in Figure 2 below, PCIe Gen3, operating on such a channel, consumes a large percentage of the usable channel spectrum, and the losses are not linear.
Figure 2: Channel loss
When band-limited, as in the case of copper, protocols can pursue several options in an attempt to make the most of valuable channel space:
Option #1: Overhead reduction – reducing the number of bits used for data framing and signaling. Several protocols adopt alternate framing codes in an effort to increase speed. Framing overhead can account for as much as 25 percent of channel bandwidth.
Variations of Ethernet, Fibre Channel and PCIe, for example, have gone from 8b/10b encoding to lower-overhead formats that add only two framing bits per block (such as 64b/66b for 10GigE and 16GFC, and 128b/130b for PCIe Gen3). These conserve channel bandwidth but increase circuit complexity.
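The overhead arithmetic behind these encodings can be sketched in a few lines. The function and table names below are illustrative only; the block sizes are the ones named in the text.

```python
def coding_efficiency(payload_bits, coded_bits):
    """Fraction of the line rate that carries payload."""
    return payload_bits / coded_bits


SCHEMES = {"8b/10b": (8, 10), "64b/66b": (64, 66), "128b/130b": (128, 130)}

for name, (payload, coded) in SCHEMES.items():
    eff = coding_efficiency(payload, coded)
    print(f"{name}: {eff:.1%} efficient, {1 - eff:.1%} of line rate is overhead")
```

Measured against the line rate, 8b/10b spends 20 percent on framing; measured against the payload (2 extra bits for every 8), the same code costs 25 percent.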
Option #2: Higher-order modulation formatting – reducing bandwidth by encoding more bits per symbol, trading channel SNR (channel capacity) against available bandwidth. Pulse amplitude modulation (PAM) is one such example, used by Ethernet to map data bits into four or more discrete levels. Other symbol-creation techniques, such as coherent orthogonal frequency-division multiplexing (COFDM) or quadrature amplitude modulation (QAM), also increase the bits per symbol and have been staples of digital RF video transmission for many years. As an example, a 16-QAM symbol (Figure 3 below) represents four bits of useful information, effectively reducing the used channel spectrum to one-quarter. Cable modems often employ 64-, 128- or 256-QAM modulation.
Figure 3: 16-QAM constellation
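The bits-per-symbol relationship is simply a base-2 logarithm of the alphabet size. The sketch below applies it to an 8 Gbps stream purely as an illustration (PCIe Gen3 itself signals with two levels); the function names are assumptions.

```python
import math


def bits_per_symbol(levels):
    """Bits carried by one symbol drawn from `levels` distinct states."""
    return math.log2(levels)


def symbol_rate_gbaud(bit_rate_gbps, levels):
    """Symbol rate needed to carry bit_rate_gbps with the given alphabet size."""
    return bit_rate_gbps / bits_per_symbol(levels)


for name, levels in [("2-level", 2), ("PAM-4", 4), ("16-QAM", 16), ("256-QAM", 256)]:
    print(f"{name}: {bits_per_symbol(levels):.0f} bits/symbol, "
          f"{symbol_rate_gbaud(8.0, levels):.2f} Gbaud for 8 Gbps")
```

The price of the lower symbol rate is that the states sit closer together, so the same noise causes more decision errors. This is the SNR-for-bandwidth trade described above.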
Option #3: Channel bonding – XAUI, an Ethernet specification, uses four 2.5Gbps links to create an aggregated 10Gbps link. 10GBASE-T -- through advanced equalization, bi-directional signaling, echo cancellation, symbol encoding and four physical lanes -- performs both channel bonding and higher-order modulation to transform four low-bandwidth, twisted-pair channels into a bonded link capable of 10Gbps.
The PCIe specification, while not as spectrally efficient as 10GBASE-T (nor as power-intensive), provisions for as many as sixteen 8Gbps links to be bonded as a single channel. In Gen3 terms, this is as much as 128Gbps in each direction.
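The lane-bonding arithmetic can be sketched as follows, using the 8Gbps lane rate and 128b/130b encoding mentioned in the text. The function name and its defaults are illustrative assumptions.

```python
def link_bandwidth_gbps(lanes, line_rate_gbps, payload_bits=128, coded_bits=130):
    """Return (raw, effective) one-direction bandwidth of a bonded link in Gbps."""
    raw = lanes * line_rate_gbps
    return raw, raw * payload_bits / coded_bits


for lanes in (1, 4, 8, 16):
    raw, effective = link_bandwidth_gbps(lanes, 8.0)
    print(f"x{lanes}: {raw:.0f} Gbps raw, {effective:.1f} Gbps after 128b/130b")
```

For a x16 link this gives the 128Gbps raw figure quoted above, with roughly 126Gbps left after encoding overhead. Packet and protocol overhead, which this sketch ignores, reduces usable throughput further.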
Dealing with optical channels
Light passing between two differing optical media will bend in accordance with the incident angle and the difference in refractive index between the two media. Recall reaching for an object in a clear pond? Water, like glass, has a higher index of refraction than air.
Likewise, light launched down the axis of a strand of glass is reflected along the air-glass interface, resulting in an optical waveguide (total internal reflection). (In an actual optical fiber, the interface is not air-glass but rather a higher-index glass core and a lower-index cladding material.)
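Total internal reflection follows directly from Snell's law: beyond a critical angle of incidence, no refracted ray exists and all light is reflected. The sketch below computes that angle and the related numerical aperture. The core and cladding indices are assumed, illustrative values, not data for any particular fiber.

```python
import math


def critical_angle_deg(n_core, n_cladding):
    """Incidence angle (measured from the normal) above which light is totally reflected."""
    return math.degrees(math.asin(n_cladding / n_core))


def numerical_aperture(n_core, n_cladding):
    """Sine of the largest launch half-angle (in air) that the fiber will guide."""
    return math.sqrt(n_core ** 2 - n_cladding ** 2)


n_core, n_cladding = 1.48, 1.46  # assumed illustrative indices
print(f"core/cladding critical angle: {critical_angle_deg(n_core, n_cladding):.2f} degrees")
print(f"numerical aperture:           {numerical_aperture(n_core, n_cladding):.3f}")
print(f"glass/air critical angle:     {critical_angle_deg(1.5, 1.0):.2f} degrees")
```

Because the core and cladding indices differ only slightly, the critical angle is close to 90 degrees: only rays traveling nearly parallel to the fiber axis are guided.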
Where many of today’s high-speed copper channels measure Gbit/mW success in inches, an optical link can measure success from hundreds of meters to kilometers.
Many of the longest-haul systems use single-mode fiber, or SMF (described below), and 1550nm lasers for maximum reach. However, for most applications of interest -- and those relevant to this article -- channel distances are likely to be limited to 30 meters, or 100 meters at most. These distances should suffice for most server, high-performance computing (HPC) and potential consumer applications envisioned. Table 1 below lists typical fibers and their losses.
Table 1: Fiber losses and bandwidth – Source: Fiber Optic Association
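To put fiber attenuation in perspective at these distances, a simple loss calculation helps. The attenuation figures in the sketch below are assumed, order-of-magnitude values for illustration and are not read from Table 1; the function names are likewise hypothetical.

```python
def fiber_loss_db(atten_db_per_km, length_m, connector_loss_db=0.0):
    """Total link loss in dB: distributed fiber loss plus lumped connector loss."""
    return atten_db_per_km * length_m / 1000.0 + connector_loss_db


def power_fraction(loss_db):
    """Fraction of launched optical power that survives a given loss in dB."""
    return 10 ** (-loss_db / 10.0)


# Assumed, order-of-magnitude attenuation figures (not taken from Table 1).
ASSUMED_ATTEN_DB_PER_KM = {"MMF @ 850nm": 3.0, "SMF @ 1550nm": 0.25}

for name, atten in ASSUMED_ATTEN_DB_PER_KM.items():
    for length_m in (30, 100):
        loss = fiber_loss_db(atten, length_m)
        print(f"{name}, {length_m} m: {loss:.3f} dB, "
              f"{power_fraction(loss):.1%} of launched power remains")
```

Under these assumptions the fiber itself costs only a fraction of a decibel over 30 to 100 meters. In a short link, connector and coupling losses (the optional `connector_loss_db` term) are likely to dominate.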
Basic Optical Losses and Dispersions
With optical fiber, channel loss is not dependent upon bit rate but on the spectral purity of the source and its interaction with the channel. These losses are the result of energy absorption of light by the glass.
In Figure 4 below, the plot makes apparent why optical sources are selected in the 850nm, 1300nm or 1550nm regions: these are absorption minima. This plot also shows the advantages of 1550nm operation (typically used for long-distance applications), which is explained in the next section.
Figure 4: Absorption and dispersion
As mentioned above, one means of distortion comes from the spectral variance of the source and its interaction with the channel. (A typical 10Gbps vertical-cavity surface-emitting laser, or VCSEL, will have a center wavelength from 830nm to 860nm and a spectral width of 0.65nm.)
Despite the very narrow spectrum, the variance is not zero. Both fiber material and dimensional construction result in wavelength- and polarization-dependent transmission characteristics. Because the response is not uniform, the instantaneous variations in wavelength result in phase distortion (group delay) and subsequent pulse spreading.
From the sample plot, we can see that optical operation in the 850nm region produces higher dispersion than at 1300nm or 1550nm. Consequently, most long-haul optical systems operate at 1550nm near neutral dispersion.
While the wavelength for neutral dispersion will vary with cable, a tight spectral variance and operation at regions near this point result in a near dispersion-free link. With this in mind, we discuss single-mode fiber and laser operation, below.
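A first-order estimate of this pulse spreading multiplies the fiber's dispersion coefficient by the link length and the source's spectral width. In the sketch below, the 0.65nm spectral width is the VCSEL figure quoted above; the dispersion coefficient is an assumed order-of-magnitude value for silica near 850nm, and the function name is hypothetical.

```python
def chromatic_spread_ps(dispersion_ps_per_nm_km, length_m, spectral_width_nm):
    """Pulse spreading |D| * L * delta_lambda, in picoseconds."""
    return abs(dispersion_ps_per_nm_km) * (length_m / 1000.0) * spectral_width_nm


ASSUMED_D_850NM = -100.0  # ps/(nm*km), assumed order of magnitude near 850nm
SPECTRAL_WIDTH_NM = 0.65  # VCSEL spectral width quoted in the text
UNIT_INTERVAL_PS = 125.0  # one bit period at 8 Gbps

for length_m in (1, 30, 100, 1000):
    spread = chromatic_spread_ps(ASSUMED_D_850NM, length_m, SPECTRAL_WIDTH_NM)
    print(f"{length_m:>5} m: {spread:7.2f} ps "
          f"({spread / UNIT_INTERVAL_PS:.1%} of a bit period)")
```

Under these assumptions, chromatic spreading at 30 meters is about 2 ps, a small fraction of the 125 ps bit period at 8 Gbps. It grows linearly with distance, which is why long-haul links care so much about operating wavelength and spectral width.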
Technically, light is defined as one or more transverse electric and magnetic (TEM) fields. When an optical pulse is launched, many modes are incident on the fiber. Most propagate in the glass, but some can even exist in the cladding.
By virtue of ray tracing, the fundamental principles of light propagation and modal link distortion can be explained.
The fundamental mode (TEM00) is represented by the light ray traveling directly down the fiber center. Higher modes are represented by rays traveling at angles down the fiber (Figure 5 below). As can be seen from the picture, higher-order modes will not travel straight down the fiber but, in effect, bounce from side to side as they traverse the fiber length, resulting in pulse spreading.
Figure 5: Ray-tracing simplification of a step-index fiber
From ray analysis, this is the simple result of higher modes having to travel farther than the fundamental mode. One means to correct for this distortion is to lower the index of refraction of the fiber as a function of the radial distance from the core center, an approach known as graded-index fiber.
In this manner, signal energy from higher-order modes traversing the outer edges of the fiber propagates faster as it moves away from the fiber center. An alternative solution would be to inhibit these higher-order modes from even entering the fiber.
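The ray picture gives a simple worst-case estimate for a step-index fiber. The steepest guided ray, at the critical angle, travels a path longer than the axial ray by the ratio of core index to cladding index, so the arrival-time difference is (L x n_core / c) x (n_core / n_cladding - 1). The indices below are assumed illustrative values, and the function name is hypothetical.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def modal_spread_ns(n_core, n_cladding, length_m):
    """Worst-case ray-optics delay between axial and steepest guided ray, in ns."""
    axial_time_s = length_m * n_core / C_M_PER_S
    return axial_time_s * (n_core / n_cladding - 1.0) * 1e9


n_core, n_cladding = 1.48, 1.46  # assumed illustrative indices
for length_m in (30, 100, 1000):
    spread = modal_spread_ns(n_core, n_cladding, length_m)
    print(f"{length_m:>5} m step-index: up to {spread:.2f} ns of modal spreading")
```

With these assumptions the step-index worst case is on the order of nanoseconds even at 30 meters, far more than a bit period at multi-gigabit rates. This is only an upper bound from ray tracing, but it shows why graded-index profiles or single-mode operation are needed at these speeds.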
Single-mode fiber narrows the core diameter such that it supports only the center ray. (Where multimode fiber, or MMF, has core/cladding dimensions of either 62.5/125µm or 50/125µm, SMF has dimensions of 9/125µm.) SMF, at a cost of lower launch power and tighter manufacturing tolerances, removes the modal effects and leaves only a small component of chromatic dispersion, greatly increasing channel bandwidth.
Receiver Effects and Equalization
An avalanche photo-detector (APD) is typically used to convert light back to an electrical pulse. While APDs can have very large bandwidths, they are nevertheless band-limited, and consequently the conversion results in rise-time degradation and deterministic jitter (Figure 6 below).
Figure 6: Rise-time degradation and deterministic jitter
As bit periods reduce, this jitter becomes a larger part of the overall timing budget. Additionally, APDs have a signal-sensitivity limit (e.g., a receiver intrinsic noise, or RIN floor) that is governed by material, design, circuit bias and the transmitted source-extinction ratio. These set the signal floor and negatively impact the system jitter budget.
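The effect of shrinking bit periods on the timing budget can be sketched numerically. The 10 ps jitter figure and the 10 GHz receiver bandwidth below are hypothetical values chosen for illustration, and the rise-time relation (0.35 divided by bandwidth) is the usual single-pole approximation rather than a property of any particular detector.

```python
def unit_interval_ps(bit_rate_gbps):
    """Duration of one bit period in picoseconds."""
    return 1000.0 / bit_rate_gbps


def rise_time_ps(bandwidth_ghz):
    """Approximate 10-90% rise time of a single-pole response: 0.35 / bandwidth."""
    return 350.0 / bandwidth_ghz


def jitter_fraction(jitter_ps, bit_rate_gbps):
    """Share of the bit period consumed by a fixed amount of jitter."""
    return jitter_ps / unit_interval_ps(bit_rate_gbps)


ASSUMED_JITTER_PS = 10.0  # hypothetical deterministic jitter
for rate in (8.0, 10.0, 25.0):
    print(f"{rate:>4} Gbps: bit period {unit_interval_ps(rate):6.1f} ps, "
          f"{jitter_fraction(ASSUMED_JITTER_PS, rate):.0%} lost to {ASSUMED_JITTER_PS} ps of jitter")
print(f"10 GHz receiver: ~{rise_time_ps(10.0):.0f} ps rise time (assumed bandwidth)")
```

The same 10 ps that costs 8 percent of the bit period at 8 Gbps costs 25 percent at 25 Gbps, which is why receiver bandwidth and noise floor weigh more heavily as rates climb.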
At 8 Gbps, the changes in distortion between 1 meter and 30 meters are negligible. Interestingly, some of the optical channel and component distortions call for equalization techniques similar to those used for copper.
Figure 7: PCI Express Gen3 link passing through an optical channel
Besides optical methods of distortion control, such as active fiber doping for gain and optical FIR compensation, there are electronic dispersion equalization techniques such as signal limiting (or AGC), bit retiming and decision-feedback equalization (DFE).
These are similar to the techniques used to battle signal loss, excessive random jitter, group delay/pulse spreading and deterministic jitter in copper links. With these circuits come well-known design techniques and optimizations that have led to efficient silicon design.
So, where does this lead designers?
Whether multimode fiber, where distortion and distance can still be an issue, or SMF, where limits in laser and electrical modulation rates necessitate more data bits per symbol, higher-order modulation schemes are coming to the forefront for advanced optics. In some cases, these methods are a return to older RF modulation concepts.
Wavelength-Division Multiplexing. This is, in effect, analogous to creating additional channels of operation. Advances in laser technology allow for fine control of the transmitter wavelength. By controlling the wavelength of each source, multiple optical carriers, separated by wavelength, can be combined onto the same optical line.
In these applications, anywhere from 40 to 100 channels can be combined onto a single fiber. (Using a single SMF, systems have demonstrated over 1 Tbit/sec throughput at a distance of 10km using dense wavelength-division multiplexing, or DWDM.)
Through the use of highly selective “prisms” (technically, interferometers are used), wavelength selectivity is maintained before feeding the optical-to-electrical converter for electrical signal regeneration.
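As a sketch of how closely such carriers are packed, the code below lays channels on a 100 GHz frequency grid anchored at 193.1 THz (the convention used by the ITU-T DWDM grid) and converts each to a wavelength. The 10 Gbps per-channel rate and the function names are assumptions for illustration.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum
ANCHOR_THZ = 193.1         # anchor frequency of the DWDM frequency grid


def grid_frequency_thz(index, spacing_ghz=100.0):
    """Center frequency of grid channel `index` (may be negative), in THz."""
    return ANCHOR_THZ + index * spacing_ghz / 1000.0


def wavelength_nm(frequency_thz):
    """Vacuum wavelength corresponding to an optical frequency."""
    return C_M_PER_S / (frequency_thz * 1e12) * 1e9


def aggregate_gbps(channels, per_channel_gbps):
    """Total fiber throughput when every wavelength carries the same rate."""
    return channels * per_channel_gbps


for index in range(3):
    f = grid_frequency_thz(index)
    print(f"channel {index}: {f:.1f} THz = {wavelength_nm(f):.2f} nm")
print(f"100 channels x 10 Gbps (assumed) = {aggregate_gbps(100, 10.0):.0f} Gbps")
```

Adjacent channels on this grid sit only about 0.8nm apart near 1550nm, which is why fine wavelength control at the transmitter and high selectivity at the receiver are both essential.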
QAM Format. Quadrature amplitude modulation (QAM), as discussed above, is a long-standing RF technique used to improve channel efficiency by increasing the number of bits per symbol, rather than the number of symbols per second.
As the name implies, the technique optically splits a carrier signal into two components phase-shifted 90 degrees from each other. These two components represent the classical I and Q vectors in QAM transmission.
By linearly varying the amplitude of each vector and optically summing them back together, the typical QAM constellation diagram can be mapped. With this method, both 64-QAM and 128-QAM have been demonstrated.
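The constellation mapping can be sketched by treating the I and Q amplitudes as the real and imaginary parts of a complex number. This minimal example handles only square constellations (4-, 16-, 64-QAM and so on); non-square orders such as 128-QAM use a different layout that is not shown. The function name and amplitude scaling are illustrative assumptions.

```python
import math


def square_qam_constellation(order):
    """Return the I/Q points of a square QAM constellation as complex numbers."""
    side = math.isqrt(order)
    if side * side != order:
        raise ValueError("only square constellations are handled in this sketch")
    levels = [2 * k - (side - 1) for k in range(side)]  # e.g. -3, -1, 1, 3
    return [complex(i, q) for i in levels for q in levels]


points = square_qam_constellation(16)
print(f"{len(points)} points, {math.log2(len(points)):.0f} bits per symbol")
for row in range(4):
    print("  ".join(f"({p.real:+.0f},{p.imag:+.0f})" for p in points[row * 4:(row + 1) * 4]))
```

Each point is one (I, Q) amplitude pair, so choosing among 16 points conveys four bits in a single symbol.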
COFDM Format. COFDM is a digitally intensive signaling method by which data is encoded onto a large number of carriers and then passed through an inverse discrete Fourier transform (IDFT). The composite signal is then fed to a digital-to-analog converter for transmission.
At the receiving end, a discrete Fourier transform (DFT) is performed to reverse the transform process, and the carriers are demodulated. This method of transmission has proven very robust to non-uniform channel response and distortion, something of importance in long-haul optical applications and switching.
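The transmit and receive transforms can be sketched with a direct (non-fast) DFT. The example below places one QPSK symbol on each of eight carriers, applies the inverse transform as a transmitter would, and recovers the symbols with the forward transform. It assumes an ideal channel and omits the coding, guard interval and equalization a real COFDM system would add; the carrier count and symbol values are arbitrary.

```python
import cmath


def dft(samples, inverse=False):
    """Direct O(n^2) discrete Fourier transform (inverse includes the 1/n scale)."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(samples[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


# One QPSK symbol per carrier (arbitrary illustrative data).
carriers = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j, 1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

time_domain = dft(carriers, inverse=True)  # transmitter: IDFT, then on to the DAC
recovered = dft(time_domain)               # receiver: DFT recovers each carrier

worst_error = max(abs(a - b) for a, b in zip(carriers, recovered))
print(f"worst round-trip error: {worst_error:.2e}")
```

Because each carrier is recovered independently, a channel that attenuates or delays some frequencies more than others can be corrected one carrier at a time, which is the source of the robustness noted above.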
In the optical space, one technique combining digital Fourier methods with optical WDM and QAM modulation allows a similar implementation of multiple carriers through optical fiber. Methods today have implemented as many as 32 carriers (optical wavelengths).
In comparison, the DVB-T standard implements either ‘2k’ or ‘8k’ carrier symbols. However, because fiber can support much higher bandwidths, some experimental systems are purported to have produced effective single-fiber data rates as high as 100Gbps and to have reached lengths beyond 20km: more distance than the protocol can accommodate, but most certainly a “fat-pipe.”
Plain Old-fashioned Faster Modulation. VCSELs today are typically limited to modulation rates of ~10-12Gbps. On the near horizon are VCSELs capable of as much as 25Gbps. Assuming an optical launch rise time of ~10ps, modal dispersion becomes more significant at 25Gbps (Figure 8 below). One means of mitigation would be an industry shift to SMF, using the same type of fiber for short- and long-haul systems.
As is done in telecom, long-haul applications typically use long-wavelength lasers (better fiber dispersion) and external modulators to pass or block light from a continuous laser beam.
While external modulators are more expensive, they offer higher modulation rates because, unlike direct modulation, they are not affected by carrier charge removal and ringing effects (relaxation oscillation and chirping). The downside to such a change would include cost and the inability of SMF systems to interoperate with a much larger installed base of legacy multimode fiber.
Figure 8: Estimated fiber effects at 25Gbps
While many of the above modulation methods have yet to reach mainstream usability at the price points PCIe users demand, their R&D drives the performance expectations and cost targets of systems today.
So Why Optical? Why Now?
While the speed of optical links has been known and utilized for decades, what brings optics to the forefront today are size, power, reliability, and cost reduction. Where optical connectivity was once the domain of larger dedicated modules, today several things are changing. Let’s examine some of these factors:
VCSEL. In the early days, high-speed data communications mostly used directly modulated 850nm DFB lasers for short distances. These sources ranged from $30 to $50 per diode.
VCSEL technology opened a new period in laser optimization: mass production and test-capable devices at a fraction of DFB size, an eighth of the cost, one fourth to one eighth the drive current, three to ten times faster in modulation, and five to ten times higher in MTBF reliability. (In fact, it was the reliability factor that first led to rapid deployment of VCSEL technology in the storage market.)
With the reduction in bias current (~35mA vs. ~5mA) comes an ancillary reduction in the transistor power needed for modulation, an added benefit for circuit integration. This leads to the next topic: optical engines.
Optical Engines. The first task of an optical assembly is to get light on and off the fiber. This is the function of the transmit optical sub-assembly (TOSA) and receive optical sub-assembly (ROSA). Building these assemblies has historically been a labor- and cost-intensive process.
TOSA and ROSA assemblies typically consist of several components (laser, photo-detector, outer housing, inner sleeve, ball lens assembly, and epoxy), each requiring precision design and individual alignment.
Optical engines have moved the industry forward by size reduction and integration of these complex components into something simpler. An optical engine minimally combines laser, photo-detector, alignment fibers, mating connector, LD driver, PD limiter and/or equalization receiver all into one assembly.
Designs can be either simplex or duplex -- simplex meaning that one side of the link has all transmitter functions (TOSA and driver) and the opposite side has all receiver functions (ROSA and receive electronics).
Key to the technology has been the means by which multiple manufacturers can now quickly and reliably align optical assemblies to fiber. Most engines host two or more channels (four fibers).
Some align and integrate as many as 12 channels (12 TX to 12 RX). Along with the reduction in size come improvements in power. Whereas yesterday’s single-channel, 1.0625Gbps Fibre Channel GBIC drew roughly 1.5 watts, today a comparable 10Gbps optical engine draws ~150mW to 200mW, a 10X power reduction.
Where GBIC costs ran roughly $200/unit, the long-term price for optical engines is under $2 per gigabit per link, or $20 to $40 for a bi-directional 10Gbps channel (Figure 9 below).
Figure 9: Reflex Optics, left, and GBIC, right.
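Because the two modules run at very different data rates, it is instructive to normalize the power and cost figures quoted above to a per-gigabit basis. The sketch below uses only the numbers from the text; the dictionary layout and function name are illustrative.

```python
def per_gbps(value, rate_gbps):
    """Normalize a module-level figure (mW or dollars) to one Gbps of capacity."""
    return value / rate_gbps


MODULES = {
    "GBIC (1.0625Gbps)":          {"rate_gbps": 1.0625, "power_mw": 1500.0, "cost_usd": 200.0},
    "optical engine, low figure":  {"rate_gbps": 10.0,   "power_mw": 150.0,  "cost_usd": 20.0},
    "optical engine, high figure": {"rate_gbps": 10.0,   "power_mw": 200.0,  "cost_usd": 40.0},
}

for name, m in MODULES.items():
    print(f"{name}: {per_gbps(m['power_mw'], m['rate_gbps']):8.1f} mW/Gbps, "
          f"${per_gbps(m['cost_usd'], m['rate_gbps']):7.2f}/Gbps")
```

Normalized this way, the power improvement is roughly 70 to 94 times per gigabit, considerably larger than the 10X module-level figure suggests.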
Optical Cable and Connector. While sand is cheap, connectorized optical fiber is not, at least not yet. (A duplex, 30-meter LC patch cord can cost as much as $150.) To meet the storage and communications challenge, these costs must come down. Potential options for a low fiber count include the optical Mechanical Transfer Registered Jack (MT-RJ) connector.
Multi-fiber push-on (MPO) connectors, from companies such as Molex, provide parallel optics with as many as 24 fibers per connector. Still, a good-quality, 24-fiber, 30-meter MPO cable could run as high as $200 or more.
To gain perspective, an eight-channel, two-meter PCIe Gen2 copper cable costs approximately $250. To compare effective cost per bit: after removing 8b/10b overhead, this cable provides an effective throughput of 4Gbps x 8 channels x 2 directions, or 64Gbps.
An equivalent x4 optical duplex solution (8 optical fibers, producing 8Gbps x 4 channels x 2 directions = 64Gbps) has an engine cost of $120 to $160 (in large volume) and a 30-meter, 12-fiber MPO cable cost of ~$75. To accelerate the rate of optical adoption, these costs must continue to fall.
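The comparison above reduces to a cost-per-gigabit calculation. The sketch below uses only the prices, throughputs and lengths quoted in the text; the function names are illustrative, and the per-meter figure is an added normalization rather than a number from the article.

```python
def cost_per_gbps(total_cost_usd, throughput_gbps):
    """Dollars per Gbps of aggregate (both-direction) throughput."""
    return total_cost_usd / throughput_gbps


def cost_per_gbps_meter(total_cost_usd, throughput_gbps, reach_m):
    """Dollars per Gbps per meter of reach."""
    return total_cost_usd / (throughput_gbps * reach_m)


LINKS = {
    "copper x8 Gen2, 2 m":       {"cost": 250.0,        "gbps": 64.0, "reach_m": 2.0},
    "optical x4, 30 m (low)":    {"cost": 120.0 + 75.0, "gbps": 64.0, "reach_m": 30.0},
    "optical x4, 30 m (high)":   {"cost": 160.0 + 75.0, "gbps": 64.0, "reach_m": 30.0},
}

for name, link in LINKS.items():
    print(f"{name}: ${cost_per_gbps(link['cost'], link['gbps']):.2f}/Gbps, "
          f"${cost_per_gbps_meter(link['cost'], link['gbps'], link['reach_m']):.3f}/Gbps/m")
```

On these figures the optical link is already somewhat cheaper per gigabit than the copper cable, and far cheaper once the 15-fold difference in reach is taken into account.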
Light Peak is a newly touted optical initiative in which Intel is leading the charge to reduce the cost of optical connectivity. Intel has engaged a consortium of suppliers in photonics, SerDes and optical subassemblies to drive costs down. Estimated prices have ranged from $10 to $20 for a complete connectivity solution.
The basic Light Peak system is anticipated to consist of an optical engine (providing four fibers, or two duplex channels) and an Intel-based routing engine (much like an IOH).
This router will take data from the optical fat pipe and break it out into the requisite protocol, e.g., SATA, USB or PCIe. Still, the key will be the creation of a cost-effective, reliable optical connector. To date, the functionality and price points appear to be as yet unrealized.
Several other organizations are also working as a consortium to reduce the cost of optical interconnectivity. Research is ongoing into ways to improve the reliability and cost structure of ribbon optical connectivity.
As an example, MPO connectors are gaining acceptance as a standard parallel optical solution. Engines and ribbons capable of supporting 12 duplex channels (24 fibers: 12 RX and 12 TX) are on their way. While targeted high-volume price points below $10 per 10-gigabit bi-directional port can spark interest in server and HPC applications, these numbers must continue to fall for consumer uses.
So how does PCIe stand to benefit from this optical landscape?
The answer lies in the light. Optics are protocol-agnostic and PCIe has enjoyed widespread adoption across nearly every market. PCIe can support the optical needs of HPC/server markets today, and projected consumer markets tomorrow.
The applications and volumes of PCIe give optical manufacturers a design migration path to follow. Interestingly, the necessity of a new optical router chip, as proposed by Intel, becomes a question mark.
Additional benefits of PCIe include an open standard, wide adoption, economies of scale, and inherent backward compatibility with legacy devices and existing software stacks. Today, PCIe Gen3 can operate as a peer-to-peer “fat-pipe” or rate-conversion engine for lower-speed devices.
As a connectivity example, PLX recently demonstrated the first PCIe Gen3 switch as an optical “fat pipe,” providing remote Internet communication (Ethernet) and solid-state drive (SSD) storage across a single fiber pair (Figure 10 below).
Figure 10: PCIe Gen3 managing Ethernet, Internet, SSD over optical
Such demonstrations are basic examples of PCIe speed, flexibility and connectivity -- Internet, video, graphics processing, and storage. The above-referenced demonstration required no changes to existing endpoint hardware or software.
With 10GBASE-T Ethernet on the rise, we can certainly see a bandwidth consumption model for a x4 or x8 PCIe optical pipe feeding both high-performance storage and data server applications.
As Gen3 devices unfold in 2011 and protocols emerge that can leverage this powerful interconnect technology, a new breadth of PCIe adoption will take place.
One such example is the Non-Volatile Memory Host Controller Interface (NVMHCI) specification, which will find its way into PCIe-based SSDs. Similarly, to highlight the popularity of an expanding PCIe market, a wireless version of the technology, wPCIE, has emerged.
In effect, a PCIe switch is the de facto optical router for applications now. From where we stand today, it is easy to envision PCIe Gen3 as the precursor to an optimized short-reach/optical standard: PCI Express Gen4. Expect a future in which many of the HBA functions existing outside the switch today are incorporated into specification extensions to reduce power and latency.
And, if we ever need to transmit the entire Library of Congress in less than three seconds, PCIe can take three familiar paths: faster modulation (in lockstep with laser modulation), lane bonding across more fibers or, as has been done with other technologies, a move to advanced modulation.
In short, PCIe Gen3 brings both enough raw bandwidth and the economies of scale to move optical connectivity along a trajectory toward wider adoption. Today, optics are reaching price points where server and HPC environments can act as a proving ground for the wider usage of tomorrow. The breadth of PCIe, its seamless legacy integration and the openness of the standard provide a clear path to the future.
Reginald Conley is director of hardware applications at PLX Technology, Sunnyvale, Calif. (www.plxtech.com). He holds an MSEE and MBA from San Jose State University. Prior to PLX, he held the position of R&D design manager for an optical-transceiver company. He can be reached at email@example.com.
1) Optical Fiber Communications – Gerd Keiser, McGraw-Hill, 1983
2) Fiber Optic Association – WWW.theFOA.org
3) Optical OFDM – a Hype or is it for real? – Sanders et al., ECOC 2008, 21-25 Sept. 2008
4) Orthogonal Frequency Division Multiplexing for Adaptive Dispersion Compensation in Long Haul WDM Systems – Arthur James Lowery et al., Optical Society of America, 2006