The PCI-SIG, an industry organization dedicated to developing and enhancing PCI/PCI Express (PCIe) technology, has successfully developed the PCI, PCI-X and PCIe Gen 1 and Gen 2 interconnect protocols and promoted the deployment of these technologies since PCI's inception in 1992.
In early 2008, the PCI-SIG announced the establishment of a workgroup chartered with the development of the next generation of PCIe: the PCI Express Base Specification 3.0, or PCIe Gen 3.
The Gen 3 specification is yet another step forward in enhancing the usefulness of the PCIe protocol by doubling the effective bandwidth and adding protocol enhancements to increase end-system performance.
Leading up to this development, IBM and Intel in 2006 launched an initiative called Geneseo, proposing extensions to the PCIe protocol for high-performance computing and visual processing.
Recommendations from this initiative were provided to the PCI-SIG as potential PCIe protocol enhancements. In addition to the adoption of Geneseo, several other engineering change notices (ECNs) were released by the PCI-SIG, providing enhancements for the efficiency and usefulness of the PCIe protocol.
This article will shed light on the PCIe Gen 3 standard, as well as some of the key enhancements that will be implemented in PCIe Gen 3 components.
Ten key enhancements have been completed and will be implemented in next-generation PCIe devices and systems. Some of these enhancements may be implemented in PCIe Gen 2 devices, while others will only be supported in Gen 3 products. Let's take a closer look at some of these enhancements (Table 1) approved as ECNs to the PCIe specification.
TLP Processing Hints
Caches and now snoop filters are used in processor chip sets to reduce effective memory latency and increase throughput. Snooping of memory requests from PCIe is used to maintain coherence with processor caches. Transaction Layer Packet (TLP) processing hints, or TPHs, provide an additional means to improve I/O performance in a complex memory hierarchy.
The TPH ECN defines the structure by which TPHs are provided to the memory controller by I/O devices. Three bits are defined in the new TLP header to identify the presence of TPH and processing hints. Eight additional bits (optional) are defined as a “steering tag” for system specific information. These hints enable the optimal allocation of the cache hierarchy resulting in lower memory access latencies, interconnect overhead, and power consumption.
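As a rough illustration of the fields just described, the sketch below packs one presence bit, a two-bit processing hint and an optional eight-bit steering tag into a single integer. The function names and the bit positions inside the packed value are assumptions chosen for readability; they are not the actual TLP header layout defined by the ECN.

```python
def pack_tph(present: bool, hint: int, steering_tag: int = 0) -> int:
    """Pack TPH fields into one integer (illustrative layout only).

    Assumed layout: bit 10 = hint present, bits 9..8 = processing hint,
    bits 7..0 = steering tag. Real header positions differ.
    """
    if not 0 <= hint <= 0b11:
        raise ValueError("processing hint must fit in two bits")
    if not 0 <= steering_tag <= 0xFF:
        raise ValueError("steering tag must fit in eight bits")
    return (int(present) << 10) | (hint << 8) | steering_tag


def unpack_tph(word: int) -> tuple[bool, int, int]:
    """Inverse of pack_tph: returns (present, hint, steering_tag)."""
    return bool((word >> 10) & 1), (word >> 8) & 0b11, word & 0xFF


if __name__ == "__main__":
    word = pack_tph(True, 0b10, 0x5A)
    print(hex(word), unpack_tph(word))
```

A memory controller in this model would inspect the presence bit first and consult the hint and steering tag only when it is set.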
ID-Based Ordering
In certain usage models or applications, strong ordering of packets going through a system or a set of devices is required. In other cases, PCIe ordering rules can be relaxed to provide higher performance. In new usage models, multiple flows or data streams are separated by Requester ID, allowing each to run through the system independently of other flows, where conventional strong ordering or even relaxed ordering may cause some performance bottlenecks.
The ID-based ordering (IDO) ECN defines the mechanism to set and use ID-based ordering along with conventional relaxed ordering (RO) for avoiding blocking or bottlenecks in a system. IDO, in combination with RO, is highly beneficial in multi-function devices and switches, allowing TLP streams from different devices or functions within a device to be delivered faster. By default, IDO would be disabled but drivers or software can enable this function if supported.
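The effect of IDO can be sketched with a deliberately simplified passing rule. The model below ignores the distinction between posted, non-posted and completion TLPs that the real ordering rules depend on; the `Tlp` class and `may_pass` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tlp:
    requester_id: int   # identifies the requesting function
    ro: bool = False    # relaxed ordering attribute
    ido: bool = False   # ID-based ordering attribute


def may_pass(later: Tlp, earlier: Tlp) -> bool:
    """Simplified model: may `later` be forwarded ahead of `earlier`?"""
    if later.ro:
        # Relaxed ordering lifts the strong-ordering requirement.
        return True
    if later.ido and later.requester_id != earlier.requester_id:
        # IDO: streams from different requesters are independent.
        return True
    return False  # default strong ordering


if __name__ == "__main__":
    blocked = Tlp(requester_id=0x0200)
    print(may_pass(Tlp(0x0100, ido=True), blocked))  # different requester
    print(may_pass(Tlp(0x0200, ido=True), blocked))  # same requester
```

In this model a stalled stream from one function no longer holds up traffic from another function, which is the bottleneck IDO is meant to remove.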
Atomic Operations
Today, atomic transactions are supported for synchronization without using an interrupt mechanism. In emerging applications where math co-processing, visualization and content processing are required, enhanced synchronization would enable higher performance.
The atomic operations (AO) ECN enhances the existing methods of synchronization by adding a few bits to the TLP header that can be interpreted and acted upon by PCIe devices. The ECN defines three operations: FetchAdd, Swap, and Compare and Swap (CAS).
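The semantics of the three operations can be sketched as follows. This is a single-threaded model that uses a Python dictionary as memory, so it shows only what each operation reads and writes; in hardware the completer is what guarantees that the read-modify-write happens atomically. The function names and the 32-bit default operand width are assumptions for illustration.

```python
def fetch_add(mem: dict, addr: int, value: int, bits: int = 32) -> int:
    """Add `value` to the target location; return the original value."""
    old = mem[addr]
    mem[addr] = (old + value) & ((1 << bits) - 1)  # wrap at operand width
    return old


def swap(mem: dict, addr: int, value: int) -> int:
    """Unconditionally write `value`; return the original value."""
    old = mem[addr]
    mem[addr] = value
    return old


def compare_and_swap(mem: dict, addr: int, expected: int, new: int) -> int:
    """Write `new` only if the location holds `expected`; return the original."""
    old = mem[addr]
    if old == expected:
        mem[addr] = new
    return old


if __name__ == "__main__":
    mem = {0x10: 5}
    print(fetch_add(mem, 0x10, 3), mem[0x10])
    print(compare_and_swap(mem, 0x10, 8, 0), mem[0x10])
```

Because every operation returns the original value, a requester can tell whether a compare-and-swap succeeded without issuing a second read.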
Multicast
As PCIe expands beyond basic graphics, storage and server platforms and into communications and embedded markets, it requires a mechanism where a single packet can be sent to multiple destinations efficiently. Applications like communications backplanes, mirroring in storage systems, multi-graphics computing and high-resolution imaging can certainly take advantage of the multicast (MC) feature.
The multicast ECN supports address-based multicast through MC base address registers (BARs), MC-capability structures and overlay mechanisms. The MC specification supports only posted address-routed transactions (such as memory write) for both root complex and endpoints as initiators and targets. Up to 64 MC groups can be supported as defined in the ECN.
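Address-based multicast can be pictured as a window of memory addresses divided into equal-sized regions, one per MC group. The sketch below assumes a window described by a base address, a per-group region size expressed as a power of two, and a group count; the parameter names are illustrative rather than the register names in the ECN.

```python
MAX_MC_GROUPS = 64  # upper limit stated in the ECN


def multicast_group(addr: int, mc_base: int, index_position: int,
                    num_groups: int):
    """Return the MC group an address falls in, or None for a unicast address.

    Assumed model: each group owns a region of 2**index_position bytes,
    laid out back to back starting at mc_base.
    """
    if not 1 <= num_groups <= MAX_MC_GROUPS:
        raise ValueError("num_groups must be between 1 and 64")
    window = num_groups << index_position
    if not mc_base <= addr < mc_base + window:
        return None  # outside the multicast window: route normally
    return (addr - mc_base) >> index_position


if __name__ == "__main__":
    base = 0x8000_0000
    for a in (0x8000_0000, 0x8000_2ABC, 0x8000_4000):
        print(hex(a), multicast_group(a, base, 12, 4))
```

A switch using this model would replicate a posted memory write that hits the window to every port subscribed to the resulting group number.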
Dynamic Power Allocation
As devices get faster and more complex, their power goes up, which requires additional measures for the control and management of power. The current PCIe Gen 2 specification (r2.0) provides requirements for device and link level power management (PM) as well as dynamic change in link-speed and link-width.
The dynamic power allocation (DPA) ECN defines 32 sub-states per function for PCIe active device (D0) power management. Additional PM specifications are developed to manage the latency of change in device power states.
PCIe Gen 3 Standard Development
The goal of the PCI-SIG workgroup defining this next-generation interface was to double the bandwidth of PCIe Gen 2, which signals at 5 gigatransfers per second (GT/s) but delivers 4 Gbit/s of effective bandwidth per lane after 8b/10b encoding overhead.
The group had two choices: increase the signaling rate to 10GT/s and keep the 20 percent encoding overhead, or select a lower signaling rate (8GT/s) for better signal integrity and reduce the encoding overhead, which brings a different set of challenges.
The PCI-SIG decided to go with 8GT/s and reduce the encoding overhead, offering approximately 7.88 Gbit/s of effective bandwidth per lane, approximately double that of PCIe Gen 2.
The increase in signaling rate from PCIe Gen 2's 5GT/s to PCIe Gen 3's 8GT/s provides a 60 percent increase in data rate, and the remainder of the effective bandwidth increase comes from replacing 8b/10b encoding (20 percent overhead) with 128b/130b encoding (about 1.5 percent overhead).
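The arithmetic behind these figures is easy to check. The short sketch below multiplies each signaling rate by the fraction of transmitted bits that carry payload; the function name is an assumption, and the results are per lane and per direction.

```python
def effective_rate(signaling_gt_s: float, payload_bits: int,
                   total_bits: int) -> float:
    """Effective data rate in Gbit/s per lane after line-encoding overhead."""
    return signaling_gt_s * payload_bits / total_bits


gen2 = effective_rate(5.0, 8, 10)      # 8b/10b encoding
gen3 = effective_rate(8.0, 128, 130)   # 128b/130b encoding

if __name__ == "__main__":
    print(f"Gen 2: {gen2:.2f} Gbit/s")
    print(f"Gen 3: {gen3:.2f} Gbit/s")
    print(f"Ratio: {gen3 / gen2:.2f}x")
```

The Gen 3 result comes out slightly under 8 Gbit/s per lane, so the improvement over Gen 2 is a little less than an exact doubling.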
The challenge of moving from PCIe Gen 2 to Gen 3 is to accommodate the signaling rate where clock timing goes from 200ps to 125ps, jitter tolerance goes from 44ps to 14ps and the total sharable band (for SSC) goes down from 80ps to 35ps.
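The 200ps and 125ps figures follow directly from the signaling rates if they are read as the unit interval, that is, the time occupied by one bit on the wire. The sketch below, which assumes that reading, also expresses the quoted jitter tolerances as a fraction of the unit interval.

```python
def unit_interval_ps(rate_gt_s: float) -> float:
    """Duration of one bit on the wire, in picoseconds."""
    return 1000.0 / rate_gt_s


# Jitter tolerance figures quoted in the text, in picoseconds.
JITTER_PS = {"Gen 2": 44.0, "Gen 3": 14.0}
RATE_GT_S = {"Gen 2": 5.0, "Gen 3": 8.0}

if __name__ == "__main__":
    for gen, rate in RATE_GT_S.items():
        ui = unit_interval_ps(rate)
        share = JITTER_PS[gen] / ui
        print(f"{gen}: UI = {ui:.0f} ps, jitter tolerance = {share:.1%} of UI")
```

Seen this way, the jitter budget shrinks faster than the bit time itself, which is why the timing requirements are harder to meet than the rate increase alone suggests.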
These are enormous challenges to overcome but the PCI-SIG has already completed board, package, and system modeling to make sure designers are able to develop systems that support these rates. The table below highlights some key aspects of PCIe Gen 2 and Gen 3.
The beauty of the Gen 3 solution is that it will support twice the data rate at equal or lower power consumption than PCIe Gen 2. Additionally, applications using PCIe Gen 2 would be able to migrate seamlessly, as the reference clock remains at 100MHz and the channel reach for mobiles (8 inches), clients (14 inches), and volume servers (20 inches) stays the same.
More complex equalizers, such as decision feedback equalization, may be implemented optionally for extended reach needed in a backplane environment. The Gen 3 specification will enhance signaling by adding transmitter de-emphasis, receiver equalization, and optimization of Tx/Rx Phase Lock Loops and Clock Data Recovery.
The Gen 3 specification also requires devices that support Gen 3 rate to dynamically negotiate up or down to/from Gen 1 and Gen 2 data rates based on signal/line conditions.
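The negotiation requirement can be pictured as choosing the highest rate that both link partners support and that the channel can sustain. The sketch below is a simplified model, not the link training state machine; `negotiate` and the `link_ok` callback (a stand-in for a signal-quality check) are hypothetical.

```python
from typing import Callable

RATES_GT_S = {1: 2.5, 2: 5.0, 3: 8.0}  # generation -> signaling rate


def negotiate(local_max: int, partner_max: int,
              link_ok: Callable[[int], bool]) -> int:
    """Return the highest generation both ends support and the link sustains.

    Falls back one generation at a time; Gen 1 is assumed to always work.
    """
    if local_max not in RATES_GT_S or partner_max not in RATES_GT_S:
        raise ValueError("generation must be 1, 2 or 3")
    for gen in range(min(local_max, partner_max), 0, -1):
        if gen == 1 or link_ok(gen):
            return gen
    return 1  # not reached: the loop always returns by gen == 1


if __name__ == "__main__":
    gen = negotiate(3, 3, link_ok=lambda g: g < 3)  # channel too lossy for Gen 3
    print(f"Link runs at Gen {gen} ({RATES_GT_S[gen]} GT/s)")
```

The same function covers the backward-compatibility case: a Gen 3 device attached to a Gen 2 partner simply settles at the lower of the two maximums.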
Table 1. PCIe Gen 2 vs. Gen 3
Gen 3 Status and Projections
The PCI-SIG initially planned to complete revision 0.9 of the specification by the fourth quarter of 2009 and to finalize it as revision 1.0 by mid-2010.
Although the PCI-SIG is moving as fast as it can to complete the specification, the challenges that come from understanding, modeling and defining a robust high-speed serial communication standard (Gen 3) are significant and have resulted in delays.
Currently, revision 0.7 is under review and is expected to be released by June 2009, which will probably push the r0.9 release to late 2009 and r1.0 to the first half of 2010.
It is important to note that the key developers of the PCIe Gen 3 components (CPUs, chipsets, switches, GPUs, and I/O devices) will test their silicon based on the r0.7 release.
They will provide test results and feedback for potential changes to the specification before r1.0 can be released in 2010. Leading chip vendors, such as PLX Technology for PCIe switches along with CPU and GPU providers, are developing early Gen 3 silicon in order to allow the specification to be fully validated before it is finalized.
Similar to the adoption of PCIe Gen 2, consumer graphics card vendors are expected to adopt Gen 3 as soon as shippable silicon becomes available. Next, enterprise systems vendors will start supplying servers and storage products based on Gen 3.
Communications and embedded systems vendors are designing with PCIe Gen 2 now and are expected to move to Gen 3 after embedded CPU, ASIC and FPGA vendors deliver Gen 3 PCIe in their respective products.
Akber Kazmi is product marketing director for PCI Express switches at PLX Technology. He can be reached at firstname.lastname@example.org.