PCI Express Gen 3 Simplified
The PCI-SIG, an industry organization dedicated to developing and enhancing PCI/PCI Express (PCIe) technology, has successfully developed the PCI, PCI-X and PCIe Gen 1 and Gen 2 interconnect protocols and promoted the deployment of these technologies since PCI's inception in 1992.
In early 2008, the PCI-SIG announced the establishment of a workgroup chartered with the development of the next generation of PCIe: the PCI Express Base Specification 3.0, or PCIe Gen 3.
The Gen 3 specification is yet another step forward in enhancing the usefulness of the PCIe protocol by doubling the effective bandwidth and adding protocol enhancements to increase end-system performance.
Leading up to this development, IBM and Intel in 2006 launched an initiative called Geneseo, proposing extensions to the PCIe protocol for high-performance computing and visual processing.
Recommendations from this initiative were provided to the PCI-SIG as potential PCIe protocol enhancements. In addition to the adoption of Geneseo, several other engineering change notices (ECNs) were released by the PCI-SIG, providing enhancements for the efficiency and usefulness of the PCIe protocol.
This article will shed light on the PCIe Gen 3 standard, as well as some of the key enhancements that will be implemented in PCIe Gen 3 components.
Ten key enhancements have been completed and will be implemented in next-generation PCIe devices and systems. Some of these enhancements may be implemented in PCIe Gen 2 devices, while others will be supported only in Gen 3 products. Let's take a closer look at some of these enhancements (Table 1), which were approved as ECNs to the PCIe specification.
TLP Processing Hints
Caches and, more recently, snoop filters are used in processor chipsets to reduce effective memory latency and increase throughput. Memory requests arriving over PCIe are snooped to maintain coherence with processor caches. Transaction Layer Packet (TLP) processing hints, or TPHs, provide an additional means to improve I/O performance in a complex memory hierarchy.
The TPH ECN defines the structure by which I/O devices provide TPHs to the memory controller. Three bits in the new TLP header indicate whether a TPH is present and carry the processing hint itself. Eight additional, optional bits are defined as a "steering tag" for system-specific information. These hints let the system allocate data within the cache hierarchy more effectively, which lowers memory access latency, interconnect overhead, and power consumption.
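The field widths described above can be illustrated with a minimal sketch. It assumes the three bits split into a one-bit presence flag and a two-bit hint, and it packs the fields into a hypothetical 11-bit word. The names `TphFields`, `pack_tph`, and `unpack_tph` and the bit positions are illustrative only and do not reproduce the actual TLP header layout.

```python
from dataclasses import dataclass

PH_BITS = 2   # processing-hint width (assumed split of the three bits)
ST_BITS = 8   # optional steering-tag width, as stated in the text


@dataclass(frozen=True)
class TphFields:
    present: bool           # one-bit flag: the hint fields are valid
    hint: int = 0           # two-bit processing hint
    steering_tag: int = 0   # optional eight-bit, system-specific tag

    def __post_init__(self):
        # Reject values that would not fit in the stated field widths.
        if not 0 <= self.hint < (1 << PH_BITS):
            raise ValueError("processing hint must fit in 2 bits")
        if not 0 <= self.steering_tag < (1 << ST_BITS):
            raise ValueError("steering tag must fit in 8 bits")


def pack_tph(fields: TphFields) -> int:
    """Pack into a hypothetical word laid out as [tag:8][hint:2][present:1]."""
    if not fields.present:
        return 0
    return (fields.steering_tag << 3) | (fields.hint << 1) | 1


def unpack_tph(word: int) -> TphFields:
    """Inverse of pack_tph for the same hypothetical layout."""
    if not word & 1:
        return TphFields(present=False)
    return TphFields(True, (word >> 1) & 0x3, (word >> 3) & 0xFF)
```

A receiver in this model checks the presence flag first and ignores the hint and steering tag when it is clear, which is how a device without TPH support would remain compatible.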
ID-Based Ordering
In certain usage models or applications, strong ordering of packets going through a system or a set of devices is required. In other cases PCIe ordering rules can be relaxed to provide higher performance. In new usage models, multiple flows or data streams are separated by Requester ID, allowing each to run through the system independently of other flows, where conventional strong ordering or even relaxed ordering may cause some performance bottlenecks.
The ID-based ordering (IDO) ECN defines the mechanism to set and use ID-based ordering along with conventional relaxed ordering (RO) for avoiding blocking or bottlenecks in a system. IDO, in combination with RO, is highly beneficial in multi-function devices and switches, allowing TLP streams from different devices or functions within a device to be delivered faster. By default, IDO would be disabled but drivers or software can enable this function if supported.
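The passing decision can be sketched for one simplified case: a later posted write queued behind an earlier, blocked posted write. The sketch assumes that a write with RO set may pass, and that a write with IDO set may pass only when the two Requester IDs differ. The `Tlp` class and `may_pass` function are hypothetical names, and the full PCIe ordering table covers many more transaction types than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tlp:
    requester_id: int                 # identifies the issuing device/function
    relaxed_ordering: bool = False    # RO attribute
    id_based_ordering: bool = False   # IDO attribute (disabled by default)


def may_pass(later: Tlp, earlier: Tlp) -> bool:
    """Return True if `later` may be delivered ahead of `earlier`.

    Simplified model for two posted writes only.
    """
    if later.relaxed_ordering:
        return True
    # IDO only relaxes ordering between independent streams.
    if later.id_based_ordering and later.requester_id != earlier.requester_id:
        return True
    return False  # strong ordering: stay behind the earlier write
```

In this model a stalled stream from one function no longer holds back traffic from another function behind the same switch port, which is the bottleneck the ECN targets.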
Atomic Operations
Today, atomic transactions are supported for synchronization without using an interrupt mechanism. In emerging applications where math co-processing, visualization and content processing are required, enhanced synchronization would enable higher performance.
The atomic operations (AO) ECN enhances the existing methods of synchronization by adding a few bits to the TLP header that can be interpreted and acted upon by PCIe devices. The ECN defines three operations: FetchAdd, Swap, and Compare and Swap (CAS).
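The semantics of the three operations can be sketched against a toy memory model. This is a single-threaded illustration, so atomicity is assumed rather than enforced, and the 32-bit operand width and the `ToyMemory` class are assumptions made for the example. Each operation returns the value that was at the target location before the update.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit operands for this sketch


class ToyMemory:
    """Word-addressed memory; unwritten locations read as zero."""

    def __init__(self):
        self.words = {}

    def fetch_add(self, addr: int, addend: int) -> int:
        # Add to the target value (wrapping at 32 bits), return the old value.
        old = self.words.get(addr, 0)
        self.words[addr] = (old + addend) & MASK32
        return old

    def swap(self, addr: int, new_value: int) -> int:
        # Unconditionally store the new value, return the old value.
        old = self.words.get(addr, 0)
        self.words[addr] = new_value & MASK32
        return old

    def compare_and_swap(self, addr: int, compare: int, new_value: int) -> int:
        # Store the new value only if the target equals `compare`.
        old = self.words.get(addr, 0)
        if old == (compare & MASK32):
            self.words[addr] = new_value & MASK32
        return old
```

A requester can build a lock from compare-and-swap in this model: it owns the lock when the returned value equals the expected "free" value, with no interrupt involved.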
Multicast
As PCIe expands beyond basic graphics, storage and server platforms and into communications and embedded markets, it requires a mechanism by which a single packet can be sent to multiple destinations efficiently. Applications like communications backplanes, mirroring in storage systems, multi-graphics computing and high-resolution imaging can certainly take advantage of the multicast (MC) feature.
The multicast ECN supports address-based multicast through MC base address registers (BARs), MC-capability structures and overlay mechanisms. The MC specification supports only posted address-routed transactions (such as memory write) for both root complex and endpoints as initiators and targets. Up to 64 MC groups can be supported as defined in the ECN.
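Address-based group selection can be sketched as follows. The sketch assumes a multicast window that starts at an MC base address and is divided into equal, power-of-two-sized regions, one per group, and it assumes each port holds a bit mask of the groups it accepts. The function and parameter names are hypothetical simplifications rather than the register names in the capability structure.

```python
MAX_MC_GROUPS = 64  # upper limit stated in the ECN


def mc_group(addr, mc_base, index_pos, num_groups):
    """Return the multicast group hit by `addr`, or None for unicast.

    Each group owns a region of 2**index_pos bytes above mc_base.
    """
    if not 1 <= num_groups <= MAX_MC_GROUPS:
        raise ValueError("number of groups must be between 1 and 64")
    window = num_groups << index_pos
    if not mc_base <= addr < mc_base + window:
        return None
    return (addr - mc_base) >> index_pos


def egress_ports(addr, mc_base, index_pos, num_groups, receive_masks, ingress_port):
    """List the ports that receive a copy of a posted write to `addr`.

    receive_masks maps port number -> bit mask of accepted groups.
    """
    group = mc_group(addr, mc_base, index_pos, num_groups)
    if group is None:
        return []  # not a multicast hit; normal routing applies
    return [
        port
        for port, mask in sorted(receive_masks.items())
        if port != ingress_port and (mask >> group) & 1
    ]
```

For example, with a base of 0x80000000, 4 KB regions (`index_pos=12`) and four groups, a write to 0x80002010 falls in group 2 and is copied to every other port whose mask has bit 2 set.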
Dynamic Power Allocation
As devices get faster and more complex, their power consumption rises, which requires additional measures for controlling and managing power. The current PCIe Gen 2 specification (r2.0) provides requirements for device-level and link-level power management (PM) as well as dynamic changes in link speed and link width.
The dynamic power allocation (DPA) ECN defines 32 sub-states per function for PCIe active device (D0) power management. Additional PM specifications have been developed to manage the latency of changes in device power states.
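One way software might use the sub-states is to pick the highest-performance one that fits a power budget. The sketch below assumes each function advertises a table of per-sub-state power figures ordered from highest power (sub-state 0) to lowest. That ordering, the milliwatt units, and the `pick_substate` helper are assumptions made for illustration.

```python
MAX_DPA_SUBSTATES = 32  # per function, as stated in the ECN


def pick_substate(power_mw, budget_mw):
    """Return the lowest-numbered sub-state whose power fits the budget.

    power_mw: per-sub-state power in milliwatts, index 0 first.
    Returns None when even the lowest-power sub-state exceeds the budget.
    """
    if not 1 <= len(power_mw) <= MAX_DPA_SUBSTATES:
        raise ValueError("a function exposes between 1 and 32 sub-states")
    for index, power in enumerate(power_mw):
        if power <= budget_mw:
            return index
    return None
```

With a table of 2500, 1800, 1200 and 600 mW and a 1500 mW budget, this selects sub-state 2. A real policy would also weigh the transition latency between sub-states.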