Using PCI Express as a fabric for interconnect clustering

Miguel Rodriguez

March 7, 2011

A Lossless Fabric, High Throughput

Unlike 10GbE, PCIe is a lossless fabric at the transport layer. The PCIe specification defines a robust flow-control mechanism that prevents packets from being dropped. Every PCIe packet is acknowledged at every hop, ensuring successful transmission, and in the case of a transmission error the packet is simply replayed.

This happens in hardware, without any involvement of the upper layers. In contrast, 10GbE has an intrinsic tendency to drop packets under congestion and relies on upper-layer protocols, such as TCP/IP, to retransmit what was lost. PCIe therefore provides more reliable communication than Ethernet, which carries the inherent overhead of those upper-layer retransmissions. For storage systems, data loss and corruption are simply not an option.
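
To make the mechanism concrete, the following sketch is a conceptual model in C, not actual link-layer hardware behavior or any vendor API: a transmitter sends only while the receiver has advertised buffer credits, and it keeps each packet "in flight" until it is acknowledged, replaying it on error. The credit count and error pattern are invented for illustration.

/* Conceptual model of PCIe-style credit-based flow control with
 * ACK/NAK replay. Purely illustrative; the real mechanism lives in
 * the PCIe data link layer, not in software. */
#include <stdbool.h>
#include <stdio.h>

#define CREDITS_INIT 8              /* receiver buffer slots advertised at link-up */

static int credits = CREDITS_INIT;  /* credits currently held by the transmitter */

/* Pretend link: the first attempt of every fifth packet "fails". */
static bool link_transmit(int seq, int attempt)
{
    return !(seq % 5 == 3 && attempt == 0);
}

/* Send one packet: wait for credits, then replay until it is ACKed. */
static void send_packet(int seq)
{
    while (credits == 0) {
        /* In hardware the transmitter stalls until the receiver returns
         * flow-control credits; packets are never dropped. */
        credits = CREDITS_INIT;     /* simulate credits being returned */
    }
    credits--;

    int attempt = 0;
    while (!link_transmit(seq, attempt)) {
        /* NAK received: replay from the retry buffer, with no
         * involvement from the OS or the application. */
        printf("packet %d: transmission error, replaying\n", seq);
        attempt++;
    }
    printf("packet %d: ACKed\n", seq);
}

int main(void)
{
    for (int seq = 0; seq < 10; seq++)
        send_packet(seq);
    return 0;
}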

Providing low latency and high throughput at the hardware level is a good foundation for a high-performance system. Equally important, however, is the interconnect's ability to give applications an efficient interface that maximizes use of the underlying hardware.

PCIe has an extremely low end-to-end latency (under 1 µs). The new PCIe 3.0 standard also supports higher throughput of 8 Gbps per lane, which yields an aggregate bandwidth of 128 Gbps on a 16-lane (x16) PCIe interface. Furthermore, dedicated DMA controllers inside PCIe switches, such as those from PLX, provide an efficient, high-performance data mover that can be programmed to push or pull large amounts of data without involving the CPU.
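
The arithmetic behind those numbers is straightforward. The short calculation below (an illustrative C snippet, not vendor data) multiplies the PCIe 3.0 per-lane signaling rate by the lane count, and also shows the effective payload rate once the 128b/130b line encoding used by PCIe 3.0 is taken into account.

/* Back-of-the-envelope PCIe 3.0 bandwidth calculation (illustrative). */
#include <stdio.h>

int main(void)
{
    const double lane_rate_gtps = 8.0;            /* PCIe 3.0: 8 GT/s per lane */
    const int    lanes          = 16;             /* x16 link */
    const double encoding       = 128.0 / 130.0;  /* 128b/130b line code */

    double raw_gbps       = lane_rate_gtps * lanes;  /* 128 Gb/s raw */
    double effective_gbps = raw_gbps * encoding;     /* ~126 Gb/s after encoding */
    double effective_gBps = effective_gbps / 8.0;    /* ~15.8 GB/s per direction */

    printf("raw aggregate:   %.0f Gb/s\n", raw_gbps);
    printf("after 128b/130b: %.1f Gb/s (%.1f GB/s) per direction\n",
           effective_gbps, effective_gBps);
    return 0;
}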

Ethernet currently offers speeds of 1 GbE, 10GbE, and 40GbE, with 100GbE on the roadmap. However, throughput is not the only performance metric designers take into account; Ethernet falls short in two important areas: latency and jitter.

Its inherently unreliable nature, which allows it to drop packets under congestion, results in higher, unpredictable latencies. Although protocol enhancements, in the form of Converged Enhanced Ethernet (Figure 2 below), are in the works, it is still unclear whether the improved latencies can rival the low latencies currently offered by both PCIe and IB.

 

Figure 2. Supporting IPC, LAN and SAN with Converged Enhanced Ethernet (CEE)

IB supports per-lane throughput of up to 14 Gbps with IB-FDR and 26 Gbps with IB-EDR, while also offering low latencies, so it is commonly deployed in high-performance computing (HPC). However, disadvantages arise when LAN or SAN connectivity to an IB fabric is needed.

For LAN connectivity, servers must use the IP-over-InfiniBand protocol (IPoIB) and must go through an IPoIB gateway that bridges between IB and the LAN. Deployments of such components have been minimal at best.

From the perspective of the two endpoints in the communication channel, IB and Ethernet adapters are themselves bridges to PCIe: the adapters communicate with the server's CPU/memory subsystem over PCIe. Thus, PCIe holds the key to the bandwidth performance of both Ethernet and IB.

So, rather than terminating PCIe inside the system and using a different protocol (IB or Ethernet) for communication, it is advantageous to extend PCIe outside the system and realize its full latency and bandwidth potential, including direct read/write of remote memory, a capability that only PCIe provides.
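
As a sketch of what direct remote read/write can look like to software, the fragment below maps a PCIe BAR exposed through Linux sysfs (for example, the memory aperture of a non-transparent bridge) and touches it with ordinary loads and stores. The device path and window size are placeholders, and the device-specific setup of the NTB address translation is assumed to have been done elsewhere.

/* Sketch: map a PCIe BAR (e.g. an NTB memory window) from sysfs and
 * access remote memory with plain loads and stores. The device path is
 * a placeholder; window translation setup is device-specific and omitted. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BAR_PATH    "/sys/bus/pci/devices/0000:01:00.0/resource2"  /* placeholder */
#define WINDOW_SIZE (1 << 20)   /* assume a 1 MiB aperture */

int main(void)
{
    int fd = open(BAR_PATH, O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    volatile uint32_t *win = mmap(NULL, WINDOW_SIZE, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (win == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* A store here becomes a PCIe memory write that lands directly in the
     * remote node's memory; a load becomes a PCIe memory read. */
    win[0] = 0xdeadbeef;
    printf("read back: 0x%x\n", (unsigned)win[0]);

    munmap((void *)win, WINDOW_SIZE);
    close(fd);
    return 0;
}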
