This paper describes a bus mastering implementation of the PCI Express protocol using a Xilinx FPGA. While the theoretical peak performance of PCI Express is quite high, attaining that performance is a complex endeavor on top of an already complex protocol.
Although the major FPGA vendors offer PCI Express cores, those cores stop short of providing the Transaction Layer and leave it as an exercise for the user. This is not necessarily a drawback, since the Transaction Layer defines the type of device being implemented and how it will behave; however, implementing it is not trivial.
A programmed I/O Transaction Layer interface is not overly difficult to develop, but such an implementation will not come close to providing the full bandwidth available from PCI Express. Achieving higher bandwidth requires a bus mastering interface, whose implementation is much more complex.
This paper describes a real-world bus mastering implementation and provides the associated Verilog source code, together with a Microsoft Windows WDM driver and a test application, for general use. In addition, the design is analyzed with an eye toward improvements.
Taken together, the suggested changes could improve measured read and write performance so that it more closely approximates hardware peak performance, possibly approaching the theoretical limit of 200 MB/s for an x1 link.
To read this external content, download the full paper from the Microsoft Research online archives.