Parallel processing has entered mainstream computing in the form of CPUs, GPUs and FPGAs. While all three platforms are being explored individually, as is the CPU-GPU pair, the GPU-FPGA pairing has received little attention.
This is due in part to the cumbersome nature of communication between the two. This paper presents a mechanism for direct GPU-FPGA communication and characterizes its performance in a full hardware implementation. With this mechanism, data moves through the PCIe switch once and is never copied into system memory, enabling more efficient communication between these disparate computing elements.
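The benefit of crossing the PCIe switch once can be illustrated with a simple first-order timing model. The sketch below is not taken from the paper: the function names and the bandwidth figures are hypothetical, and the model ignores latency, setup overhead and any overlap between copies. It only shows why a transfer staged through system memory costs two PCIe traversals where a direct transfer costs one.

```python
# First-order model: staged (GPU -> system memory -> FPGA) versus direct
# (GPU -> FPGA) transfers. All names and numbers here are illustrative
# assumptions, not measurements from the paper.

def staged_transfer_time(num_bytes, gpu_to_host_bw, host_to_fpga_bw):
    """Seconds to move data through system memory, with the two copies
    performed back to back (no overlap, no fixed overhead)."""
    return num_bytes / gpu_to_host_bw + num_bytes / host_to_fpga_bw


def direct_transfer_time(num_bytes, direct_bw):
    """Seconds to move data in a single pass through the PCIe switch."""
    return num_bytes / direct_bw


def effective_bandwidth(num_bytes, seconds):
    """End-to-end bytes per second achieved by a transfer."""
    return num_bytes / seconds


if __name__ == "__main__":
    size = 64 * 1024 * 1024   # 64 MiB payload (hypothetical)
    link_bw = 1e9             # 1 GB/s per PCIe traversal (hypothetical)

    staged = staged_transfer_time(size, link_bw, link_bw)
    direct = direct_transfer_time(size, link_bw)

    print(f"staged: {staged:.4f} s, "
          f"{effective_bandwidth(size, staged) / 1e9:.2f} GB/s effective")
    print(f"direct: {direct:.4f} s, "
          f"{effective_bandwidth(size, direct) / 1e9:.2f} GB/s effective")
```

Under these assumptions, with equal bandwidth on every traversal, the staged path takes twice as long as the direct path and so achieves half the effective bandwidth. Real systems will deviate from this model, for example when copies are pipelined or when one transfer direction is slower than the other.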
Our next order of business will be to address the glaring bottleneck in the FPGA to GPU transfer direction. This will require a detailed analysis of the slave read data path within the FPGA, likely followed by a number of changes to the Verilog code. With those changes, we hope to see the bandwidth rise to a level commensurate with the GPU to FPGA direction.
Concurrently, we are exploring applications that could benefit from the close synergy between GPU and FPGA that this technique enables. The GPU offers relative ease of programming and floating point support while the FPGA offers extreme flexibility, bit manipulation and interfacing possibilities.
Other potential investigations include the extension of our approach to non-nVidia GPUs and to GPU-FPGA interactions beyond memory transfers, such as synchronization, that are presently mediated by the CPU.
To read this external content in full, download the complete paper from the authors’ archive at Microsoft Research: http://research.microsoft.com/pubs/172730/20120625%20UCAA2012_Bittner_Ruf_Final.pdf