One of the enduring challenges of FPGAs with embedded CPUs has been the connection between the processor and the programmable fabric. A conversation at Hot Chips earlier this month with Xilinx vice president Vidya Rajagopalan suggested that Xilinx’s forthcoming entry in this rather exclusive derby, the Zynq 7000, is to be no exception.
There have been two dominant approaches to the interconnect problem, both based on how the vendor perceived the product. If the vendor saw the programmable fabric as a blank slate onto which customers could write whatever they pleased, the interface tended to be wide and general-purpose, offering enormous potential bandwidth at the cost of considerable complexity. An example might be Altera’s Excalibur product. Alternatively, if the vendor saw the fabric as simply a place to assemble controllers or bus bridges building-block fashion, the interface tended to be a series of standard bus interfaces (AMBA or Wishbone, for instance) ending at stubs in the programmable routing. An example might be the QuickLogic QuickMIPS.
Ostensibly, Zynq is something different: a blank slate on which users write software, some of which will execute on the chip’s two hard ARM Cortex-A9 cores, and some of which will be implemented on accelerators in the chip’s logic fabric. This approach must have presented a bit of a puzzle to the Xilinx architects. Do you create a fully general interface between the A9 cluster and the fabric? If so, at what level—the A9’s coprocessor port? The L2 controller? The coherency engine? Or do you implement a more restrictive but familiar interface, and if so, which one?
Part of the complexity of the problem lies in the level of abstraction Xilinx has in mind. The main purpose of the fabric—aside from the odd protocol controller—is to implement accelerators inferred from the software. But the software contains little information from which anyone could infer an interface structure to go between the A9 cluster and the accelerator.
Presumably in an attempt to bound the problem and render it familiar to SoC architects, Xilinx chose to implement the interface using a standard AMBA 3 bus structure. The exact structure is described in various ways in different places, but it appears that there are two switch matrices: an AMBA 3 switch for general peripherals and an AXI 3 switch grouped around the DRAM controller. The peripheral-side switch appears to have five ports for the CPU cluster, a port for each of the eight hard I/O controller blocks, one for a hard static memory controller, and four that end as stubs in the programmable fabric. The memory-side switch has two CPU ports, two ports for the hard DRAM controller, and five ports ending in the fabric.
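Tallying the ports helps keep the picture straight. The following C fragment is nothing more than bookkeeping for the topology just described; the struct and field names are our own, not Xilinx’s:

```c
/* Bookkeeping sketch of the two switch matrices as described above.
 * Identifiers are ours, not Xilinx's. */
struct axi_switch {
    int cpu_ports;      /* ports toward the Cortex-A9 cluster      */
    int hard_ip_ports;  /* ports ending in hard controller blocks  */
    int fabric_ports;   /* ports ending as stubs in the fabric     */
};

/* Peripheral-side switch: 5 CPU ports, 8 I/O controllers plus one
 * static memory controller, 4 fabric stubs. */
static const struct axi_switch peripheral = { 5, 8 + 1, 4 };

/* Memory-side switch: 2 CPU ports, 2 DRAM-controller ports, 5 fabric
 * stubs (one of which carries ACP). */
static const struct axi_switch memory = { 2, 2, 5 };

static int fabric_port_total(void) {
    return peripheral.fabric_ports + memory.fabric_ports;
}
```

Nine fabric-facing ports in all, by this reading, which is what gives the fabric designer so many connection options.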
One of these five ports supports the Accelerator Coherency Port (ACP). This port provides additional protocol support, allowing an accelerator to snoop the processor cluster’s L1 and L2 caches—but not, apparently, the cluster’s On-Chip Memory. The intent is for a CPU task to leave a control and data block in cache, and for an accelerator in the programmable fabric to read the block directly from cache, avoiding a write-back to DRAM. The protocol is not symmetric, though: CPU reads and writes do not snoop memory in the fabric. So accelerators are not fully coherent with system memory.
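The resulting traffic pattern can be mocked up in ordinary C. The toy model below is purely illustrative (in silicon the snooping is done in hardware inside the A9 cluster, not by software); it shows why an ACP read needs no DRAM write-back, and why results left in fabric memory must be moved explicitly:

```c
/* Toy single-line model of the one-way coherence described above.
 * All names and mechanics here are ours; real snooping happens in
 * hardware, inside the A9 cluster. */
enum { LINE = 64 };

static unsigned char dram[LINE];       /* backing store                   */
static unsigned char l2[LINE];         /* CPU-side cached copy            */
static int l2_dirty;                   /* line modified, not written back */
static unsigned char fabric_ram[LINE]; /* memory inside the fabric        */

/* CPU stores land in cache; no immediate write-back to DRAM. */
static void cpu_write(int off, unsigned char v) {
    l2[off] = v;
    l2_dirty = 1;
}

/* An ACP read snoops the cache, so it sees dirty data that DRAM
 * has not yet received. */
static unsigned char acp_read(int off) {
    return l2_dirty ? l2[off] : dram[off];
}

/* A CPU read consults its own cache hierarchy only; it never looks
 * at fabric_ram, so data an accelerator leaves there must be copied
 * out explicitly. */
static unsigned char cpu_read(int off) {
    return l2_dirty ? l2[off] : dram[off];
}
```

After cpu_write(0, 42), acp_read(0) returns 42 even though dram[0] still holds the old value; but nothing an accelerator leaves in fabric_ram is ever visible through cpu_read. That is the asymmetry in miniature.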
With this variety of ports, including general-purpose AMBA I/O ports, AXI ports with access to the DRAM controller, and one dedicated ACP, Xilinx has covered quite a range of possible structures in the programmable fabric. The company plans to provide AMBA interface IP, presumably including an ACP client, to ease implementation of connections to the AMBA structure.
That still leaves a gap between the vision of a C-driven virtual processing platform and the reality of architecting a complex AXI-based set of accelerators. Rajagopalan said that Xilinx’s AutoESL tool could infer some interfaces and buffers. But much of the task of isolating hot code segments, architecting accelerators, and fitting the accelerator’s control and data flows into Zynq’s AMBA/AXI structure will require skills in system architecture and a solid understanding of AXI 3. It’s not all about the software yet.
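The sort of hot code segment in question might look like the loop nest below. This is our own illustrative example, not anything Xilinx has published; a tool in the AutoESL class can infer a datapath and buffers from such a loop, but deciding whether its memory traffic should ride a general-purpose port, a DRAM-side AXI port, or the ACP remains an architecture decision:

```c
/* A hypothetical hot loop of the kind one might isolate for
 * acceleration: a simple FIR filter.  An HLS tool can infer a
 * datapath from this; routing its reads and writes onto Zynq's
 * AMBA structure is left to the architect. */
static void fir(const int *x, const int *h, int *y, int n, int taps) {
    for (int i = 0; i + taps <= n; i++) {
        int acc = 0;
        for (int t = 0; t < taps; t++)
            acc += x[i + t] * h[taps - 1 - t];  /* taps applied in reverse */
        y[i] = acc;
    }
}
```

Fed x = {1, 2, 3, 4} and a two-tap h = {1, 1}, the loop produces the sliding sums {3, 5, 7}.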