Down & dirty with HW/SW co-design: Part 1 - Reviewing the fundamentals.
The combination of a CPU plus one or more accelerators is the simplest form of heterogeneous platform; hardware/software partitioning targets such platforms. The CPU is often called the host. The CPU talks to the accelerator through the accelerator's data and control registers. These registers allow the CPU to monitor the accelerator's operation and to give it commands. The CPU and accelerator can also communicate via shared memory. If the accelerator needs to operate on a large volume of data, it is usually more efficient to leave the data in memory and to have the accelerator read and write memory directly rather than to have the CPU shuttle data from memory to accelerator registers and back. The CPU and accelerator must also synchronize their actions, for example so that the CPU does not read results before the accelerator has finished writing them.
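The register-plus-shared-memory protocol described above can be sketched in C as a small software simulation. This is only an illustration under stated assumptions: the register layout, the bit names CTRL_START and STATUS_DONE, the shared_mem array, and the functions accel_step and host_run are all hypothetical and do not correspond to any real device. The "accelerator" here simply doubles each input word so that the handshake is easy to follow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments; not taken from any real accelerator. */
#define CTRL_START  0x1u   /* host sets this bit to issue a command      */
#define STATUS_DONE 0x1u   /* accelerator sets this bit when it finishes */
#define SHARED_WORDS 64u

/* Hypothetical register file through which the host talks to the accelerator. */
typedef struct {
    volatile uint32_t ctrl;     /* control register (commands)             */
    volatile uint32_t status;   /* status register (monitoring)            */
    volatile uint32_t src_off;  /* data register: input offset in memory   */
    volatile uint32_t dst_off;  /* data register: output offset in memory  */
    volatile uint32_t len;      /* data register: number of words          */
} accel_regs_t;

/* Memory visible to both the host and the accelerator. */
static uint32_t shared_mem[SHARED_WORDS];

/* Software stand-in for the accelerator hardware. When started, it reads
 * its input directly from shared memory, writes its output directly back,
 * and then raises STATUS_DONE. */
static void accel_step(accel_regs_t *r)
{
    if (!(r->ctrl & CTRL_START))
        return;                               /* no command pending */
    for (uint32_t i = 0; i < r->len; i++)
        shared_mem[r->dst_off + i] = 2u * shared_mem[r->src_off + i];
    r->ctrl &= ~CTRL_START;                   /* command consumed */
    r->status |= STATUS_DONE;                 /* signal completion */
}

/* Host side: pass only offsets and a length through the registers, start
 * the accelerator, then poll the status register to synchronize. */
static void host_run(accel_regs_t *r, uint32_t src, uint32_t dst, uint32_t n)
{
    assert(n <= SHARED_WORDS && src <= SHARED_WORDS - n && dst <= SHARED_WORDS - n);
    r->src_off = src;
    r->dst_off = dst;
    r->len = n;
    r->status = 0;
    r->ctrl = CTRL_START;
    while (!(r->status & STATUS_DONE))
        accel_step(r);   /* real hardware would run on its own here */
}
```

Note that the host never copies the data itself: it writes three small registers and waits, while the bulk transfer happens between the accelerator and memory. A real system would typically replace the polling loop with an interrupt, and would also have to deal with cache coherence, which this sketch ignores.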
More general platforms are also possible. We can use several CPUs, in contrast to the single processor attached to the accelerators. We can generalize the system interconnect from a bus to more general structures.
We can also create a more complex memory system that provides different types of access to different parts of the system. Co-designing such systems is more difficult, particularly when we do not make assumptions about the structure of the platform.
Three examples below describe several different co-design platforms that use FPGAs in different ways.
Example 7-1. The Xilinx Virtex-4 FX Platform FPGA Family
The Xilinx Virtex-4 family [Xil05] is a platform FPGA that comes in several different configurations. The higher-end FX family includes one or two PowerPC processors, multiple Ethernet MACs, block RAM, and large arrays of reconfigurable logic.
The PowerPC is a high-performance 32-bit RISC machine with a five-stage pipeline, 32 general-purpose registers, and separate instruction and data caches. The FPGA fabric is built on configurable logic blocks (CLBs) that use lookup tables and a variety of other logic.
The largest Virtex-4 provides up to 200,000 logic cells. The CLBs can be used to implement high-speed adders. A separate set of blocks includes 18 x 18 multipliers, an adder, and a 48-bit accumulator for DSP operations. The chip also includes a number of RAM blocks that can be configured in a variety of depths and widths.
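To make the arithmetic of an 18 x 18 multiplier feeding a 48-bit accumulator concrete, the following C sketch models a multiply-accumulate step in software. It is a behavioral model only, assuming signed two's-complement operands and wraparound on accumulator overflow; it does not describe the actual Xilinx block, and the names sign_extend, mac_step, and dot_product are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPERAND_BITS 18u   /* width of each multiplier input */
#define ACC_BITS     48u   /* width of the accumulator       */

/* Interpret the low `bits` bits of v as a signed two's-complement value. */
static int64_t sign_extend(int64_t v, unsigned bits)
{
    const uint64_t mask = (UINT64_C(1) << bits) - 1;
    const uint64_t sign = UINT64_C(1) << (bits - 1);
    uint64_t u = (uint64_t)v & mask;
    return (int64_t)((u ^ sign) - sign);
}

/* One multiply-accumulate step: acc + a * b, with a and b truncated to
 * 18 bits and the result wrapped to 48 bits (an assumption of this model). */
static int64_t mac_step(int64_t acc, int32_t a, int32_t b)
{
    int64_t product = sign_extend(a, OPERAND_BITS) * sign_extend(b, OPERAND_BITS);
    return sign_extend(acc + product, ACC_BITS);
}

/* Dot product built from repeated MAC steps, as a DSP kernel would use it. */
static int64_t dot_product(const int32_t *x, const int32_t *y, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = mac_step(acc, x[i], y[i]);
    return acc;
}
```

In this model, a signed 18 x 18 product fits in 36 bits, so the 48-bit accumulator leaves 12 bits of headroom: 4,096 worst-case products can be summed before the accumulator can overflow.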
The PowerPC and FPGA fabric can be tightly integrated. The FPGA fabric can be used to build bus-based devices. An auxiliary processor unit allows custom PowerPC instructions to be implemented in the FPGA fabric.
Processor cores can also be implemented in the FPGA fabric to build heterogeneous multiprocessors. Xilinx provides the MicroBlaze processor core, and other cores can be used as well.
Example 7-2. The ARM Integrator Logic Module
The ARM Integrator is a series of evaluation boards for the ARM processor. The Integrator Logic Module [ARM00] is an FPGA accelerator board that plugs into the ARM Integrator motherboard.
The logic module provides a Xilinx FPGA for reconfigurable logic. The FPGA interfaces to the ARM AMBA bus. The logic module board does not contain its own SRAM; the FPGA can use the AMBA bus to connect to the SRAM and I/O devices contained on other boards.
Example 7-3. The Annapolis Micro Systems WILDSTAR II Pro
The WILDSTAR II Pro is a PCI bus card that provides FPGA logic on the bus of a PC or other PCI device. The card hosts one or two Virtex-II Pro FPGAs that can connect directly to the PCI bus.
It also hosts up to 96 MB of SRAM and 256 MB of SDRAM. The card is organized as shown in the figure. A development environment simplifies system design for the board.

