Model-based FPGA design tool quietly gains adherents
Editor's Note: In this series of design articles, the authors offer a close look at various design challenges and effective use of design tools and techniques for resolution of those challenges. Be sure to check out the second article in this series: Building 491MHz FPGA-based wireless radio heads
FPGAs keep getting larger, the designs more complex, and the need for high level design (HLD) flows never seems to go away. C-based design for FPGAs has been promoted for over two decades and several such tools are currently on the market. Model-based design has also been around for a long time from multiple vendors. OpenCL for FPGAs has been getting lots of press in the last couple of years. Yet, despite all of this, 90+% of FPGA designs continue to be built using traditional Verilog or VHDL.
No one can deny the need for HLD. New FPGAs contain over 1 million logic elements, with thousands of hardened DSP and memory blocks. Some vendor's devices can even support floating-point as efficiently as fixed-point arithmetic. Data convertor and interface protocols routinely run at multiple GSPS (giga samples per second), requiring highly parallel or vectorized processing. Timing closure, simulation, and verification become ever-more time-consuming as design sizes grow. But HLD adoption still lags, and FPGAs are primarily programmed by hardware-centric engineers using traditional hardware description languages (HDLs).
The primary reason for this is quality of results (QoR). All high-level design tools have two key challenges to overcome. One is to translate the designer's intent into implementation when the design is described in a high-level format. This is especially difficult when software programming languages are used (C++, MATLAB, or others), which are inherently serial in nature. It is then up to the compiler to decide by how much and where to parallelize the hardware implementation. This can be aided by adding special intrinsics into the design language, but this defeats the purpose. OpenCL addresses this by having the programmer describe serial dependencies in the datapath, which is why OpenCL is often used for programming GPUs. It is then up to the OpenCL compiler to decide how to balance parallelism against throughput in the implementation. However, OpenCL programming is not exactly a common skillset in the industry.
The second key challenge is optimization. Most FPGA hardware designers take great pride in their ability to optimize their code to achieve the maximum performance in a given FPGA, in terms of design Fmax, or the achievable frequency of the system clock data rate. This requires closing timing across the entire design, which means setup and hold times have to be met for every circuit in the programmable logic and every routing path in the design. The FPGA vendors provide automated synthesis, fitting, and routing tools, but the achievable results are heavily dependent upon the quality of the Verilog and/or VHDL source code. This requires both experience and design iteration. The timing closure process is tedious and sometime compared to "Whack-a-Mole," meaning that when a timing problem is fixed in one location of the design, a different problem often surfaces at another location.
An oft-quoted metric for a high-level design tool is to achieve results that are no more than 10% degraded from a high-quality hand-coded design, both in terms of Fmax and the utilization of FPGA resources, typically measured in "LEs" (logic elements) or "LCs" (logic cells). In practice, very few tools can reliably deliver such results, and there is considerable skepticism among the FPGA design community when such a tool is promoted by EDA or FPGA vendors.
Having said this, there is a design tool that is being quietly adopted by FPGA engineers precisely because it not only addresses this QoR gap, but -- in most cases -- extends it in the other direction, meaning that the tool produces results that are usually better than their hand-coded counterparts.
This tool is called DSP Builder Advanced Blockset (the marketing folks were obviously not at their best when naming this tool). This is a model-based design tool, meaning that design entry is accomplished using models in the Mathworks' Simulink environment. The tool was first introduced to the market in 2007.
There are other model-based tools on the market, such as HDL Coder, Synplify, and System Generator; however, only DSP Builder Advanced Blockset offers the combination of the following ten features:
- Decoupling of system data rates from FPGA clock rates; native multi-channel capabilities.
- Automated timing closure at high Fmax, including auto-pipelining.
- Deterministic latency and data throughput.
- Optimal usage of FPGA hard block features.
- Design portability across FPGA families.
- Fixed- or floating-point numerical implementation.
- Support for vector manipulation.
- Math.h library.
- System simulation in the Mathworks' environment.
- Hardware simulation from the Mathworks' environment.
This combination is what allows the tool to deliver superior QoR along with the productivity advantages of a high level simulation, design, and verification tool flow. Let's look at each of these features in a little more detail...
Decoupling of system data rates from FPGA clock rates
Using DSP Builder, the user specifies the desired design clock rate. The data rate can be higher or lower than the clock rate, sometimes dramatically so. The tool will automatically parallelize the data and represent the data buses as vectors in cases where the data rate is higher than the clock rate. Integer ratios work most efficiently (4, 8, 12, 16, 32...) but any ratio will work and the control path will insert empty data into some of the vectors to accommodate this.
This capability provides the ability to support very high data rates of many GSPS using realistic FPGA clock rates of several hundred MHz, depending upon the FPGA family.
Within DSP Builder, the designer builds the datapath, often containing various rate FIR filters, memory blocks, NCOs, mixers, saturate and round blocks, and so forth. However, the designer need only lay down a single channel datapath assuming the design clocks at the required rate, regardless of the actual data rate. DSP Builder will build the data path with the specified number of channels, and vectorize (or parallelize) the design to achieve the needed data throughput. This is specified in a parameter file, which means it is easily changed, with the only effort being a recompile. The tool generates all needed control logic to handle multi-channel and higher data rates, even for complex datapaths. Further all configuration or coefficient registers can be read or written, with the addressing and accessing logic auto-generated. This will operate at a lower clock rate than the datapath.