 Implementing floating-point algorithms in FPGAs or ASICs - Embedded.com

# Implementing floating-point algorithms in FPGAs or ASICs

Floating-point is the preferred data type for high-accuracy calculations in algorithm modeling and simulation. Traditionally, deploying such floating-point algorithms to FPGA or ASIC hardware has meant converting every data type in the algorithm to fixed-point to conserve hardware resources and speed up calculations. Converting to fixed-point reduces mathematical precision, and it can be challenging to strike the right balance between data type word lengths and mathematical accuracy during conversion. For calculations that require high dynamic range or high precision (for example, designs that have feedback loops), fixed-point conversion can consume weeks or months of engineering time. In addition, achieving the required numerical accuracy can force the designer to use large fixed-point word lengths.

In this article, we will introduce The MathWorks' Native Floating-Point workflow for ASIC/FPGA design, using an IIR filter as an illustration. We will then review the challenges of using fixed-point, and we will compare the area and frequency tradeoffs of using single-precision floating point vs. fixed-point. We will also show how a combination of floating-point and fixed-point can give you much higher accuracy while reducing conversion and implementation time in real-world designs. You will see why modeling directly in floating-point can be important, and how it can sometimes significantly reduce area and improve speed in real-world designs with high dynamic range requirements, contrary to the popular belief that fixed-point is always more efficient than floating-point.

## Native Floating-Point Implementation: Under the Hood

HDL Coder implements single-precision arithmetic by emulating the underlying math on the FPGA or ASIC resources (Figure 1). The generated logic unpacks the input floating-point signal into sign, exponent, and mantissa — individual integers that are 1, 8, and 23 bits wide, respectively.
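The unpacking step can be illustrated in software. The following is a minimal Python sketch, not the logic HDL Coder generates; the function names `unpack_float32` and `pack_float32` are hypothetical, and the field layout assumed is the standard IEEE 754 single-precision format (1 sign bit, 8 exponent bits, 23 mantissa bits).

```python
import struct

def unpack_float32(x):
    """Round x to single precision and split it into (sign, exponent, mantissa)."""
    # Reinterpret the 4 bytes of the float as one unsigned 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits, the fraction after the implicit leading 1
    return sign, exponent, mantissa

def pack_float32(sign, exponent, mantissa):
    """Inverse of unpack_float32: reassemble the three fields into a float."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# -6.5 = -1.625 * 2**2, so the biased exponent is 129 and the fraction is 0.625 * 2**23.
print(unpack_float32(-6.5))  # (1, 129, 5242880)
```

Each field is an ordinary unsigned integer, which is why the arithmetic on them maps onto standard integer resources.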

The generated VHDL or Verilog logic then performs the floating-point calculation (a multiplication in the case shown in Figure 1). It derives the result sign from the input sign bits, multiplies the magnitudes, adds the exponents, and applies the normalization needed to form the result. The last stage of the logic packs the sign, exponent, and mantissa back into a floating-point data type.
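The multiplication stages described above can be modeled with integer operations only. This is a simplified sketch under stated assumptions, not the generated HDL: it handles only normal, finite operands and results (zeros, subnormals, infinities, and NaNs are rejected), it assumes round-to-nearest-even, and the name `multiply_float32` is hypothetical.

```python
import struct

def f32_to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_f32(bits):
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def multiply_float32(a, b):
    """Multiply two single-precision values using only integer arithmetic."""
    a_bits, b_bits = f32_to_bits(a), f32_to_bits(b)

    # Stage 1: unpack. The result sign is the XOR of the input signs.
    sign = (a_bits >> 31) ^ (b_bits >> 31)
    exp_a, exp_b = (a_bits >> 23) & 0xFF, (b_bits >> 23) & 0xFF
    if not (0 < exp_a < 255 and 0 < exp_b < 255):
        raise ValueError("sketch supports normal, finite operands only")
    sig_a = (a_bits & 0x7FFFFF) | 0x800000  # restore the implicit leading 1
    sig_b = (b_bits & 0x7FFFFF) | 0x800000

    # Stage 2: multiply the 24-bit magnitudes and add the exponents.
    product = sig_a * sig_b            # up to 48 bits
    exponent = exp_a + exp_b - 127     # remove one copy of the bias

    # Stage 3: normalize so the significand is again 24 bits wide.
    shift = 23
    if product >> 47:                  # product of significands is >= 2.0
        shift, exponent = 24, exponent + 1
    significand = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and significand & 1):
        significand += 1
        if significand >> 24:          # rounding carried out of the top bit
            significand >>= 1
            exponent += 1
    if not 0 < exponent < 255:
        raise OverflowError("result is outside the normal range")

    # Stage 4: pack the fields back into a 32-bit word.
    return bits_to_f32((sign << 31) | (exponent << 23) | (significand & 0x7FFFFF))

print(multiply_float32(1.5, -2.5))  # -3.75
```

The widest operation is a 24-by-24-bit integer multiply; the rest is a small adder, a one-bit shift decision, and rounding.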

## Tackling Dynamic Range Issues with Fixed-Point Conversion

A simple expression like (1-a)/(1+a), if it needs to be implemented with high dynamic range, can be translated naturally by using single-precision floating-point (Figure 2).

However, implementing the same equation in fixed-point requires many steps and numerical considerations (Figure 3).

For example, you must break the division into multiplication and reciprocal, use approximation methods such as Newton-Raphson or LUT (look-up table) for nonlinear reciprocal operation, use different data types to carefully control the bit growth, select the proper numerator and denominator types, and use specific output types and accumulator types for the adders and subtractors.
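To give a sense of the steps listed above, here is a small Python sketch that evaluates (1-a)/(1+a) with integers only, replacing the division with a Newton-Raphson reciprocal. It is an illustration under assumptions that are not from the article: a 16-bit fraction length, truncating multiplies, an input restricted to 0 ≤ a < 1 (so the denominator lies in [1, 2)), and a linear initial guess. All names are hypothetical.

```python
FRAC_BITS = 16            # assumed fraction length, chosen for illustration only
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0

def to_fixed(x):
    return int(round(x * ONE))

def to_float(q):
    return q / ONE

def fx_mul(p, q):
    # The full product has 2 * FRAC_BITS fraction bits; shift to control bit growth.
    return (p * q) >> FRAC_BITS

def fx_reciprocal(d, iterations=3):
    """Approximate 1/d for a fixed-point d in [1, 2) with Newton-Raphson."""
    x = (3 * ONE - d) >> 1                   # initial guess 1.5 - 0.5*d
    for _ in range(iterations):
        x = fx_mul(x, 2 * ONE - fx_mul(d, x))  # x <- x * (2 - d*x)
    return x

def ratio_fixed(a):
    """Evaluate (1 - a) / (1 + a) for 0 <= a < 1 as multiply-by-reciprocal."""
    a_q = to_fixed(a)
    numerator = ONE - a_q
    denominator = ONE + a_q
    return to_float(fx_mul(numerator, fx_reciprocal(denominator)))

for a in (0.0, 0.25, 0.5, 0.9):
    print(a, ratio_fixed(a), (1 - a) / (1 + a))
```

Even this restricted version needs a choice of fraction length, rounding behavior, input range, initial guess, and iteration count. Widening the input range means revisiting each of them, which is the effort the floating-point version avoids.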

## Exploring IIR Implementation Options

Let's look at an infinite impulse response (IIR) filter example. An IIR filter requires high dynamic range calculation with a feedback loop, making it tricky to converge on a fixed-point quantization. Figure 4a shows a test environment comparing three versions of the same IIR filter with a noisy sine wave input. The sine wave has an amplitude of 1, and the added noise increases the amplitude slightly.

The first version of the filter is double precision (Figure 4b). The second version is single-precision. The third version is a fixed-point implementation (Figure 4c). This implementation resulted in data types up to 22 bits in word length, with 1 bit allocated for the sign and 21 bits allocated for the fraction. This particular data type leaves 0 bits to represent the integer value, which makes sense given that its range of values will always be between -1 and 1 for the given stimulus. If the design has to work with different input values, that needs to be taken into account during fixed-point quantization.
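The consequence of a 22-bit word with 21 fraction bits can be seen in a short Python sketch. The rounding and saturation behavior below are assumptions made for illustration (the article does not state which modes were used), and `quantize` is a hypothetical helper.

```python
WORD_LENGTH = 22       # total bits, including the sign bit
FRACTION_LENGTH = 21   # bits to the right of the binary point

def quantize(x, word_length=WORD_LENGTH, fraction_length=FRACTION_LENGTH):
    """Round x to a signed fixed-point value, saturating at the type limits."""
    scale = 1 << fraction_length
    lowest = -(1 << (word_length - 1))       # most negative stored integer
    highest = (1 << (word_length - 1)) - 1   # most positive stored integer
    stored = round(x * scale)                # Python rounds ties to even
    stored = max(lowest, min(highest, stored))
    return stored / scale

print(quantize(0.5))    # representable exactly
print(quantize(0.1))    # rounded to the nearest multiple of 2**-21
print(quantize(1.2))    # saturates just below 1.0
print(quantize(-1.5))   # saturates at -1.0
```

With no integer bits, the representable range is -1 to 1 - 2⁻²¹ in steps of 2⁻²¹, so any signal that exceeds that range clips. This is why the quantization has to be revisited if the input stimulus changes.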

The test environment is set up to compare the results of the single-precision and fixed-point filters with the double-precision filter, which is considered to be the golden reference. In both cases, a loss of precision will yield a certain amount of error. The question is whether that error is within an acceptable tolerance for our application.
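The comparison can be mimicked in plain Python. This sketch does not reproduce the article's filter or its results: it assumes a hypothetical first-order filter y[n] = b·x[n] + a·y[n-1] with b = 0.1 and a = 0.9, a sine input of amplitude 0.9 with a small seeded noise term, and rounding to the target data type after every operation. Saturation is omitted because these signals stay inside the fixed-point range.

```python
import math
import random
import struct

def to_single(x):
    """Round a double to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def to_fixed(x, fraction_length=21):
    """Round to a multiple of 2**-fraction_length (no saturation in this sketch)."""
    scale = 1 << fraction_length
    return round(x * scale) / scale

def iir(samples, b, a, rnd):
    """First-order IIR filter with `rnd` applied to coefficients, inputs, and results."""
    b, a = rnd(b), rnd(a)
    y, out = 0.0, []
    for x in samples:
        y = rnd(rnd(b * rnd(x)) + rnd(a * y))  # the feedback term reuses a rounded y
        out.append(y)
    return out

rng = random.Random(0)  # seeded so the run is repeatable
samples = [0.9 * math.sin(2 * math.pi * n / 50) + rng.uniform(-0.05, 0.05)
           for n in range(1000)]

reference = iir(samples, 0.1, 0.9, lambda v: v)  # double precision, the golden reference
single = iir(samples, 0.1, 0.9, to_single)
fixed = iir(samples, 0.1, 0.9, to_fixed)

def max_error(result):
    return max(abs(r - g) for r, g in zip(result, reference))

print("single-precision max error:", max_error(single))
print("fixed-point max error:     ", max_error(fixed))
```

Because the rounded output is fed back into the next step, rounding errors accumulate through the loop, which is what makes feedback designs harder to quantize than feed-forward ones.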

When we ran Fixed-Point Designer to perform the conversion, we specified an error tolerance of 1%. Figure 5 shows the results of the comparisons. The error for the single-precision version is on the order of 10⁻⁸, while the error for the fixed-point data type is on the order of 10⁻⁵. This is within the error tolerance we specified. If your application needs higher precision, you may need to increase your fixed-point word lengths.

Figure 5. Simulation results comparing the double-precision IIR filter results with the single-precision results (top) and fixed-point results (bottom) (© 1984–2018 The MathWorks, Inc.)

Converging on this quantization takes experience with hardware design, a comprehensive understanding of the possible system inputs, clear accuracy requirements, and some assistance from Fixed-Point Designer. This effort is worthwhile if it helps you shrink your algorithm for production deployment. But what about cases where you simply need to deploy to prototype hardware, or where the accuracy requirements make it difficult to reduce the physical footprint? A solution in these cases is to use single-precision Native Floating-Point.

## Simplifying the Process with Native Floating Point

Using Native Floating-Point has two benefits:

• You don't have to spend time trying to analyze the minimum number of bits needed to maintain sufficient precision for a wide variety of input data.
• Single-precision floating-point covers a very wide dynamic range at a fixed cost of 32 bits, whereas a fixed-point word length has to grow with the range the design must represent.
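The second point can be quantified with a quick calculation. The sketch below takes dynamic range as the ratio of the largest to the smallest positive representable magnitude, expressed in decibels. It counts only normal single-precision values and compares against a signed 32-bit fixed-point word. The helper name is hypothetical.

```python
import math

def dynamic_range_db(largest, smallest):
    """Ratio of largest to smallest positive magnitude, in decibels."""
    return 20 * math.log10(largest / smallest)

# IEEE 754 single precision, normal numbers only.
f32_largest = (2 - 2**-23) * 2.0**127
f32_smallest_normal = 2.0**-126
float32_db = dynamic_range_db(f32_largest, f32_smallest_normal)

# Signed 32-bit fixed-point: the largest magnitude is 2**31 - 1 steps of one LSB.
# The ratio does not depend on where the binary point is placed.
fixed32_db = dynamic_range_db(2**31 - 1, 1)

print(f"single precision: {float32_db:.0f} dB")
print(f"32-bit fixed-point: {fixed32_db:.0f} dB")
```

Both types occupy 32 bits, but floating-point spends 8 of them on an exponent that moves the binary point at run time, which is where the extra range comes from.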

Now the design process is much simpler: with 1 sign bit, 8 exponent bits, and 23 mantissa bits, you know you can represent a wide dynamic range of numbers. The table in Figure 6 compares the resource utilization of the floating-point and the fixed-point implementations of the IIR filter using the data type choices shown in Figure 5.