CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS

Modular Arithmetic: A Divisive Issue
Good techniques for FPGA implementations of the mod operation have not been widely known... until now!



Programmable Logic DesignLine
Binary to hexadecimal and beyond...
Handling x one bit at a time brought the modulus operation under control. Modern FPGAs have function blocks with more than one input bit, though. Function lookup tables in today's high-end FPGAs have four to six inputs, sometimes more. Block RAMs in FPGAs make it easy to create your own tables of eight or more binary inputs. Grouping the bits of x not only reduces the number of terms that need to be added, it also makes more efficient use of the FPGA fabric.

To demonstrate the approach, group bits of x into pairs. In practice, you'd probably use groups of four bits (hexadecimal), eight bits (base 256 representation), or some other technology-appropriate size. Bit-pairs are shown here only in order to demonstrate the technique; you can rework the approach to whatever grouping makes sense in your application and FPGA fabric.

Here's how Equation 2 looks when bits of x are taken two at a time:

Instead of thirty-two terms to add up for a 32-bit input x, this produces only 16 terms with a maximum total of 56, and with fewer terms to deal with as the groups of bits get larger.

As before, the designer is sure to find many additional optimizations once the solution has gone this far. In particular, when N has some power of two as a factor, modulus values for all xj with j>N have that power of two as a factor also. When you add up set of even numbers, for example, you can skip the least significant bit – it will always be zero. The same applies to intermediate sums of large enough xj in this approach, so there are gates to be saved when N is some odd multiple of 2, 4, 8, etc.

You can also configure the block RAMs in an FPGA as lookup tables. In this case, the mod-5 operation requires a table with at least three-bit outputs. Taking the M9K memories in Altera FPGAs as an example, one RAM can be configured to 2K x 4 bits – a twelve-bit lookup table that holds three-bit values, with a little left over.

Three of these tables would cover x up to 36 bits wide, with an intermediate sum of only three terms. That sum, which never exceeds 12, can be converted to a mod-5 answer using just a little combinational logic. An Altera Stratix III EPS3SL340 contains over a thousand M9K blocks; three of them could be a worthwhile trade off of block RAMs against gate count and logic depth, in some applications.

Dealing with rounding
As noted in the introduction, rounding can be rephrased as a problem in modular arithmetic. Using the example of rounding to the nearest hundred, it can be written like this:

This almost solves the problem, and with very straightforward logic. However, one issue remains; this always rounds 50 up to the hundred, introducing a small bias. Unbiased rounding uses a more complicated rule. If the decimal number ends in 50, then round to the nearest even hundred. In other words, 1150 and 1250 both round to 1200, one rounds up, the other rounds down.

A slight modification of Equation 4 easily takes care of this. The even-hundreds logic mean that we need to treat even and odd hundreds differently, so we must examine a wider range than in Equation 4 to give us Equation 5 as follows:

Conclusion
Even experienced logic designers can run into trouble with x mod N calculations, even when the constant N is known in advance. There are widely known ways to handle division by constants, such as multiplying by a fraction in fixed point. Modular arithmetic, even when disguised as a rounding problem, can be just as simple, if not simpler.

The technique discussed here works for any constant N, delivers solutions even when x has 32 or more bits, and can be adapted for efficient use of many different FPGA fabrics.

Tom VanCourt, Ph.D., is currently a senior member of technical staff at Altera Corporation. He develops system-building tools and champions performance computing on FPGAs. Tom has spent over 25 years in industry with DEC, HP, and other companies. He has also taught at Boston University, where he earned his Ph.D. in computer systems engineering.

Tom's interests include FPGA-based computing in finance, life science, medical imagining, and other application areas. His ongoing efforts include adapting existing FPGA tools and creating new ones to move beyond logic synthesis and into the very different world of FGPA-based performance computing.

1 | 2

Rate this article: Low High
Current rating
  • .
Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :