Approximating reality

In the early embedded days Real Programmers wrote their code in assembly language. They knew exactly what the program did and how many clock cycles it took to run.

Then customers wanted systems that worked with real-world data. Floating point reared its very ugly head. Anyone can do integer math in assembly, but floats are tough indeed.

Possibly the first solution was written by one C. E. Ohme. He contributed the source for a floating-point library to Intel's Insite library, a free service that far predated the open-source movement. This was long before the Internet, so Intel sent hard-copy listings to developers. You could also order the source on another medium: punched paper tape!

Ohme's package did the four basic bits of arithmetic on 32-bit floats, astonishingly using only 768 bytes of ROM. That's remarkable considering the 8008's limited instruction set, which had a hardware stack only seven levels deep.

Then customers wanted systems that did real-world computations. Roots and trig, logs and exponentiation were suddenly part of many embedded systems.

Today we hardly think about these constructs. Need to take the cosine of a number? Include math.h and use double cos(double). Nothing to it. But that's not so easy in assembly language unless the processor has built-in floating-point instructions, a rare thing in the embedded space even now. So developers crafted a variety of packages that computed these more complex functions. To this day these sorts of routines are used in compiler runtime libraries. They're a great resource that gives us a needed level of abstraction from the grimy computational details.

But the packages unfortunately also hide the cost of those grimy computational details, and the performance penalties can be significant. A fast desktop with built-in floating-point hardware can return a cosine in a few tens of nanoseconds. The same function often takes many milliseconds on embedded systems.

Or even longer. Everyone recognizes that a double-precision trig function (as required by the C standard) would take about a week to compute on an 8051, so most of those compilers cheat and return shorter floats.

Compiler vendors are in the unenviable position of trying to provide a runtime library that satisfies most users. Yet in the world of embedded systems we often have peculiar requirements that don't align with those of the rest of the world. What if you need a really speedy cosine but don't require a lot of precision? A look-up table is fast, but when accurate to more than a couple of decimal digits it grows in size faster than congressional pork.

Approximations
Most of the complicated math functions we use compute what is inherently not computable. There's no precise solution to most trig functions, square roots, and the like, so libraries employ approximations of various kinds. We, too, can employ various approximations to meet peculiar embedded requirements.

College taught us that any differentiable function can be expressed by a Taylor series, such as this one for the cosine:

cos(x) = 1 - x^2/2! + x^4/4! - x^6/6! + ...
Like all approximations used in runtime packages, this is a polynomial that's clear, simple, and easy to implement. Unhappily, it takes dozens of terms to achieve even single-precision float accuracy, which is an enormous amount of computation. Plenty of other series expansions exist, many of which converge on an accurate answer much faster than the Taylor series. Entire books have been written on approximations and one could spend years mastering the subject. But we're engineers, not mathematicians. We need cookbook solutions now. Happily, computer scientists have come up with many.
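To put a number on that, here's a minimal sketch (my own illustration, not any library's routine) that sums the cosine series term by term until further terms stop mattering at single-precision resolution. Feed it arguments much beyond pi/2 and the term count climbs quickly, which is exactly why runtime libraries don't do it this way:

#include <math.h>    /* fabsf */

/* Sum the Taylor series cos(x) = 1 - x^2/2! + x^4/4! - ... until the next
   term no longer changes the sum at single-precision resolution (or until a
   cap is hit). Reports how many terms were needed via *terms_used. */
float taylor_cos(float x, int *terms_used)
{
    float sum = 1.0f;                  // first term of the series
    float term = 1.0f;
    int n = 0;

    do {
        n++;
        // each term is the previous one times -x^2 / ((2n-1)(2n))
        term *= -x * x / ((2.0f * n - 1.0f) * (2.0f * n));
        sum += term;
    } while (fabsf(term) > 1e-7f * fabsf(sum) && n < 60);

    *terms_used = n + 1;               // count the leading 1 as a term
    return sum;
}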

The best reference on the subject is the suitably titled Computer Approximations (second edition published in 1978 by John Hart et al.).1 It contains thousands of approximations for all of the normal functions like roots, exponentials, and trig, as well as approaches for esoteric relations like Bessel functions and elliptic integrals. It's a must-have for those of us working on real-time resource-constrained systems.

But it's out of print. You can pry my copy out of my cold, dead fingers when my wetware finally crashes for good. As I write this there are six copies available on Amazon Marketplace for about $250 each. Long ago I should have invested in Google, Microsoft, and Computer Approximations!2

Be warned: it's written by eggheads for eggheads. The math behind the derivation of the polynomials goes over my head.

Tables of polynomial coefficients form the useful meat of the book. Before each class of function (trig, roots, and so forth) there's a handful of pages with suggestions about using the coefficients in a program. Somewhere, enigmatically buried in the text like an ancient clue in a Dan Brown thriller, an important point about actually using the coefficients lies lost. So here's how to interpret the tables.

How to interpret tables
An “index” vectors you to the right set of tables. Each index entry gives the form of the approximation, the range of valid inputs, the accuracy in decimal digits, the degree of the polynomial, and a label identifying the corresponding table of coefficients.

The square-root entry used here means: the set of coefficients labeled “0033” is for a polynomial of degree 4, accurate to 2.56 decimal digits over a limited input range whose upper end is 1. Outside that range all bets are off. The polynomial is of the form P(x). (Several other forms are used, like P(x)/Q(x), so that bit of what seems like pedantry is indeed important.)

This is the corresponding table of coefficients:

P 00 (+ 0) +0.14743 837
P 01 (+ 1) +0.19400 802
P 02 (+ 1) -0.26795 117
P 03 (+ 1) +0.25423 691
P 04 (+ 0) -0.95312 89

The number in parentheses is the power of 10 to apply to each number. Entry P 02, prefixed by (+ 1), is really -2.6795117. It took me a long time to figure that out. Knowing this, the polynomial for the square root of x over that limited range, accurate to 2.56 digits, is:

sqrt(x) ≈ 0.14743837 + 1.9400802x - 2.6795117x^2 + 2.5423691x^3 - 0.9531289x^4
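As a sanity check on that reading of the table, here's a minimal sketch (the function name and the Horner arrangement are mine, not Hart's) that evaluates the degree-4 polynomial with each power of ten already folded into its coefficient:

/* Evaluate the index-0033 square-root polynomial from the table above.
   Each (+n) exponent has been applied to its coefficient, and Horner's
   rule keeps the work to four multiplies and four adds. */
double sqrt_poly_0033(double x)
{
    return 0.14743837
           + x * (1.9400802
           + x * (-2.6795117
           + x * (2.5423691
           + x * (-0.9531289))));
}

Feed it a value in the table's stated range and it returns the square root to a couple of decimal digits; outside that range, as Hart warns, all bets are off.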
It looks like a lot of trouble for a not particularly accurate way to figure roots. The genius of this book, though, is that the authors list so many variations (90 for square roots alone) that you can make tradeoffs in selecting the right polynomial for your application.

Three factors influence the decision: required accuracy, speed, and range. Like all good engineering predicaments, these conspire against each other. High speed generally means lower accuracy. Extreme accuracy requires a long polynomial that will burn plenty of CPU cycles, or a microscopic range. For instance, the coefficients shown in Figure 1 express 2^x to 24.78 decimal digits of accuracy. A bit of clever programming yields a routine that's surprisingly fast for such accuracy. Except that the range of x is limited to 0 to 1/256, which isn't particularly useful in most applications.

The trick is to do a bit of magic to the input argument to reduce what might be a huge range to a range that satisfies the requirements of the polynomial. Since the range-reduction process consumes CPU cycles, it's another factor to consider when selecting an approximation.

Hart's book does describe range-reduction math, in a manner an academic would love. But by puzzling over his verbiage and formulas, with some doodling on paper, it's not too hard to construct useful range-reduction algorithms.

For example, I find the most useful square-root polynomials require an input between 0.25 and 1. Since sqrt(x) = 2^k × sqrt(x × 2^(-2k)), we can keep increasing k till x × 2^(-2k) is in the required range. Then use the polynomial to take the square root of the reduced argument, and multiply the answer by 2^k. Listing 1 implements the algorithm.
Listing 1 Code to reduce the range of a square root

/*****************************************************
 * reduce_sqrt - The square root routines require an input argument in
 * the range [0.25, 1]. This routine reduces the argument to that range.
 *
 * Return values:
 * - "reduced_arg", which is the input arg/2^(2k)
 * - "scale", which is the sqrt of the scaling factor, or 2^k
 *
 * Assumptions made:
 * - The input argument is > zero
 * - The input argument cannot exceed +2^31 or be under 2^-31.
 *
 * Possible improvements:
 * - To compute arg * 2^(-2k) we do the division arg/(2^2k). It's
 *   much more efficient to decrement the characteristic of "arg" (i.e., monkey
 *   with the floating-point representation of "arg") the appropriate number
 *   of times. But that's far less portable than the less efficient approach
 *   shown here.
 *
 * How it works:
 *  The algorithm depends on the following relation:
 *      sqrt(arg) = (2^k) * sqrt(arg * 2^(-2k))
 *
 *  We pick a "k" such that 2^(-2k) * arg is less than 1 and greater than 0.25.
 *
 *  The ugly "for" loop shifts a one through two_to_2k while shifting a long
 *  version of the input argument right two bits at a time, repeating till the
 *  long version is all zeroes. Thus, two_to_2k and the input argument have
 *  this relationship:
 *
 *        two_to_2k        input argument         sqrt(two_to_2k)
 *        4                1 to 4                 2
 *        16               5 to 16                4
 *        64               17 to 64               8
 *        256              65 to 256              16
 *
 *  Note that when we're done, and then divide the input arg by two_to_2k, the
 *  result is in the desired [0.25, 1] range.
 *
 *  We also must return "scale", which is sqrt(two_to_2k), but we prefer
 *  not to do the computationally expensive square root. Instead note
 *  (as shown in the table above) that scale simply doubles with each iteration.
 *
 *  There's a special case for the input argument being less than the [0.25, 1]
 *  range. In this case we multiply by a huge number, one sure to bring
 *  virtually every argument up to 0.25, and then adjust "scale" by the
 *  square root of that huge number.
 *****************************************************/
void reduce_sqrt(double arg, double *reduced_arg, double *scale)
{
    long two_to_2k;                          // divisor we're looking for: 2^(2k)
    long l_arg;                              // long (32 bit) version of input argument
    const double huge_num = 1073741824;      // 2**30
    const double sqrt_huge_num = 32768;      // sqrt(2**30)

    if (arg >= 0.25) {
        // shift arg to zero while computing two_to_2k as described above
        l_arg = (long) arg;                  // l_arg is long version of input arg
        for (two_to_2k = 1, *scale = 1.0; l_arg != 0; l_arg >>= 2, two_to_2k <<= 2, *scale *= 2.0)
            ;                                // all the work happens in the loop header
        *reduced_arg = arg / (double) two_to_2k;  // normalize input argument to [0.25, 1]
    } else {                                 // for small arguments:
        arg = arg * huge_num;                // make the number big
        l_arg = (long) arg;                  // l_arg is long version of input arg
        for (two_to_2k = 1, *scale = 1.0; l_arg != 0; l_arg >>= 2, two_to_2k <<= 2, *scale *= 2.0)
            ;
        *scale = *scale / sqrt_huge_num;
        *reduced_arg = arg / (double) two_to_2k;  // normalize input argument to [0.25, 1]
    }
}
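A quick way to convince yourself the reduction works (a test scaffold of my own, assuming the listing above is compiled alongside it): for any legal input, the reduced argument should land in [0.25, 1], and scale squared times the reduced argument should reproduce the original argument.

#include <stdio.h>

void reduce_sqrt(double arg, double *reduced_arg, double *scale);   // from Listing 1

int main(void)
{
    double tests[] = {0.0001, 0.3, 7.0, 1234.5, 1.0e6};
    for (int i = 0; i < 5; i++) {
        double reduced, scale;
        reduce_sqrt(tests[i], &reduced, &scale);
        // "reduced" should be in [0.25, 1]; scale*scale*reduced should equal the input
        printf("arg=%-10g reduced=%-8.5f scale=%-8g check=%g\n",
               tests[i], reduced, scale, scale * scale * reduced);
    }
    return 0;
}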

Here's a sampling of interesting polynomials for square roots. Note that they can all be used with the range-reduction scheme I've described.

With just two terms, using the form P(x) over the range [1/100, 1], this one is accurate to 0.56 decimal digits and is very fast:

P 00 (+0) +0.11544 2
P 01 (+1) +0.11544 2

A different form, P(x)/Q(x), gives 3.66 digits over 1/4 to 1:

P 00 (- 1) +0.85805 2283
P 01 (+ 1) +0.10713 00909
P 02 (+ 0) +0.34321 97895
Q 00 (+ 0) +0.50000 08387
Q 01 (+ 1) +0.1
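To show how one of these tables plugs into the range reducer, here's a minimal sketch (the wrapper name approx_sqrt and its structure are mine, not Hart's). It runs the argument through reduce_sqrt from Listing 1, evaluates the P(x)/Q(x) form above on the reduced argument, and multiplies by the returned scale:

void reduce_sqrt(double arg, double *reduced_arg, double *scale);   // from Listing 1

/* approx_sqrt - roughly 3.66-digit square root for positive arguments,
   built from the rational coefficients listed above (valid on [0.25, 1]). */
double approx_sqrt(double arg)
{
    double x, scale;

    reduce_sqrt(arg, &x, &scale);                    // x now lies in [0.25, 1]
    double p = 0.0858052283 + x * (1.071300909 + x * 0.3432197895);   // P(x), Horner form
    double q = 0.5000008387 + x;                                      // Q(x); Q 01 scales to exactly 1
    return scale * (p / q);                          // sqrt(arg) = scale * sqrt(reduced arg)
}

Note that the rational form costs a division, which isn't cheap without floating-point hardware; that's the price of squeezing more digits out of fewer terms.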

The following yields 8.95 digits of accuracy using the form P(x)/Q(x), but works over the narrower 1/2 to 1 range. Use the usual range-reduction algorithm to narrow x to 0.25 to 1; then, if the reduced argument is less than 1/2, double it and scale the final result by the square root of two:

P 00 (+ 0) +0.29730 27887 4025
P 01 (+ 1) +0.89403 07620 6457
P 02 (+ 2) +0.21125 22405 69754
P 03 (+ 1) +0.59304 94459 1466
Q 00 (+ 1) +0.24934 71825 3158
Q 01 (+ 2) +0.17764 13382 80541
Q 02 (+ 2) +0.15035 72331 29921
Q 03 (+ 1) +0.1
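The extra reduction step is tiny. Here's a sketch of it (the function name is mine): after reduce_sqrt has delivered a value in [0.25, 1], one conditional doubling moves it into [0.5, 1], and the scale absorbs the difference.

/* Narrow a reduced argument from [0.25, 1] to [0.5, 1].
   Since sqrt(x) = sqrt(2x)/sqrt(2), doubling the argument means the
   scale factor must be divided by sqrt(2). */
void narrow_to_half_one(double *reduced_arg, double *scale)
{
    const double sqrt_2 = 1.4142135623730951;    // sqrt(2)

    if (*reduced_arg < 0.5) {
        *reduced_arg *= 2.0;
        *scale /= sqrt_2;
    }
}

Then evaluate the coefficients above on the narrowed argument and multiply by the adjusted scale, just as before.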

Hart's book also contains coefficients for cube roots. Reduce the argument's range to 0.5 to 1 (generally useful with his cube root approximations) using the relation (see Listing 2 for details):

cbrt(arg) = (2^k) × cbrt(arg × 2^(-3k))
Listing 2 Code to reduce the range of a cube root

/*****************************************************
 * reduce_cbrt - The cube root routines require an input argument in
 * the range [0.5, 1]. This routine reduces the argument to that range.
 *
 * Return values:
 * - "reduced_arg", which is the input arg/2^(3k)
 * - "scale", which is the cbrt of the scaling factor, or 2^k
 *
 * Assumptions made:
 * - The input argument cannot exceed +/-2^31 or be under +/-2^-31.
 *
 * Possible improvements:
 * - To compute arg * 2^(-3k) we do the division arg/(2^3k). It's
 *   much more efficient to decrement the characteristic of "arg" (i.e., monkey
 *   with the floating-point representation of "arg") the appropriate number
 *   of times. But that's far less portable than the less efficient approach
 *   shown here.
 * - The corrections made when the result is still under 1/2 scale it by two;
 *   it's more efficient to increment the characteristic.
 *
 * How it works:
 *  The algorithm depends on the following relation:
 *      cbrt(arg) = (2^k) * cbrt(arg * 2^(-3k))
 *
 *  We pick a "k" such that 2^(-3k) * arg is less than 1 and greater than 0.5.
 *
 *  The ugly "for" loop shifts a one through two_to_3k while shifting a long
 *  version of the input argument right three bits at a time, repeating till the
 *  long version is all zeroes. Thus, two_to_3k and the input argument have
 *  this relationship:
 *
 *        two_to_3k          input argument          cbrt(two_to_3k)
 *        8                  1 to 7                  2
 *        64                 8 to 63                 4
 *        512                64 to 511               8
 *        4096               512 to 4095             16
 *
 *  Note that when we're done, and then divide the input arg by two_to_3k, the
 *  result is between [1/8, 1]. The following algorithm reduces it to [1/2, 1]:
 *
 *  if (reduced_arg is between [1/8, 1/2])
 *      multiply it by two and correct the scale by the cube root of two.
 *  if (reduced_arg is still between [1/4, 1/2])
 *      multiply it by two and correct the scale by the cube root of two.
 *
 *  Note that if the argument was between [1/8, 1/4] both of those "if"s
 *  will execute.
 *
 *  We also must return "scale", which is cbrt(two_to_3k), but we prefer
 *  not to do the computationally expensive cube root. Instead note
 *  (as shown in the table above) that scale simply doubles with each iteration.
 *
 *  There's a special case for the input argument being less than the [0.5, 1]
 *  range. In this case we multiply by a huge number, one sure to bring
 *  virtually every argument up to 0.5, and then adjust "scale" by the cube
 *  root of that huge number.
 *
 *  The code takes care of the special case that the cube root of a negative
 *  number is negative by passing the absolute value of the number to the
 *  approximating polynomial, and then making "scale" negative.
 *****************************************************/
#include <math.h>                            /* for fabs() */

void reduce_cbrt(double arg, double *reduced_arg, double *scale)
{
    long two_to_3k;                          // divisor we're looking for: 2^(3k)
    long l_arg;                              // long (32 bit) version of input argument
    const double huge_num = 1073741824;      // 2**30
    const double cbrt_huge_num = 1024;       // cbrt(2**30)
    const double cbrt_2 = 1.25992104989487;  // cbrt(2)

    *scale = 1.0;
    if (arg < 0) {                           // if negative arg, take abs(arg) and set scale
        arg = fabs(arg);                     // to -1 so the polynomial routine
        *scale = -1.0;                       // will give a negative result
    }
    if (arg >= 0.5) {
        // shift arg to zero while computing two_to_3k as described above
        l_arg = (long) arg;                  // l_arg is long version of input arg
        for (two_to_3k = 1; l_arg != 0; l_arg >>= 3, two_to_3k <<= 3, *scale *= 2.0)
            ;                                // all the work happens in the loop header
        *reduced_arg = arg / (double) two_to_3k;   // normalize input argument to [1/8, 1]
        if (*reduced_arg < 0.5) {                  // if in the range [1/8, 1/2] correct it
            *reduced_arg *= 2;
            *scale /= cbrt_2;
        }
        if (*reduced_arg < 0.5) {                  // if in the range [1/4, 1/2] correct it
            *reduced_arg *= 2;
            *scale /= cbrt_2;
        }
    } else {                                 // for small arguments:
        arg = arg * huge_num;                // make the number big
        l_arg = (long) arg;                  // l_arg is long version of input arg
        for (two_to_3k = 1; l_arg != 0; l_arg >>= 3, two_to_3k <<= 3, *scale *= 2.0)
            ;
        *scale = *scale / cbrt_huge_num;
        *reduced_arg = arg / (double) two_to_3k;   // normalize; the ifs below bring it to [0.5, 1]
        if (*reduced_arg < 0.5) {                  // if in the range [1/8, 1/2] correct it
            *reduced_arg *= 2;
            *scale /= cbrt_2;
        }
        if (*reduced_arg < 0.5) {                  // if in the range [1/4, 1/2] correct it
            *reduced_arg *= 2;
            *scale /= cbrt_2;
        }
    }
}

With just two terms in the form P(x), these coefficients give a cube root accurate to 1.24 digits over the range 1/8 to 1:

P 00 (+ 0) +0.45316 35
P 01 (+ 0) +0.60421 81

Adding a single term improves accuracy to 3.20 digits over 1/2 to 1:

P 00 (+ 0) +0.49329 5663
P 01 (+ 0) +0.69757 0456
P 02 (+ 0) -0.19150 216
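Here's how that three-term table combines with the cube root reducer, again as a minimal sketch (the wrapper name approx_cbrt is mine). Listing 2 hands back an argument in [0.5, 1] plus a signed scale, so the wrapper just evaluates the polynomial and multiplies:

void reduce_cbrt(double arg, double *reduced_arg, double *scale);   // from Listing 2

/* approx_cbrt - roughly 3.2-digit cube root, using the three-term
   coefficients above (valid on [0.5, 1]); negative arguments work
   because reduce_cbrt returns a negative scale for them. */
double approx_cbrt(double arg)
{
    double x, scale;

    reduce_cbrt(arg, &x, &scale);                                    // x in [0.5, 1]
    double p = 0.493295663 + x * (0.697570456 + x * (-0.19150216));  // P(x) in Horner form
    return scale * p;                      // cbrt(arg) = scale * cbrt(reduced arg)
}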

Kick accuracy up by more than a notch, to 11.75 digits over 1/2 to 1, using the form P(x)/Q(x):

P 00 (+ 0) +0.22272 47174 61818
P 01 (+ 1) +0.82923 28023 86013 7
P 02 (+ 2) +0.35357 64193 29784 39
P 03 (+ 2) +0.29095 75176 33080 76
P 04 (+ 1) +0.37035 12298 99201 9
Q 00 (+ 1) +0.10392 63150 11930 2
Q 01 (+ 2) +0.16329 43963 24801 67
Q 02 (+ 2) +0.39687 61066 62995 25
Q 03 (+ 2) +0.18615 64528 78368 42
Q 04 (+ 1) +0.1

I've coded a number of square and cube root approximations in C with test routines to check their accuracy. Find them here at www.embedded.com/code. Although all of the routines use doubles to ensure fair comparisons, the lower-precision ones will work fine with floats.

There's another very valuable resource that discusses these sorts of approximations and more – for instance, integer versions. That's Jack Crenshaw's wonderful MATH Toolkit for REAL-TIME Programming. It's a much easier read than Hart's book, and it covers other aspects of these issues.

More next month!

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at .

Endnotes:
1. Hart, John F., E. W. Cheney, Charles L. Lawson, Hans J. Maehly, Charles K. Mesztenyi, John R. Rice, Henry G. Thacher, and Christoph Witzgall. Computer Approximations. 2nd edition. Malabar, FL: Robert Krieger Publishing Company, 1978.
2. The original 1968 version, published by Wiley, is listed as $25 on the Amazon Marketplace. The second edition from 1978 is listed at $263.
