
Approximations, part deux

Last month I heaped praise on the book Computer Approximations (John Hart et al, second edition, 1978, Malabar, FL: Robert Krieger Publishing Company). This mind-numbingly dull tome is the Bible for anyone creating an approximation to pretty much any well-known function, from Bessel functions to trig. It's a collection of thousands of polynomials, each of which computes one function to a specific precision over some limited range. Hart and his coauthors also describe the intricacies behind the polynomials in a way only a math teacher would enjoy. The proofs, alas, are left to the student.

If you missed last month's article, here are the key points:

• Compiler vendors provide built-in functions to compute the cosine, square root, and many other functions. When there's no hardware floating-point support, these code chunks always use an approximating polynomial or ratio of polynomials.

• These vendor-supplied approximations may not be optimum for speed-sensitive real-time systems.

• There's an annoying tradeoff between precision, speed, and range. Like the triad “schedule, features, and quality” you can pick only two. Together, the three parameters fight like a trio of American Idol judges.

• Since each approximation works over a limited range, the code must reduce the input argument to something within those bounds. Range reduction is by far the trickiest part of creating an approximation.

I gave code and coefficients for the square and cube roots. In this column, we'll look at exponentiation and logarithms.

Power to the exponents!
The three most common exponentiations we use are powers of two, e , and 10. In a binary computer it's trivial to raise an integer to a power of two, but floating point is much harder. Approximations to the rescue!

Hart's power-of-two exponentials expect an argument in the range of 0.0 to 0.5. That's not terribly useful for most real applications so we must use a bit of code to reduce the input x to that range.

Consider the identity 2^x = 2^a × 2^(x–a). Well, duh. That appears both contrived and trite. Yet it's the basis for our range-reduction code. The trick is to select an integer a so that (x – a) lies between 0 and 1. An integer, because it's both trivial and fast to raise 2 to any int. Simply set a to the integer part of x and feed (x – int(x)) into the approximating polynomial. Multiply the result by the easily computed 2^a to correct for the range reduction. For example, with x = 3.2 we'd evaluate the polynomial at 0.2 and multiply the result by 2^3 = 8.

But I mentioned that Hart wants inputs in the range of 0.0 to 0.5, not 0.0 to 1.0. If (x – int(x)) is greater than 0.5, use the relation:

2^x = 2^(1/2) × 2^a × 2^(x – a – 1/2)     (1)

instead of the one given earlier. The code is simpler than the equation looks: subtract 0.5 from x, compute the polynomial for (x – int(x)) as usual, and then correct the answer with one extra multiplication by 2^(1/2) = √2 ≈ 1.4142135624.

Exponents can be negative as well as positive. Remember that:

2^(–x) = 1/2^x     (2)

Compute the result for the absolute value of x and then take the reciprocal.

Though I mentioned it's easy to figure 2^a , that glib statement glosses over a bit of complexity. The code in Listing 1 uses the worst possible strategy: it laboriously does floating-point multiplies. That's a general solution devoid of confusing chicanery: although it's clear, it's slow.

Listing 1: Code to reduce the range of an exponential

/*****************************************************
*  reduce_expb - The expb routines require an input argument in
* the range [0.0, 0.5]. This routine reduces the argument to that range.
*
* Return values:
* - "arg", which is the input argument reduced to that range
* - "two_int_a", which is 2**(int(arg))
* - "adjustment", which is an integer flag set to zero if the fractional
*                 part of arg is <= 0.5; set to one otherwise
*
* How this range reduction code works:
*  (1) 2**x = 2**a * 2**(x - a)
* and,
*  (2) 2**x = 2**(1/2) * 2**a * 2**(x - a - 1/2)
*
*  Now, this all looks pretty contrived. But the trick is to pick an integer "a"
* such that the term (x - a) is in the range [0.0, 1.0]. If it lands in
* [0.0, 0.5] we use equation (1). If it winds up in [0.5, 1.0], use equation (2),
* which gets it down to our desired [0.0, 0.5] range.
*
*  The return value "adjustment" tells the calling function if we are using the
* first or the second equation.
*****************************************************/
void reduce_expb(double *arg, double *two_int_a, int *adjustment)
{
  int int_arg;                       // integer part of the input argument

  *adjustment = 0;                   // Assume we're using equation (1)
  int_arg = (int) *arg;
  if((*arg - int_arg) > 0.5)         // if frac(arg) is in [0.5, 1.0]...
  {
     *adjustment = 1;
     *arg = *arg - 0.5;              // ... then change it to [0.0, 0.5]
  }
  *arg = *arg - (double) int_arg;    // arg is now just the fractional part

  // Now compute 2**(int(arg)).
  *two_int_a = 1.0;
  for(; int_arg != 0; int_arg--)
     *two_int_a = *two_int_a * 2.0;
}

Other options exist. If the input x never gets very big, use a look-up table of precomputed values; even a few tens of entries will let the resulting 2^x assume huge values.

Another approach adds a directly to the floating-point number's exponent field. Floating-point numbers are stored in two parts: a fractional significand (the mantissa) scaled by 2 raised to the other part, the exponent (sometimes called the characteristic). Add a to the exponent field to create 2^a. That's highly nonportable but very fast.

Or, you could shift a 1 left through an int or long a times and then convert that to a float or double. Sketches of all three options appear below.
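Here's a minimal sketch of those three alternatives; the helper names are mine, not from the article, and ldexp() is simply the portable library route to the "adjust the exponent field" trick:

/* Hypothetical sketches of the three alternatives above; names are mine.   */
#include <math.h>

/* (a) Let the library scale by a power of two: portable exponent tweak.    */
double two_to_a_ldexp(int a)
{
    return ldexp(1.0, a);            /* computes 1.0 * 2**a                 */
}

/* (b) Shift a 1 left 'a' times and convert.                                */
double two_to_a_shift(int a)
{
    return (double)(1L << a);        /* only valid while the shift fits in a long */
}

/* (c) Small look-up table, if the inputs never get big.                    */
static const double two_to_a_tab[] =
    { 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0, 1024.0 };

double two_to_a_lookup(int a)
{
    return two_to_a_tab[a];          /* caller must keep a within the table */
}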

Computer Approximations lists polynomials for 46 variants of 2^x , with precisions ranging from four to 25 decimal digits. Room and sanity prohibit listing them all, but here are two of the most useful examples. Both assume the input argument is between 0 and 0.5, and both use the following ratio of two polynomials:

2^x ≈ [Q(x²) + x·P(x²)] / [Q(x²) – x·P(x²)]     (3)

Hart lists each coefficient as a fraction followed, in parentheses, by the power of ten that scales it; (+ 2) +0.20818..., for instance, means 20.818.... The following coefficients yield a precision of 6.36 decimal digits. Note that P(x²) is simply a constant and Q01 is 1, making this a very fast and reasonably accurate approximation:

P00   (+ 1)   +0.86778 38827 9
Q00   (+ 2)   +0.25039 10665 03
Q01   (+ 1)   +0.1

For 9.85 digits of precision use the somewhat slower set below (see also Listing 2):

P00   (+ 1)   +0.72152 89152 1493
P01   (– 1)   +0.57690 07237 31
Q00   (+ 2)   +0.20818 92379 30062
Q01   (+ 1)   +0.1

Listing 2: 2x to 9.85 digits

/****************************************************
* expb1063 - compute 2**x to 9.85 digits accuracy
****************************************************/
#include <stdio.h>
#include <math.h>

void reduce_expb(double *arg, double *two_int_a, int *adjustment);  // from Listing 1

double expb1063(double arg)
{
  const double P00 = + 7.2152891521493;
  const double P01 = + 0.0576900723731;
  const double Q00 = +20.8189237930062;
  const double Q01 = + 1.0;
  const double sqrt2 = 1.4142135623730950488;  // sqrt(2) for scaling

  double two_int_a;                            // 2**(int(a)); used to scale result
  int    adjustment;                           // set to 1 by reduce_expb if we must...
                                               // ...adjust the answer by sqrt(2)
  double answer;                               // The result
  double Q;                                    // Q(x**2)
  double x_P;                                  // x*P(x**2)
  int    negative = 0;                         // 0 if arg is +; 1 if negative

  //  Return an error if the input is too large. "Too large" is entirely
  // a function of the range of your floating-point library and the expected
  // inputs in your application.
  if(fabs(arg) > 100.0)
  {
     printf("\nYikes! %f is a big number, fella. Aborting.\n", arg);
     return 0;
  }

  //  If the input is negative, invert it. At the end we'll take
  // the reciprocal, since n**(-x) = 1/(n**x).
  if(arg < 0.0)
  {
     arg = -arg;
     negative = 1;
  }

  reduce_expb(&arg, &two_int_a, &adjustment);  // reduce input to [0.0, 0.5]

  // The format of the polynomial is:
  //  answer = (Q(x**2) + x*P(x**2))/(Q(x**2) - x*P(x**2))
  //
  // The following computes the polynomial in several steps:
  Q   =        Q00 + Q01 * (arg * arg);
  x_P = arg * (P00 + P01 * (arg * arg));
  answer = (Q + x_P)/(Q - x_P);

  // Now correct for the scaling factor of 2**(int(a))
  answer = answer * two_int_a;

  // If the fractional part of the input was > 0.5, correct for that
  if(adjustment == 1)
     answer = answer * sqrt2;

  // Correct for a negative input
  if(negative == 1)
     answer = 1.0/answer;

  return(answer);
}
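This isn't from the article, but a quick harness like the following (my own sketch; it assumes Listings 1 and 2 are compiled into the same program) is a handy way to sanity-check the approximation against the C library's pow():

/* Hypothetical test harness, not part of the original article. */
#include <stdio.h>
#include <math.h>

double expb1063(double arg);    /* from Listing 2 */

int main(void)
{
   double x;
   for(x = -10.0; x <= 10.0; x += 0.37)        /* arbitrary sample points */
   {
      double approx = expb1063(x);
      double exact  = pow(2.0, x);
      printf("x=%8.3f  approx=%14.8e  rel err=%10.2e\n",
             x, approx, fabs((approx - exact)/exact));
   }
   return 0;
}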

If you're feeling a need for serious accuracy, try the following coefficients, which are good to an astonishing 24.78 digits. You'll need a more complex range-reduction algorithm to keep the input within [0, 1/256] (one possible scheme is sketched below):

P00   (+ 1)   +0.72134 75314 61762 84602 46233 635
P01   (– 1)   +0.57762 26063 55921 17671 75
Q00   (+ 2)   +0.20813 69012 79476 15341 50743 885
Q01   (+ 1)   +0.1

Note that the polynomials, both of degree 1 only, quickly plop out an answer.
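The book leaves that tighter reduction to the reader. One plausible scheme, entirely my own sketch rather than Hart's, splits the fractional part again using the identity 2^x = 2^int(x) × 2^(m/256) × 2^(x – int(x) – m/256), where m = floor(256 × frac(x)); the remainder handed to the polynomial then always falls in [0, 1/256). The table of 2^(m/256) values is built here with the library's pow() purely for illustration:

/* Hypothetical range reduction to [0, 1/256); names and scheme are mine,
 * not Hart's. Assumes arg >= 0 and that int(arg) fits in an int.          */
#include <math.h>

static double frac_scale[256];             /* frac_scale[m] = 2**(m/256)   */

void init_frac_scale(void)
{
   int m;
   for(m = 0; m < 256; m++)
      frac_scale[m] = pow(2.0, m/256.0);   /* one-time setup               */
}

/* Reduce *arg to [0, 1/256) and return the combined scale factor
 * 2**int(arg) * 2**(m/256).                                               */
double reduce_expb_256(double *arg)
{
   int    int_part = (int)*arg;                  /* a = int(x)             */
   double frac     = *arg - int_part;            /* frac(x) in [0, 1)      */
   int    m        = (int)(frac * 256.0);        /* m = floor(256*frac)    */

   *arg = frac - m/256.0;                        /* remainder in [0, 1/256) */
   return ldexp(1.0, int_part) * frac_scale[m];  /* 2**a * 2**(m/256)      */
}

The caller multiplies the polynomial's output by the returned scale factor, so no separate √2-style adjustment flag is needed.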

Other exponentials
What about the more common natural and decimal exponentiations? Computer Approximations lists many that are accurate over a variety of ranges; the most useful seem to be those that work between 0.0 to 0.5 and 0.0 to 1.0. Use a range-reduction approach much like the one just described, substituting the appropriate base: 10^x = 10^a × 10^(x–a) or e^x = e^a × e^(x–a).

Using the same ratio of polynomials as in Equation 3, the coefficients in this next table give 10^x to 12.33 digits:

P00   (+ 2)   +0.41437 43559 42044 8307
P01   (+ 1)   +0.60946 20870 43507 08
P02   (– 1)   +0.76330 97638 32166
Q00   (+ 2)   +0.35992 09924 57256 1042
Q01   (+ 2)   +0.21195 92399 59794 679
Q02   (+ 1)   +0.1

The downside of using this or similar polynomials is that you've got to solve for that pesky 10^a. A look-up table is one fast possibility if the input stays in a reasonable range, but there aren't many other attractive options. If you're working with an antique IBM 1620 BCD machine you might shift left by factors of 10 in a way analogous to what I've outlined for exponents of two, but in today's binary world there are no decimal shift instructions, so we'd be stuck multiplying by 10 over and over. Instead, combine one of the following relations:

10^x = 2^(x · log2(10))     (4)

or:

10^x = 2^(x / log10(2))     (5)

with the power-of-two approximations as shown in Listing 3.

Listing 3: Computing 10x using a 2x approximation

/****************************************************
* expd_based_on_expb1063 - compute 10**x to 9.85 digits accuracy
*
*  This is one approach to computing 10**x. Note that:
*  10**x = 2**(x / log_base_10(2)), so
*  10**x = expb(x / log_base_10(2))
****************************************************/
double expb1063(double arg);    // from Listing 2

double expd_based_on_expb1063(double arg)
{
   const double log10_2 = 0.30102999566398119521;
   return(expb1063(arg/log10_2));
}
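The natural exponential yields to exactly the same trick. Here's my own analogue of Listing 3 (the function name is mine, not from the article), using the fact that e^x = 2^(x/ln 2):

/* Hypothetical companion to Listing 3: e**x = 2**(x / ln(2)).  */
double expb1063(double arg);    /* from Listing 2 */

double expe_based_on_expb1063(double arg)
{
   const double ln_2 = 0.69314718055994530942;   /* natural log of 2 */
   return expb1063(arg/ln_2);
}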

Logarithms
There's little magic to logs, other than a different range-reduction algorithm. Again, it's easiest to develop code that works with logarithms of base 2. Note that:

log2(x) = log2(f × 2^n) = log2(f) + n     (6)

We pick an n that keeps f between 0.5 and 1.0, since Computer Approximations' most useful log polynomials require that range. For example, x = 11 reduces to f = 11/16 = 0.6875 with n = 4, so log2(11) = log2(0.6875) + 4. The reduction algorithm, shown in Listing 4, is similar to the one for square roots described in last month's column. Listing 5 gives code to compute log base 2 to 4.14 digits of precision.

Listing 4: Log range-reduction code

/*****************************************************
*  reduce_log2 - The log2 routines require an input argument in
* the range [0.5, 1.0]. This routine reduces the argument to that range.
*
* Return values:
* - "arg", which is the input argument reduced to that range
* - "n", which is n from the discussion below
* - "adjustment", set to 1 if the input was < 0.5
*
* How this range reduction code works:
*  If we pick an integer n such that x = f * 2**n, and 0.5 <= f < 1, and
*  assume all "log" functions in this discussion mean log(base 2), then:
*     log(x) = log(f * 2**n)
*            = log(f) + log(2**n)
*            = log(f) + n
*
*  The ugly "for" loop shifts a one through two_to_n while shifting a long
*  version of the input argument right, repeating till the long version is
*  all zeroes. Thus, two_to_n and the input argument have this relationship:
*
*       two_to_n     input argument     n
*          1            0.5 to 1        0
*          2              1 to 2        1
*          4              2 to 4        2
*          8              4 to 8        3
*        etc.
*
*  There's a special case for the argument being less than 0.5.
*  In this case we use the relation:
*     log(1/x) = log(1) - log(x)
*              = -log(x)
*  That is, we take the reciprocal of x (which will be bigger than 0.5) and
*  solve for that. To tell the caller to do this, set "adjustment" = 1.
*****************************************************/
void reduce_log2(double *arg, double *n, int *adjustment)
{
  long two_to_n;                      // power of two we're looking for: 2**n
  long l_arg;                         // long (32 bit) version of the input

  *adjustment = 0;
  if(*arg < 0.5)                      // if small arg use the reciprocal
  {
     *arg = 1.0/(*arg);
     *adjustment = 1;
  }

  // shift l_arg to zero while computing two_to_n as described above
  l_arg = (long) *arg;                // l_arg is long version of the input arg
  for(two_to_n = 1, *n = 0.0; l_arg != 0; l_arg >>= 1, two_to_n <<= 1, *n += 1.0)
     ;
  *arg = *arg / (double) two_to_n;    // normalize input argument to [0.5, 1]
}

Listing 5: Log base 2 code good to 4.14 digits

/****************************************************
* log2_2521 - compute log(base 2)(x) to 4.14 digits accuracy
****************************************************/
void reduce_log2(double *arg, double *n, int *adjustment);  // from Listing 4

double log2_2521(double arg)
{
  const double P00 = -1.45326486;
  const double P01 = +0.951366714;
  const double P02 = +0.501994886;
  const double Q00 = +0.352143751;
  const double Q01 = +1.0;

  double n;               // used to scale result
  double poly;            // result from polynomial
  double answer;          // The result
  int    adjustment;      // 1 if argument was < 0.5

  //  Return an error if the input is <= 0, since the log isn't real there.
  if(arg <= 0.0)
  {
     printf("\nHoly smokes! %f is too durn small. Aborting.\n", arg);
     return 0;
  }

  reduce_log2(&arg, &n, &adjustment);    // reduce input to [0.5, 1.0]

  // The format of the polynomial is P(x)/Q(x)
  poly = (P00 + arg * (P01 + arg * P02)) / (Q00 + Q01 * arg);

  // Now correct for the scaling factors
  if(adjustment)
     answer = -n - poly;
  else
     answer = poly + n;

  return(answer);
}

Need better results? The following coefficients are good to 8.32 digits over the same range:

P00   (+ 1)   -0.20546 66719 51
P01   (+ 1)   -0.88626 59939 1
P02   (+ 1)   +0.61058 51990 15
P03   (+ 1)   +0.48114 74609 89
Q00   (+ 0)   +0.35355 34252 77
Q01   (+ 1)   +0.45451 70876 29
Q02   (+ 1)   +0.64278 42090 29
Q03   (+ 1)   +0.1
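The article doesn't show code for this set, but the coefficients appear to drop into the same P(x)/Q(x) form and reduce_log2() range reduction used in Listing 5 (evaluating them at the endpoints gives P(1) ≈ 0 and P(0.5)/Q(0.5) ≈ –1, as a log2 approximation over [0.5, 1.0] must). Under that assumption, a higher-precision variant might look like this sketch; the function name is mine:

/* Hypothetical 8.32-digit log2, assuming the same P(x)/Q(x) form as Listing 5. */
void reduce_log2(double *arg, double *n, int *adjustment);   /* from Listing 4 */

double log2_8digit(double arg)
{
   const double P00 = -2.05466671951;
   const double P01 = -8.86265993910;
   const double P02 = +6.10585199015;
   const double P03 = +4.81147460989;
   const double Q00 = +0.353553425277;
   const double Q01 = +4.54517087629;
   const double Q02 = +6.42784209029;
   const double Q03 = +1.0;

   double n;                    // exponent recovered by reduce_log2()
   double poly;                 // P(arg)/Q(arg)
   int    adjustment;           // 1 if the original argument was < 0.5

   if(arg <= 0.0)
      return 0;                 // log isn't real at or below zero

   reduce_log2(&arg, &n, &adjustment);     // arg now in [0.5, 1.0]

   poly = (P00 + arg * (P01 + arg * (P02 + arg * P03)))
        / (Q00 + arg * (Q01 + arg * (Q02 + arg * Q03)));

   return adjustment ? (-n - poly) : (n + poly);
}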

Though Computer Approximations includes plenty of approximations for common and natural logs, it's probably more efficient to compute log2 and use the change of base formulae:

ln(x) = log2(x) × ln(2) ≈ 0.69314718 × log2(x)     (7)

and:

log10(x) = log2(x) × log10(2) ≈ 0.30102999 × log2(x)     (8)
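Equations (7) and (8) translate into a couple of one-line wrappers. These are my own sketches (the function names are mine), leaning on log2_2521() from Listing 5:

/* Hypothetical wrappers implementing equations (7) and (8). */
double log2_2521(double arg);    /* from Listing 5 */

double ln_based_on_log2(double arg)
{
   const double ln_2 = 0.69314718055994530942;     /* ln(2)    */
   return log2_2521(arg) * ln_2;
}

double log10_based_on_log2(double arg)
{
   const double log10_2 = 0.30102999566398119521;  /* log10(2) */
   return log2_2521(arg) * log10_2;
}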

So far I've talked about the approximations' precision imprecisely. What exactly does precision mean? Computer Approximations lists relative precision for roots and exponentials:


relative error = | (approximated value – exact value) / exact value |     (9)

Logarithms are a different game altogether; the book uses absolute precision, defined by:

absolute error = | approximated value – exact value |     (10)

To figure decimal digits of precision use:

decimal digits of precision = –log10(maximum error)     (11)

For instance, a maximum relative error of 4.4 × 10^-7 works out to –log10(4.4 × 10^-7) ≈ 6.36 digits.

The good book
Computer Approximations includes polynomials for all of the standard and inverse trig functions and much more. See www.embedded.com/code or www.ganssle.com/app.htm for implementations of trig functions.

Converting the terse range reduction explanations to useful algorithms does take a bit of work, but the book is an invaluable treasure for people faced with creating approximations. The samples I've shown just touch a fraction of a percent of the approximations contained in the book.

It's pretty cool to crank parameters into these simple equations that use such bizarre-looking coefficients and to see astonishingly correct answers pop out every time.

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at jack@ganssle.com.


Have you seen the book “An Atlas of Functions” by Spanier and Oldham? They also offer approximations, but fewer than “Computer Approximations”. It's a fun book, as these things go.

– Fred Williams


Hart is a dangerous book. In my work with a math Ph.D.'ed colleague on high-speed algorithms for map projection, we started comparing Hart's algorithms to processor architecture features. Many of the algorithms in Hart, like those given above, are rational approximations and as such have a floating-point divide in them.

Most desktop and workstation processors do double-precision floating-point multiplies in one clock cycle, while single- or double-precision floating-point divides and integer divides normally take around 32 cycles. Processor pipelines also often stall on the divide operation. Our research showed that in our application domain one was usually better off using a long polynomial approximation rather than a rational approximation. This may not hold for some of the embedded processors with no FPUs, but if you have FPU support, beware that an optimization based on rational functions may be a pessimization.

A perusal of the cycle counts for your architecture's floating point operations may be enlightening.

– Cameron Kellough
