Approximations, part deux
Last month I heaped praise on the book Computer Approximations (John Hart et al, second edition, 1978, Malabar, FL: Robert Krieger Publishing Company). This mind-numbingly dull tome is the Bible for anyone creating an approximation to pretty much any well-known function, from Bessel functions to trig. It's a collection of thousands of polynomials, each of which computes one function to a specific precision over some limited range. Hart and his coauthors also describe the intricacies behind the polynomials in a way only a math teacher would enjoy. The proofs, alas, are left to the student.
If you missed last month's article, here are the key points:
• Compiler vendors provide built-in functions to compute the cosine, square root, and many other functions. When there's no hardware floating-point support, these code chunks always use an approximating polynomial or ratio of polynomials.
• These vendor-supplied approximations may not be optimum for speed-sensitive real-time systems.
• There's an annoying tradeoff between precision, speed, and range. Like the triad “schedule, features, and quality” you can pick only two. Together, the three parameters fight like a trio of American Idol judges.
• Since each approximation works over a limited range, the code must reduce the input argument to something within those bounds. Range reduction is by far the trickiest part of creating an approximation.
I gave code and coefficients for the square and cube roots. In this column, we'll look at exponentiation and logarithms.
Power to the exponents!
The three most common exponentiations we use are powers of 2, e, and 10. In a binary computer it's trivial to raise 2 to an integer power, but a floating-point exponent is much harder. Approximations to the rescue!
Hart's power-of-two exponentials expect an argument in the range of 0.0 to 0.5. That's not terribly useful for most real applications so we must use a bit of code to reduce the input x to that range.
Consider the equation 2^x = 2^a × 2^(x - a). Well, duh. That appears both contrived and trite. Yet it's the basis for our range-reduction code. The trick is to select an integer a so that (x - a) lies between 0 and 1. An integer, because it's both trivial and fast to raise 2 to any int. Simply set a to the integer part of x and feed (x - int(x)) into the approximating polynomial. Multiply the result by the easily computed 2^a to correct for the range reduction.
But I mentioned that Hart wants inputs in the range of 0.0 to 0.5, not 0.0 to 1.0. If (x - int(x)) is greater than 0.5, use the relation:

2^x = 2^(1/2) × 2^a × 2^(x - a - 1/2)    (1)
instead of the one given earlier. The code is simpler than the equation: subtract 0.5 from x and then compute the polynomial for (x - int(x)) as usual. Correct the answer by multiplying by:

2^(1/2) × 2^a = √2 × 2^a

For example, 3.7 has a fractional part of 0.7, which exceeds 0.5, so evaluate the polynomial at 0.2 and multiply the result by 2^3 × √2 to get 2^3.7.
Exponents can be negative as well as positive. Remember that:

2^(-x) = 1 / 2^x    (2)
Solve for the absolute value of x and then take the reciprocal.
Though I mentioned it's easy to figure 2^a, that glib statement glosses over a bit of complexity. The code in Listing 1 uses the worst possible strategy: it laboriously does floating-point multiplies. That's a general solution devoid of confusing chicanery: although it's clear, it's slow.
Listing 1: Code to reduce the range of an exponential
/*****************************************************
* reduce_expb - The expb routines require an input argument in
* the range [0.0, 0.5]. This routine reduces the argument to that range.
*
* Return values:
* - "arg", which is the input argument reduced to that range
* - "two_int_a", which is 2**(int(arg))
* - "adjustment", which is an integer flag set to zero if the fractional
*   part of arg is <= 0.5; set to one otherwise
*
* How this range reduction code works:
*   (1) 2**x = 2**a * 2**(x - a)
* and,
*   (2) 2**x = 2**(1/2) * 2**a * 2**(x - a - 1/2)
*
* Now, this all looks pretty contrived. But the trick is to pick an integer "a"
* such that the term (x - a) is in the range [0.0, 1.0]. If the result is in
* [0.0, 0.5] we use equation (1). If it winds up in [0.5, 1.0], use equation (2),
* which will get it down to our desired [0.0, 0.5] range.
*
* The return value "adjustment" tells the calling function if we are using the
* first or the second equation.
*****************************************************/
void reduce_expb(double *arg, double *two_int_a, int *adjustment)
{
   int int_arg;                      // integer part of the input argument

   *adjustment = 0;                  // Assume we're using equation (1)
   int_arg = (int) *arg;
   if ((*arg - int_arg) > 0.5)       // if frac(arg) is in [0.5, 1.0]...
   {
      *adjustment = 1;
      *arg = *arg - 0.5;             // ... then change it to [0.0, 0.5]
   }
   *arg = *arg - (double) int_arg;   // arg is now just the fractional part

   // Now compute 2**(int) arg
   *two_int_a = 1.0;
   for (; int_arg != 0; int_arg--)
      *two_int_a = *two_int_a * 2.0;
}
Other options exist. If the input x never gets very big, use a look-up table of precomputed values. Even a few tens of entries will let the resulting 2^x assume huge values.

Another approach simply adds a to the number's exponent field. Floating-point numbers are represented in memory using two parts, one fractional (the mantissa), plus an implied 2 raised to the other part (the exponent). Add a to the exponent to create 2^a. That's highly nonportable but very fast.

Or, you could shift a 1 left through an int or long a times and then convert that to a float or double. The last two ideas are sketched in the code below.
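The listings here don't implement those tricks, so treat the following as a minimal sketch of my own. It shows the shift approach for modest values of a, and falls back on the standard C library's ldexp(), which scales a double by a power of two by adjusting its exponent; the function name and the size check are my assumptions.

#include <math.h>     // for ldexp()

// Sketch: two ways to form 2**a for an integer a.
double two_to_the(int a)
{
   // Shift a 1 left 'a' times, then convert -- fine while the result
   // still fits in an unsigned long.
   if (a >= 0 && a < (int)(8 * sizeof(unsigned long)))
      return (double)(1UL << a);

   // ldexp(x, n) returns x * 2**n by adjusting the exponent field;
   // portable and much faster than a multiply loop.
   return ldexp(1.0, a);
}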

Computer Approximations lists polynomials for 46 variants of 2^x, with precisions ranging from four to 25 decimal digits. Room and sanity prohibit listing them all, but here are two of the most useful examples. Both assume the input argument is between 0 and 0.5, and both use the following ratio of two polynomials:

2^x = (Q(x^2) + x*P(x^2)) / (Q(x^2) - x*P(x^2))    (3)

For the faster, less precise of the two use:

P00 (+ 1) +0.86778 38827 9
Q00 (+ 2) +0.25039 10665 03
Q01 (+ 1) +0.1
For 9.85 digits of precision use the somewhat slower coefficients below (also see Listing 2):
P00 (+ 1) +0.72151 89152 1493
P01 (– 1) +0.57690 07237 31
Q00 (+ 2) +0.20818 92379 30062
Q01 (+ 1) +0.1
Listing 2: 2x to 9.85 digits
/****************************************************
* expb1063 - compute 2**x to 9.85 digits accuracy
* (needs <math.h> for fabs() and <stdio.h> for printf())
****************************************************/
double expb1063(double arg)
{
   const double P00 = + 7.2152891521493;
   const double P01 = + 0.0576900723731;
   const double Q00 = +20.8189237930062;
   const double Q01 = + 1.0;
   const double sqrt2 = 1.4142135623730950488;   // sqrt(2) for scaling

   double two_int_a;      // 2**(int(a)); used to scale result
   int    adjustment;     // set to 1 by reduce_expb if we must...
                          // ...adjust the answer by sqrt(2)
   double answer;         // the result
   double Q;              // Q(x**2)
   double x_P;            // x * P(x**2)
   int    negative = 0;   // 0 if arg is +; 1 if negative

   // Return an error if the input is too large. "Too large" is entirely
   // a function of the range of your floating point library and expected
   // inputs used in your application.
   if (fabs(arg) > 100.0)
   {
      printf("\nYikes! %f is a big number, fella. Aborting.", arg);
      return 0;
   }

   // If the input is negative, invert it. At the end we'll take
   // the reciprocal, since n**(-x) = 1/(n**x).
   if (arg < 0.0)
   {
      arg = -arg;
      negative = 1;
   }

   reduce_expb(&arg, &two_int_a, &adjustment);   // reduce input to [0.0, 0.5]

   // The format of the polynomial is:
   //    answer = (Q(x**2) + x*P(x**2)) / (Q(x**2) - x*P(x**2))
   //
   // The following computes the polynomial in several steps:
   Q   = Q00 + Q01 * (arg * arg);
   x_P = arg * (P00 + P01 * (arg * arg));
   answer = (Q + x_P) / (Q - x_P);

   // Now correct for the scaling factor of 2**(int(a))
   answer = answer * two_int_a;

   // If the result had a fractional part > 0.5, correct for that
   if (adjustment == 1)
      answer = answer * sqrt2;

   // Correct for a negative input
   if (negative == 1)
      answer = 1.0 / answer;

   return answer;
}
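Here's a quick way to exercise the routine, assuming your compiler's library pow() is accurate enough to serve as a reference; the test values are arbitrary and this driver is just a sketch, not part of the original listings.

#include <stdio.h>
#include <math.h>

// Sketch: sanity-check expb1063() against the library's pow().
int main(void)
{
   double test[] = { 0.25, 0.7, 3.7, -3.7, 17.25 };
   int    i;

   for (i = 0; i < 5; i++)
      printf("2**%g = %.10f (library says %.10f)\n",
             test[i], expb1063(test[i]), pow(2.0, test[i]));
   return 0;
}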
If you're feeling a need for serious accuracy, try the following coefficients, which are good to an astonishing 24.78 digits. You'll need a more complex range-reduction algorithm to keep the input within [0, 1/256]:
P00 (+ 1) +0.72134 75314 61762 84602 46233 635
P01 (– 1) +0.57762 26063 55921 17671 75
Q00 (+ 2) +0.20813 69012 79476 15341 50743 885
Q01 (+ 1) +0.1
Note that the polynomials, each of degree 1 only (in x^2), quickly plop out an answer.
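For illustration, here's how those coefficients drop into the same Equation 3 form. This is only a sketch with a made-up name: it assumes the caller has already reduced the argument to [0, 1/256] and applies the matching scale factor afterwards. Remember, too, that an ordinary double carries only about 16 decimal digits, so actually seeing 24.78 digits requires an extended-precision type.

// Sketch: core of the 24.78-digit 2**x evaluation. Assumes 0 <= arg <= 1/256;
// the (harder) range reduction and rescaling are left to the caller.
double expb_core_2478(double arg)
{
   const double P00 = + 7.213475314617628460246233635;
   const double P01 = + 0.05776226063559211767175;
   const double Q00 = +20.81369012794761534150743885;
   const double Q01 = + 1.0;

   double Q   = Q00 + Q01 * (arg * arg);
   double x_P = arg * (P00 + P01 * (arg * arg));

   return (Q + x_P) / (Q - x_P);
}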
Other exponentials
What about the more common natural and decimal exponentiations? Computer Approximations lists many that are accurate over a variety of ranges. The most useful seem to be those that work over 0.0 to 0.5 or 0.0 to 1.0. Use a range-reduction approach much like the one I just described, substituting the appropriate base: 10^x = 10^a × 10^(x - a) or e^x = e^a × e^(x - a).

Using the same ratio of polynomials as Equation 3, the coefficients in this next table give 10^x to 12.33 digits:
P00 (+ 2) +0.41437 43559 42044 8307
P01 (+ 1) +0.60946 20870 43507 08
P02 (– 1) +0.76330 97638 32166
Q00 (+ 2) +0.35992 09924 57256 1042
Q01 (+ 2) +0.21195 92399 59794 679
Q02 (+ 1) +0.1
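If you do use the direct 10^x approximation, the core evaluation is just Equation 3 again with these coefficients. The sketch below uses a name I invented and assumes the argument has already been reduced to the small range Hart specifies for this entry, with the 10^a correction handled separately by the caller (the next paragraph discusses why that's the hard part).

// Sketch: core of the 12.33-digit 10**x evaluation, Equation 3 form.
// Assumes the argument is already range-reduced; the caller must multiply
// by the appropriate power of 10 afterwards.
double expd_core_1233(double arg)
{
   const double P00 = +41.43743559420448307;
   const double P01 = + 6.0946208704350708;
   const double P02 = + 0.0763309763832166;
   const double Q00 = +35.99209924572561042;
   const double Q01 = +21.1959239959794679;
   const double Q02 = + 1.0;

   double z   = arg * arg;
   double Q   = Q00 + z * (Q01 + z * Q02);
   double x_P = arg * (P00 + z * (P01 + z * P02));

   return (Q + x_P) / (Q - x_P);
}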
The downside of using this or similar polynomials is that you've got to solve for that pesky 10^a. A look-up table is one fast possibility if the input stays in a reasonable range, but there aren't many other attractive options. If you're working with an antique IBM 1620 BCD machine, perhaps it's possible to shift left by factors of 10 in a way analogous to what I've outlined for exponents of two. In today's binary world there are no decimal left-shift instructions, so we must multiply by 10. Instead, combine one of the following relations:

10^x = 2^(x × log2(10))    (4)

or:

10^x = 2^(x / log10(2))    (5)

with the power-of-two approximations, as shown in Listing 3.
Listing 3: Computing 10x using a 2x approximation
/****************************************************
* expd_based_on_expb1063 - compute 10**x to 9.85 digits accuracy
*
* This is one approach to computing 10**x. Note that:
*   10**x = 2**(x / log_base_10(2)), so
*   10**x = expb(x / log_base_10(2))
****************************************************/
double expd_based_on_expb1063(double arg)
{
   const double log10_2 = 0.30102999566398119521;
   return expb1063(arg / log10_2);
}
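The natural exponential falls out of the same trick, since e^x = 2^(x × log2(e)) = 2^(x / ln(2)). Here's a sketch along the lines of Listing 3; the function name is my own invention, and the result inherits roughly the same accuracy as expb1063.

/****************************************************
* expe_based_on_expb1063 - compute e**x using the 2**x routine
*
* Same idea as Listing 3: e**x = 2**(x / ln(2))
****************************************************/
double expe_based_on_expb1063(double arg)
{
   const double ln_2 = 0.69314718055994530942;   // natural log of 2
   return expb1063(arg / ln_2);
}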
Logarithms
There's little magic to logs, other than a different range-reduction algorithm. Again, it's easiest to develop code that works with logarithms of base 2. If we pick an integer n such that x = f × 2^n, with 0.5 ≤ f < 1, then:

log2(x) = log2(f × 2^n) = log2(f) + n    (6)
Listing 4: Log range-reduction code
/*****************************************************
* reduce_log2 - The log2 routines require an input argument in
* the range [0.5, 1.0]. This routine reduces the argument to that range.
*
* Return values:
* - "arg", which is the input argument reduced to that range
* - "n", which is n from the discussion below
* - "adjustment", set to 1 if the input was < 0.5
*
* How this range reduction code works:
* If we pick an integer n such that x = f * 2**n, and 0.5 <= f < 1, and
* assume all "log" functions in this discussion mean log(base 2), then:
*   log(x) = log(f * 2**n)
*          = log(f) + log(2**n)
*          = log(f) + n
*
* The ugly "for" loop shifts a one through two_to_n while shifting a long
* version of the input argument right, repeating till the long version
* is all zeroes. Thus, two_to_n and the input argument have this relationship:
*
*    two_to_n    input argument    n
*       1          0.5 to 1        0
*       2          1 to 2          1
*       4          2 to 4          2
*       8          4 to 8          3
*    etc.
*
* There's a special case for the argument being less than 0.5.
* In this case we use the relation:
*   log(1/x) = log(1) - log(x)
*            = -log(x)
* That is, we take the reciprocal of x (which will be bigger than 0.5) and
* solve for the above. To tell the caller to do this, set "adjustment" = 1.
*****************************************************/
void reduce_log2(double *arg, double *n, int *adjustment)
{
   long two_to_n;                 // divisor we're looking for: 2**n
   long l_arg;                    // long (32 bit) version of the input

   *adjustment = 0;
   if (*arg < 0.5)                // if small arg use the reciprocal
   {
      *arg = 1.0 / (*arg);
      *adjustment = 1;
   }

   // Shift l_arg to zero while computing two_to_n as described above
   l_arg = (long) *arg;           // l_arg is long version of input arg
   for (two_to_n = 1, *n = 0.0; l_arg != 0; l_arg >>= 1, two_to_n <<= 1, *n += 1.0)
      ;
   *arg = *arg / (double) two_to_n;   // normalize input argument to [0.5, 1]
}
Listing 5: Log base 2 code good to 4.14 digits
/****************************************************
* log2_2521 - compute log(base 2)(x) to 4.14 digits accuracy
* (needs <stdio.h> for printf())
****************************************************/
double log2_2521(double arg)
{
   const double P00 = -1.45326486;
   const double P01 = +0.951366714;
   const double P02 = +0.501994886;
   const double Q00 = +0.352143751;
   const double Q01 = +1.0;

   double n;            // used to scale result
   double poly;         // result from polynomial
   double answer;       // the result
   int    adjustment;   // 1 if argument was < 0.5

   // Return an error if the input is <= 0, since log is not real at and below 0.
   if (arg <= 0.0)
   {
      printf("\nHoly smokes! %f is too durn small. Aborting.", arg);
      return 0;
   }

   reduce_log2(&arg, &n, &adjustment);   // reduce input to [0.5, 1.0]

   // The format of the polynomial is P(x)/Q(x)
   poly = (P00 + arg * (P01 + arg * P02)) / (Q00 + Q01 * arg);

   // Now correct for the scaling factors
   if (adjustment)
      answer = -n - poly;
   else
      answer = poly + n;

   return answer;
}
Need better results? The following coefficients are good to 8.32 digits over the same range:
P00 (+ 1) -.20546 66719 51
P01 (+ 1) -.88626 59939 1
P02 (+ 1) +.61058 51990 15
P03 (+ 1) +.48114 74609 89
Q00 (+ 0) +.35355 34252 77
Q01 (+ 1) +.45451 70876 29
Q02 (+ 1) +.64278 42090 29
Q03 (+ 1) +.1
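Here's a sketch of how those coefficients drop into the same P(x)/Q(x) structure as Listing 5, with both polynomials now cubic. The function name is mine, and I've omitted the input-validity check for brevity.

// Sketch: log(base 2)(x) to roughly 8.32 digits; same structure as Listing 5
// but with cubic P(x) and Q(x). Argument checking omitted.
double log2_cubic(double arg)
{
   const double P00 = -2.05466671951;
   const double P01 = -8.8626599391;
   const double P02 = +6.10585199015;
   const double P03 = +4.81147460989;
   const double Q00 = +0.353553425277;
   const double Q01 = +4.54517087629;
   const double Q02 = +6.42784209029;
   const double Q03 = +1.0;

   double n;            // used to scale result
   double poly;         // result from the rational approximation
   int    adjustment;   // 1 if argument was < 0.5

   reduce_log2(&arg, &n, &adjustment);   // reduce input to [0.5, 1.0]

   poly = (P00 + arg * (P01 + arg * (P02 + arg * P03)))
        / (Q00 + arg * (Q01 + arg * (Q02 + arg * Q03)));

   return adjustment ? (-n - poly) : (poly + n);
}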
Though Computer Approximations includes plenty of approximations for common and natural logs, it's probably more efficient to compute log2 and use the change-of-base formulae:

log10(x) = log10(2) × log2(x) ≈ 0.30103 × log2(x)    (7)

and:

ln(x) = ln(2) × log2(x) ≈ 0.69315 × log2(x)    (8)
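In code, Equations 7 and 8 are one extra multiply on top of the log2 routine. A minimal sketch, with function names I made up:

// Sketch: common and natural logs via the change-of-base formulae (7) and (8).
double log10_from_log2(double arg)
{
   const double log10_2 = 0.30102999566398119521;   // log base 10 of 2
   return log10_2 * log2_2521(arg);
}

double ln_from_log2(double arg)
{
   const double ln_2 = 0.69314718055994530942;      // natural log of 2
   return ln_2 * log2_2521(arg);
}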
So far I've talked about the approximations' precision imprecisely. What exactly does precision mean? Computer Approximations lists relative precision for roots and exponentials:

relative error = (approximate f(x) - true f(x)) / true f(x)    (9)
Logarithms are a different game altogether; the book uses absolute precision, defined by:

absolute error = approximate f(x) - true f(x)    (10)
To figure decimal digits of precision use:

decimal digits of precision = -log10( |maximum error| )    (11)
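A practical way to see what Equations 9 and 11 mean is to measure an approximation against a trusted reference. This sketch assumes your library's pow() is accurate enough to serve as that reference; the step size and test range are arbitrary.

#include <stdio.h>
#include <math.h>

// Sketch: measure the worst relative error of expb1063() over a range of
// inputs (Equation 9) and convert it to decimal digits (Equation 11).
int main(void)
{
   double x, err, worst = 0.0;

   for (x = -10.0; x <= 10.0; x += 0.001)
   {
      err = fabs((expb1063(x) - pow(2.0, x)) / pow(2.0, x));
      if (err > worst)
         worst = err;
   }
   printf("Worst relative error: %g, or about %.2f decimal digits\n",
          worst, -log10(worst));
   return 0;
}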
The good book
Computer Approximations includes polynomials for all of the standard and inverse trig functions and much more. See www.embedded.com/code or www.ganssle.com/app.htm for implementations of trig functions.
Converting the terse range reduction explanations to useful algorithms does take a bit of work, but the book is an invaluable treasure for people faced with creating approximations. The samples I've shown just touch a fraction of a percent of the approximations contained in the book.
It's pretty cool to crank parameters into these simple equations that use such bizarre-looking coefficients and to see astonishingly correct answers pop out every time.
Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at jack@ganssle.com.
Have you seen the book “An Atlas of Functions” by Spanier and Oldham? They also offer approximations, but fewer than “Computer Approximations”. It's a fun book, as these things go.
– Fred Williams
Hart is a dangerous book. In my work with a math Ph.D. colleague on high-speed algorithms for map projection, we started comparing Hart's algorithms to processor architecture features. Many of the algorithms in Hart, like those given above, are rational approximations and as such have a floating-point divide in them.
Most desktop and workstation processors do double-precision floating-point multiplies in one clock cycle, while single- or double-precision floating-point divides and integer divides normally take around 32 cycles. Processor pipelines also often stall on the divide operation. Our research showed that in our application domain one was usually better off using a long polynomial approximation rather than a rational approximation. This may not hold for some of the embedded processors with no FPUs, but if you have FPU support, beware that an optimization based on rational functions may be a pessimization.
A perusal of the cycle counts for your architecture's floating point operations may be enlightening.
– Cameron Kellough