A root finder's roots - Embedded.com

A root finder’s roots

Even the most robust root-finding algorithm has an Achilles' heel. All it takes is one pathological function to expose it.

Dang! With all my talk about explaining things to you in ways that are absolutely transparent, you'd think I'd get one thing understood myself: for explanations to be understood, they must first be correct. Four months ago, I showed the power series for the log function.[1] Thinking I would make things simpler, I showed the form of the series without the factor of two that goes in front. That confused a lot of readers who rightly couldn't understand why a mysterious factor of two showed up in the bilinear transform. Two months ago, I gave a correction, and danged if I didn't put the power of two in the wrong place.[2] Phooey!

Let's straighten things out. The power series for ln(z ) is given in any handbook by:


Truncating this series to the first term gives:


Recall that I was using this to unravel the “Rosetta Stone” equation:


And revert it to give:


Substituting Equation 2 into Equation 4 gives:


See how easy it is? It only took me four months to get it right. Sorry about that. Many thanks for the readers who sent me e-mails, especially Chuck Roberts and Roland Brabant. Thanks for keeping me on track, guys.

Better than best?
Until recently, I had intended to make this column a continuation of the z -transform stuff and other topics that flow from Equation 3. Recently, however, something else has come up that I think is more important and timely. Trust me, we'll be talking about digital signal processing for many months to come, so perhaps a brief digression won't be such a bad thing.

A while back, I wrote up a routine that I modestly called The World's Best Root Finder (TWBRF).[3] I had originally written about this routine way back in 1994.[4,5,6] But since I had mentioned it a few times during our discussion of minimization, I felt I had to share it again. I repeated an extensive analysis of the math, but I'd like to address it now in a higher-level context.

The math is hairy, and I spent a full two months walking you through it in laborious detail. In retrospect, I think it might have been better to explain the concepts behind the algorithm and leave a few details out of the math.

Recently, the algorithm has bubbled up to the surface again, both at my day job and with readers. For that reason, I've been analyzing it in even more detail than I ever did before (which is saying a lot). At one time it appeared the algorithm might have a bug or two lurking in it, so I've just completed a detailed analysis and walkthrough. I'll be giving the results here.

The good news is that the algorithm does indeed work as advertised, with a caveat that I'll get to later. But—wonder of wonders—in my tests and analyses I've discovered at least one and possibly two ways to make the World's Best Root Finder even better. When you consider that I've been using the algorithm for no less than 34 years and that it had a long and honorable history before that, you'll appreciate this as a rather remarkable finding. Follow along with me this month, as I describe where the algorithm came from and what I've discovered.

First, let me describe the problem TWBRF is supposed to solve. We're given a nonlinear function of a scalar variable:


And we want to find a root of the function, meaning the value of x for which:


As is usual with such methods, we require at least one starting value as a guess. TWBRF requires two such values, x0 and x2 , that already straddle the root. That is to say, the signs of f(x0 ) and f(x2 ) are different. Mathematically:


What if the function is zero at either point? Strictly speaking, that condition violates the requirement of Equation 3, but we couldn't care less. We now know that at least one of the points is a root, so we simply declare victory.

Figure 1: The bisection method

The situation is shown in Figure 1. Here f(x0 ) is positive, and f(x2 ) is negative, though the algorithm doesn't care which is which. The true root is the point R where the curve crosses the x -axis. Of course, since we're using a numerical method, we typically don't find the exact value of x; that can only be assured if the problem has an analytical solution. Instead, we call ourselves successful if the value of x is determined within some specified tolerance, .

For the sake of completeness, I should mention that finding the root at which f(x) = 0 is equivalent to finding the value of x for any specified y, since we can always transform one problem into the other simply by adding an offset to y. Thus, the root-finder is a lot more general than it might first appear. It's equivalent to sliding the whole curve of Figure 1 up or down as needed.

TWBRF is guaranteed to find a solution providing two conditions are met:

  • The function is continuous, with no discontinuities or sudden changes.
  • The specified is not so small that numerical roundoff problems occur.

    If more than one root exists in the interval x0 x2 , TWBRF is guaranteed to find one of them—but without telling us which.

    I'm sure you can see, from the figure, why the algorithm can make such a guarantee. If y0 = f(x0 ) is on one side of the x -axis, y2 = f(x2 ) is on the other, and the curve of f(x) connects them, there has to be some point where the curve crosses the x -axis. As long as we always keep two points straddling the root, we're bound to find it eventually.

    The algorithm
    No doubt we can conceive of many ways to locate the root. Perhaps the simplest would be to bisect the interval:


    The y -component of the new point can only have one of three values:

  • Zero, in which case it's a root
  • The same sign as f(x0 ), in which case x1 and x2 now straddle the root
  • The same sign as f(x2 ), in which case x0 and x1 now straddle it

    In the last two cases, we just throw out one point, relabel it, and press on. At each iteration, our interval gets half as large, so eventually it cannot help but be smaller than .

    The problem with bisection is that it's sure but slow. The interval size is reduced by a factor of 1,000 every 10 iterations. If we set to one millionth of the initial interval, we'd need 20 iterations to get there, and that's for an answer good to only six digits.

    Another common approach is the secant method, which involves doing a linear interpolation between the bounding points. The equation of the straight line is:


    Setting y = 0 gives an estimate for x:


    Figure 2: The secant method

    The secant method is also guaranteed to converge, though convergence may be slow if the curve is badly shaped. As you should be able to see from Figure 2, the method is based on the assumption that the slope of the curve is close to the slope of the straight line connecting P0 with P2 . If that slope is not a good approximation at first, the estimated root is all wrong. And usually, as in the figure, one of the bounding points tends to be retained, so the algorithm can only inch down the range in a sawtooth motion. Typically, the secant method can only reduce the error by some fraction of 1.0 each iteration. This method can be even slower than bisection.

    Figure 3: Fitting a vertical parabola

    Figure 4: Fitting a horizontal parabola

    If bisection and linear interpolation are both too slow, the obvious next step is to try a second-order polynomial to fit the curve, instead of a straight line. A second-order polynomial has three coefficients, so we'll need three points to fit the curve. For the first step, at least, a point obtained by bisection is the obvious choice. Figures 3 and 4 show two possible curve fits. The first fits a parabola opening upward (or downward). The second, called inverse parabolic interpolation, fits a parabola opening to the side (as in Figure 4). Either way, the curvature of the parabola gives us a much better chance to accommodate the curvature of the function itself, and convergence is typically very fast. It gets even faster as we get closer to the root. As we “zoom in” on the root, the segment of the curve looks straighter and straighter, so that any curve fitting three points is virtually indistinguishable from the real function.

    Fitting an upward-opening parabola is called Mueller's method. It's the most popular method in common use. It's blazingly fast and usually works just fine. However, the method I prefer—and the one used in TWBRF—uses the inverse parabolic interpolation. To understand my reasons, we have to go back nearly 40 years.

    Ancient history
    In those days, we didn't have PCs or Matlab or Mathcad. No interactive tools at all. We could program in any language we wanted, as long as it was Fortran. No extensive math libraries, either. Even the commercial Fortran libraries wouldn't be along for another decade or so (and they were exorbitantly expensive). If we needed a math algorithm, we asked around to see if a colleague already had one. Failing that, we rolled our own. But that was soon to change.

    Around 1970, I came across a wonderful Fortran library distributed by IBM. It was called, appropriately enough, the IBM Scientific Subroutine Package (SSP), and was distributed free (I think) to IBM's customers. Though other user's groups and professional societies distributed their libraries (the Association for Computing Machinery was most notable and still is), I think the SSP might have been one of the first libraries offered by a computer vendor. Those of us who are old enough to remember the SSP remember it with fondness and cherish the manuals (which included both source listings and algorithm descriptions.

    To this day, I still use algorithms out of that old library, including the root finder, a polynomial root cracker, and an eigenvalue/eigenvector routine, among others. I recently resurrected the eigenvalue routine one more time. Of course, the code is long since obsolete; the worst kind of goto- filled, Fortran II spaghetti code. But the algorithms are mostly golden. After all, algebra hasn't changed much in the last 40 years.

    The root finder I use came from that library, and I've translated it over and over into Fortran IV, Fortran 66, Fortran 77, Pascal, C, C++, Matlab, and Mathcad. In my May 2002 column, I referred to it as “CMIT,” but chalk that one up to 30 years of fog.[1] IBM's name was RTMI, which stands for, I guess, “RooT-finder, Mueller Iteration.” CMIT was Jack's name for Jack's Own Version of RTMI (JOVR?). The “C” stands for Crenshaw (another modesty). CMIT (and its successors) differs from RTMI in several respects, mostly having to do with calling list options and conventions.

    IBM's RTMI acronym refers to Mueller's method, but that's incorrect. Mueller's method uses the vertical parabola of Figure 3; RTMI uses the inverse parabolic interpolation of Figure 4. I knew this, even way back then, but the error in description didn't stop me from using an excellent algorithm. A rose by any other name and all. My latest version of the iterator (05crens.txt) can be found in the 2002 code archive at www.embedded.com.

    Which is better?
    Since the root finder has to deal with arbitrary functions, it stands to reason that some functions are going to like the fit of vertical parabolas, others horizontal ones. There's no clear winner there. There is, however, an important difference in the way the three points for the fit are chosen. In Mueller's method, three previous data points, including the point at the estimated root, from a given iteration are retained for the next; only a bit of relabeling goes in between. In TWBRF, only the two points bracketing the root are retained; the middle one is obtained by another bisection.

    This is important. Because TWBRF always bisects before fitting, it requires two function evaluations per iteration (one, if the parabolic fit is not used). Mueller's method uses only one evaluation per iteration, so is potentially faster. However, it's my humble opinion that the cost in execution time is well worth it, in terms of reliability. The formulas giving the coefficients of the polynomial all involve ratios like:


    If two of the points are very close together, as can easily happen in Mueller's method, the precision of the result can be poor. In TWBRF, use of the midpoint guarantees maximum resolution for such computations.

    Back when I first posted TWBRF on www.embedded.com, I got e-mails pointing out this property: two function evaluations per step. My response was, and remains, that this is a price I'm willing to pay for “bulletproofedness.”

    But is it?
    Just about the time I was overflowing with smug, I had occasion to worry. First, for a recent problem, I translated the algorithm to both Matlab and Mathcad. In Mathcad, I thought I had seen a case where the new estimate of the root was outside the bounds of the original bracketing points. This should be impossible with anybody's method, so it was cause for concern.

    That concern was heightened by an e-mail from John Lull, who gave me a function that TWBRF failed to handle properly. Uh-oh, I thought. Maybe I made a mistake in translation. But when? Was it in 1969 or 2003? Was it in all translations or just the one posted on www.embedded.com ?

    That news prompted me to do a detailed analysis of the algorithm. Here's what I found. John's function is:


    According to John, TWBRF doesn't give an error message on this case; it “converges,” but too quickly and to the wrong value.

    Right off the bat, we can see a potential problem. The coefficient of the third-order term is tiny—too small to even be significant except in a double-precision world (which we're all in by now, right?). If we ignore that third order term, Equation 13 is a quadratic, with roots at x = 1 and x = 10/3. But with the third-order term included, the roots change a bit:


    Wow! Look at the new root! It's numerically huge! To make the problem even more challenging to the root-finder, John made the initial bounds very large. They're two-thirds of the second and third roots in Equation 14:


    These are perfectly valid bounds; there is only one root between them, the first root of Equation 14. A pure bisection algorithm would find this root—in about 140 iterations.

    Figure 5: A tough function to solve

    The problem is that TWBRF doesn't do just pure bisection. It tries to fit a parabola and thinks it's done so. To see why, look at its curve in Figure 5. Despite the tiny size of the third-order term, that term dominates over the range of interest. The curve has the typical s-shape of a pure cubic.

    Here's the intriguing part. The algorithm gets the third point by bisection. On the scales that we're talking about, the middle point is smack at the inflection point of the curve. The three points lie on the straight line connecting them, and the “parabola” has a curvature of zero. Because the range is so huge, the estimated root ends up as essentially equal to the previous one. After a couple of tries, the algorithm sees no change in its estimates and declares convergence.

    Is it broken?
    Short answer? Yes. I suppose one could argue—as many would—that John's function is pathological. Yes, the algorithm is guaranteed to work, but not when it's working at the limits of numerical precision. The largest value f(x) ever gets is 0.4, but it's got negative values of 1027 . It's asking a lot of any algorithm to deal with such a range of numbers. We can't blame the algorithm for limitations in the math chip or of floating-point roundoff error.

    I tried this argument on friend and colleague Jim Adams. It didn't work. Jim argues, rightly, that if the algorithm had failed to converge or given an error message, that would be okay. If John's test case violated some documented limits of the nature of test functions, that, too, would have been okay. But in this case, the algorithm, running a perfectly legal test case, thinks it's converged when it hasn't. That's broken.

    John points out that the convergence criterion tests only x, not f(x). For the record, the original IBM routine, RTMI, tested both. But I don't think its designers ever dreamed of numbers with such precision. I left out the test on f(x) because I found it to be rather puny. All it does is multiply by a fixed number (100) and use that as a criterion. For John's case, that would have been better than nothing. The routine would have failed to converge at all, signaling that something was wrong.

    But the real problem is that the routine shouldn't have tried to use parabolic interpolation at all. My tests have shown that if it had done even one additional step of bisection, it would have worked. So what is needed is a better criterion for deciding when to interpolate and when to just bisect. In general, we should be a lot more conservative here. Better to take extra bisection steps that aren't needed, than give the wrong answer!

    I never thought I'd see the day when, except for coding style, I'd find any SSP routine that could be improved upon. But it appears that I have.

    In my testing of the routine, I've also found other areas where I think the bisect-or-interpolate decision could be improved.

    This is somewhat amazing news. The algorithm was mature when I first met it and that was 34 years ago. I've analyzed it no less than three times—twice in the pages of Embedded Systems Programming alone. But I never saw the potential for improvement until now. Amazing.

    This routine, as it stands, is very, very good. For reasonable functions, it is virtually unbeatable; it converges quickly and to the right value. If you need an algorithm and you need it today, you can download my May 2002 code and use it with confidence.

    But it can still be improved. In my next column, I'll show you how.

    Jack Crenshaw is a senior software engineer at Spectrum-Astro and the author of Math Toolkit for Real-Time Programming, from CMP Books. He holds a PhD in physics from Auburn University. E-mail him at .


    1. Crenshaw, Jack. “Getting It, Part Deux,” Embedded Systems Programming, February 2003, p. 13.
    2. Crenshaw, Jack. “Get More Yet,” Embedded Systems Programming, April 2003, p. 11.
    3. Crenshaw, Jack. “All Problems Are Simple,” Embedded Systems Programming, May 2002, p. 7.
    4. Crenshaw, Jack. “Roots of a Function,” Embedded Systems Programming, August 1994, p. 13.
    5. Crenshaw, Jack. “None Other than e,” Embedded Systems Programming, September 1994, p. 13.
    6. Crenshaw, Jack. “Roots by Iteration,” Embedded Systems Programming, October 1994, p. 13.
  • Leave a Reply

    This site uses Akismet to reduce spam. Learn how your comment data is processed.