The root finder verified - Embedded.com

The root finder verified

Jack claims he's reached the end of his search for the Holy Grail of derivations. Read on and decide for yourself.

For a few columns now, I've taken you on a journey to double-check—and possibly improve—the algorithm I call The World's Best Root Finder (TWBWF). I first ran across the algorithm in the IBM Scientific Subroutine Package (SSP, Ref. 1) circa 1968 where it was called RTMI.FOR. I've been using the algorithm ever since and have translated it many times to other languages. The algorithm combines steps of bisection and inverse parabolic interpolation.

Central to the algorithm is a test to decide whether to accept a parabolic interpolation step. IBM's documentation gave no clue as to the origin of the step, so I had to derive it. Thirty-five years ago, I got a derivation that made sense to me. A few months ago, I discovered that the derivation was all wet.

Understand, it was only the derivation that was wrong, not the criterion itself, nor the code. I corrected the derivation in my last column. I also hinted that I'd found a way to improve the algorithm even more. In the process, I retranslated the code from its original Fortran II. This month, I'll show you how the latest incarnation evolved.

Background
We've been over the concepts many times before, so I'll make this part of the discussion brief. Given a function ƒ(x) , we seek the value of x —the root—such that:

ƒ(x) =0    (1)

The algorithm performs an informed search to find the root. But we must give it a hint where to look. To do that, we give it two bounding points for which the signs of ƒ(x) are different. If these two points straddle the root, the function must cross the x -axis in the interval. RTMI is sure to find the place.

I'm posting a copy of IBM's Fortran code (don't worry, it's not copyrighted). While the algorithm is golden, the code is the worst kind of Fortran II spaghetti code, replete with arithmetic (three-way) IF s. I had rewritten it long ago into Fortran IV and then into Pascal, C, and C++. All my subsequent rewrites tended to take off from some convenient, structured version. This time, however, I wanted to be certain I hadn't changed the algorithm in some subtle way. So I went all the way back to the original Fortran II code and methodically changed it, being careful not to alter its functionality. The change took place in two phases: the first to convert the algorithm to C++, the second to improve it, if possible. I'll be leading you through both phases here.

Translation
If you've never hand-translated code from one language to another, you should give it a try. It's fun and challenging, much like working a crossword puzzle. I often see people searching for commercial translation programs to auto-translate code. Unless the code is hugely large, that's usually not necessary. Doing it by hand is not that hard. A good editor with support for macros and regular expressions can make short work of the changes (there's no substitute for global substitution).

Translating goto's is no problem in C++. True, their use is frowned upon in polite circles, but all C and C++ compilers support them. Simply replace every Fortran numeric label with an alphanumeric equivalent and change GO TO to goto . For example, the lines:

     IF(x.lt.0.0) GO TO 100
     …
100           continue

Might translate into:

     if (x < 0.0) goto L100;
     …
L100:

The arithmetic IF is more bothersome, since there is no one-line replacement for a three-way branch such as:

     IF(x) 10,20,30

About the best one can do is:

     if (x>0) goto L30;
     if (x == 0) goto L20;
L10:

Fortunately, in most uses of the arithmetic IF , two of the labels tend to be the same; the two-way branch we can handle. Thus:

     IF(x)10, 10,20

Translates to:

     if (x>0) goto L20;
L10:

With one major exception, the translation was straightforward, and both compiled and ran on the first try. Once I'd verified that it worked precisely like its Fortran predecessor, I had a golden standard to compare all other versions to.

Armed with the golden version, I could start cleaning up the code by replacing goto's with structured constructs. I ended up with 11 versions of rtmi.cpp, ranging from version zero through 10. For the benefit of those who want to use the pure algorithm sans Crenshaw's meddling, I'm also uploading rtmi_v10.cpp. No, I have no plans to upload the others.

Whole books have been written about how to implement unstructured constructs using structured code. I'm sure you know that it's not always possible; the set of structured programs is a subset of all programs. One construct in RTMI.FOR was particularly troublesome. Unfortunately, it was also central to the design. It's summarized in Listing 1.

Listing 1: The RTMI control structure

     I=0
     ; start iteration loop. I is loop counter
4   I=I+1
     ; start bisection loop. K is loop counter
        DO 13 K=I,IEND
           
          ; test criterion for interpolation here
          IF( .and. I.LE.IEND)GOTO 17
          IF()RETURN
          ; else repeat bisection
13 CONTINUE
         ; loop count exceeded. error return
17  
         IF()RETURN
    GO TO 4

At first glance, it's easy to see that we have two nested loops; one, a counted loop using I as the loop counter and, two, an inner, do-loop, using K . But look again. The loops are all tangled up together, with the test on I inside the K -loop. There are at least two tests for convergence (the full code has more), one inside each loop. Finally, note that the inner loop doesn't run IEND times, but only IEND – I + 1 . The original comments don't help. Just after label 13 , they say:

“No convergence after IEND iteration steps followed by IEND successive steps of bisection on steadily increasing function values at right bounds. Error return.”

Surely that can't be right. If we do IEND iteration steps (meaning interpolation steps), we're going to exit on the test for I . The number of times left for the bisection loop is zero.

Fact is, there's really only one loop. If the criterion for parabolic interpolation is satisfied, the algorithm always exits the inner loop. We only execute multiple steps within the DO-loop when the criterion for parabolic interpolation fails. Under the right conditions, when the parabolic interpolation is cooking, we end up alternating bisection and interpolation steps. Most of the time, we can expect multiple bisections only in the beginning, when the interpolation criterion often fails.

Because of the interpolation criterion, a step of bisection is not necessarily followed by a step of interpolation. But a step of interpolation is always followed by bisection; the algorithm can't work, otherwise. Except for some issues of counting, the true structure of the code is the simple structure shown in Listing 2.

Listing 2: The true structure?

for i = 1 to iend loop
   bisect
   if(criterion)
      interpolate
end loop

Is this a verbatim implementation of Listing 1? Actually, it's not. Listing 2 counts only the number of bisections, and stops when they're equal to iend . In Listing 1, the outer loop counts only steps of interpolation. It's only when interpolation is constantly failing that the code will max out on the inner loop.

Does it matter? I didn't think so before, and I still don't. After all, the iteration count is a rather arbitrary sort of thing anyway, there mostly to protect us from a runaway process. Whether the algorithm fails on too many iterations, too many interpolations, or too many function evaluations, it still fails, right? If it succeeds, both forms succeed equally well. While the decision definitely alters both the structure and behavior of the algorithm, it only affects the way we fail.

What about the test criterion?
RTMI uses a reasonableness test on the inverse parabolic interpolation. The test criterion, which serves to save us from using a crazy estimate of the root, is:

y3 (y3 -y1 )-2y2 (y2 -y1 )>0    (2)

You might have trouble recognizing it in the RTMI code, but it's there, in the following guise:

7     TOL=F-FL
       A=F*TOL
       A=A+A
       F(A-FR*(FR-FL))8,9,9

I leave it to you to satisfy yourself that the code matches Equation 2. Hint: their FL is our y1 , FR is y3 , and F is y2 . TOL is just an all-purpose temporary.

Improving things
There's one more subtle difference between all my implementations and RTMI.FOR: the test for convergence. As is usual for iterative techniques, the algorithm allows us to specify a tolerance, eps , which is a measure of desired accuracy. The tolerance tells us when to stop and declare victory. I frankly didn't like IBM's test, for reasons you'll soon see.

RTMI.FOR uses eps as a (mostly) relative rather than absolute value. The code fragment is shown in Listing 3.

Listing 3: The convergence criterion

        TOL=EPS
        A=ABS(XR)
        IF(A-1.)11,11,10
10  TOL=TOL*A
11  IF(ABS(XR-XL)-TOL)12,12,13
12  
13  CONTINUE

The first construct is one IBM used all through the SSP. The general idea is that we're going to compare the current error to some tolerance and halt when it's within bounds. We'd like that tolerance to be relative; that is, proportional to some reference value of x . That's so we don't have to tinker with eps for different problems. Statement 10 scales TOL by a reference value of x .

But what value of x should we use? IBM chose to use the current value of x (actually, XR ). But one must be careful here; what if that value happens to be zero? To avoid this problem, IBM only scales the tolerance if the reference value is large. In other words, their code might increase TOL , but never decrease it.

My problem is that, in the real world, x often has units, and the meaning of the size “one unit” depends a lot upon whether the units are light years or wavelengths of light. I have no problem with using a relative measure, but it should always be relative.

Is there a relative test that makes more sense? Actually, there is, and it's not hard to implement. We have a good measure of the size of x by looking at its initial range, which is the distance between the two initial points. We can used a relative tolerance based on this range and be confident that it will work regardless of units. Here's my two-line code change:

x_tol = abs(x3-x1) * eps;

if (abs(x2-x1) < x_tol) return x2;

Some readers have suggested that I also perform a similar test on y . Years ago, I considered this same question and decided that it wasn't needed. Here's why.

In the “end game,” when we are nearing convergence, both points x1 and x2 (though not necessarily x3 ) should have been drawn in quite close to the true root. The curve should look pretty much like a straight line with a non-zero slope. The only difference between testing x -values and y -values should be a matter of scale; the scaling factor is the slope of the curve.

In RTMI.FOR, IBM did use a test for y , but a very puny one. During initialization, they generate TOLF , an arbitrary absolute value:

TOLF=100.*EPS

This is always an absolute value, not a relative one. My previous comment, regarding light years versus wavelengths of light, still goes.

Nevertheless, I'm prepared to admit that a test on y might save us from a premature exit, before the curve straightens out. Testing y protects us from a curve that's severely “bent,” with a local slope much higher than the average value.

Since the local and global slopes must eventually agree, I'm content with using the slope between x1 and x2 as the scale factor. That slope is already computed in the parabolic interpolation; we only need to move it so that it's always available. The code is:

b = (x2-x1) / y21;
y_tol = x_tol / abs(b);
if (abs(y2-y1) < y_tol) return x2;

The biggie
Now we get to the last change I want to make. You might recall that, in my last column, I said I'd found a way to improve the criterion for interpolation.

The general idea of the reasonableness criterion is to make sure that the parabolic fit is a reasonable one. If the curve fit results in a parabola that's wildly different from the actual curve shape, we can expect that the guess for the root will also be wildly wrong. It certainly seems reasonable that if the parabolic fit comes up with an estimate that's outside the range, x1 ..x3 , we should reject it. In fact, because the algorithm keeps the root between x1 and x2, we should reject the estimate if it's outside that smaller range.

That criterion led to a long and detailed study—got wrong in 1968, corrected this year—that showed that the inequality of Equation 2 is sufficient to ensure that the root is in range.

However, in my many tests of the algorithm, I found certain cases where it seemed to take a few interpolation steps to make much progress. Once this algorithm gets its head to the barn, it really zooms down to the root, but sometimes it seemed to tinker around before that. This is true, even with the test of Equation 2.

In every case, I found that the cause had to do with the location of the vertex of the parabola. Ideally, we'd like all three test points to lie on the same leg of the parabola. However, Equation 2 is not enough to ensure this. It's easy to find cases where Equation 2 is satisfied, the estimated root is between x1 and x2 , and yet the vertex is still in the picture. In such a case, the next iteration will be an improvement, but not a dramatic one, and perhaps not even as good as another bisection step would give. This inspired me to seek yet another criterion, based on the location of the vertex.

We haven't talked about the vertex before, and space prohibits me from going into a lot of detail, but I can outline the math. All of our analysis is based on an assumed quadratic:

x (y )=x 1 +B (y-y 1 )[1+C (y-y 2 )]    (3)

An alternate formula is based on the vertex:

x (y )=xv +K (y-yv )2    (4)

Expand the two equations, compare terms of the powers of y , and you'll get expressions for xv , yv , and K . The key parameter is:

   (5)

The new criterion, which only took me 35 years to derive, is that this value must not lie between y1 and y2 . If it does, we're better off taking yet another step of bisection.

There's only one problem with this approach, which may be why IBM didn't use it: C can be zero. C is, in fact, the measure of the curvature of the function. Every test I've run with a decent, nonlinear function, works. But after all, shouldn't the root finder find the root for linear functions as well? Even the ridiculously simple case:

ƒ (x )=x    (6)

fails with this new criterion in place. And rightly so, because the vertex of a straight line is undefined.

Fortunately, the solution is simple enough. If C = 0, we don't need parabolic interpolation or anything remotely resembling it. We've got a straight line, so one step of linear interpolation nails the root. The root for a straight line is given by:

x =x 1by 1    (7)

I've made all the modifications to add the new criterion, including the test for C = 0. As it happens, the required changes are fairly extensive. The original RTMI.FOR had the reasonableness test early on, so as to avoid doing any unnecessary computations. In our case, those are the computations for B and C . But I now need them anyhow, to evaluate yv . The code works beautifully for all the cases I could throw at it.

The most sweeping change is that we no longer need Equation 2 at all! Remember, it was only a shorter way of ensuring that the estimated root lies in the range x1 ..x2 . After the changes I've made, it's cheaper just to compute the root, then test it.

The end is near
We've now analyzed TWBRF to death and made it even better. I hope I never have to come this way again. As expected, the code works fine and the algorithm remains my favorite. I'm uploading the latest version to the web site. (NOTE: THIS CODE IS CURRENTLY UNAVAILBLE AND WILL BE POSTED SHORTLY)

If I had any doubts about the status of the algorithm as “The World's Best,” all my doubts were allayed last week when I was using Matlab to solve a completely different problem. Of course, Matlab has its own root finder, called fzero . One of the things I love about Matlab is that I don't have to spend time developing algorithms such as RTMI. I've learned from experience that I can trust the Mathworks folks to pick me out the right tool for the job and to implement it at least as well as I can. When I'm using Matlab, I trust their tools.

But while debugging my software, I zigged when I should have zagged (pressed “step into” instead of “step over”) and found myself inside fzero . Oh, well, I thought, while I'm here, I might as well look around. I found some interesting things, including comments that mentioned bisection and inverse parabolic interpolation. Then I saw the telltale sign: an implementation of Equation 2. Matlab's fzero is IBM's RTMI! If that doesn't clinch the world's title, I don't know what does.

Jack Crenshaw is a senior software engineer at Spectrum-Astro and the author of Math Toolkit for Real-Time Programming , from CMP Books. He holds a PhD in physics from Auburn University. E-mail him at .

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.