 All Problems Are Simple - Embedded.com

# All Problems Are Simple

This month, Jack shows you how he made the world's best root finder just a little bit better. So here it is. Use it with confidence.

Last month, I hoped to give you my root cracker in a single column. Alas, 'twas not to be. As I slavishly followed the derivation in my 1994 columns (starting with “Roots by Iteration,” October, p. 13), I remembered the reason it took me three months to give it to you the first time: it's not possible to do the thing justice in one column. In fact, I realized that I probably should have stretched the original, three-month series to four.

In retrospect, I'm glad we didn't make it last month, because the stretch gave me time to correct a very arm-waving sort of argument in the original column, and also give you some performance data on the resulting function.

To refresh your memory, we're seeking a method that can solve for the value of x such that (1)

The method I use seems to have no name except the awkward “inverse parabolic interpolation.” It was mis-described in the IBM Scientific Subroutine Package as Mueller's method, but that method actually uses direct, rather than inverse, parabolic interpolation. To keep me from endlessly repeating the words “inverse parabolic interpolation,” I'm going to refer to the algorithm by its original, if misleading, IBM name: CMIT.

Both methods begin when the user calls it with two bounding values of x, which we called x0 and x2 . These points must already straddle the root. That is: (2)

If the function is both continuous and has a continuous first derivative, this condition assures that at least one root exists in the range x0 ..x2 .

Both methods generate a third point by bisection, and then fit a parabola through the three points. The only issue is the direction in which the parabola opens-vertically or horizontally.

In CMIT, the parabola is assumed to open horizontally. I prefer this approach because, well, because it works so nicely. In my original article, and also last month, I asserted that the direct method is double-valued in x, while the inverse method is not. Strictly speaking, this is true in the sense that the parabola that opens upward or downward must cross the x-axis twice, if it crosses at all. However, I now see that in the range that Equation 2 holds, there can be only one crossing in the range x0 ..x2 .

This is not to say, however, that the fitted parabola is always a good fit. I still like the CMIT approach because the side-opening parabola only crosses the x-axis once. We can then use a test on the root to decide if the fit is a good one or not.

Skipping over all the derivation this time, we found that the condition that the estimated root lie somewhere between x0 and x2 led to the condition: (3)

where: (4)

and:

IBM's CMIT, however, only tested v, and required:

v > 0 (5)

Way back when I first analyzed the algorithm, the reason for the apparent omission of u was not at all obvious, and took some serious derivation and thought to convince me that the assumption is, in fact, valid.

I began by noting four possible cases, and their implications:

Case 1: u > 0, v > 0 Okay

Case 2: u > 0, v < 0="" forbidden="">

Case 3: u < 0,="" v=""> 0 Forbidden

Case 4: u < 0,="" v="">< 0="" okay=""> Figure 1: A legal, but bad, interpolation

Then I substituted parameters b and g, defined in Equation 7, to get Figure 1. The hyperbolas shown divide the b-g plane into the four cases, two of which are acceptable, two of which are not. Case 3 is not visible because it represents a tiny, new-moon-shaped sliver where the curves on the right seem to touch.

There's one other subtlety in the derivation, which can easily slip by. It turns out that we find it easier to do the bookkeeping if we can always assume that the middle point, P1 = [x1 , y1 ], lies on one side or another of the x-axis, relative to the other two. We choose to require that y1 lies on the same side as y2 , and therefore opposite y0 . We can always maintain this relationship by merely relabelling the points P0 and P2 , if necessary (side effect: x2 is no longer necessarily to the right of x0 ). In short, we require: (6)

The definitions for b and g are: (7)

Equation 6 then requires that both b and g be negative, which restricts us to the lower-left quadrant of Figure 1.

Now comes the last bit of arm-waving. To justify Equation 5, I had to show that we get no usefulness out of using Case 4. I did that, both eight years ago and last month, by making the argument that, as we converge to a solution, the value of: (8)

which is one of the coefficients of the quadratic, trends to zero. The solid line in Figure 1 is the curve for C = 0. If we truly trend to that curve as we converge, any solution that starts out in Case 4 must cross over Case 2 anyhow, so we gain little from using it.

Eight years ago, I knew that the statement that C trended to zero was not obvious, but I was in a hurry and let it pass. One month ago, I sort of did the same. I set out to prove it, and found that I couldn't. So I let the statement ride again. Now that I've had more time to think about it, I can see why I couldn't easily prove the statement: it's false.

I must admit, it took me a long time to accept this fact. I kept staring at the equations, looking for a way to “prove” what I'd already stated. It didn't help.

Consider this: the constant, C, is the inner term of a quadratic in nested form: (9)

Unwrapping the nests gives the equivalent form: (10)

Forget expanding the polynomial and comparing terms; just look at the form. The coefficient B multiplies the linear term, so it must have the form of a slope. Indeed it does, though it's upside-down from our usual form: (11)

We can expect B to approach a constant value as we converge on the root.

In the same way, BC must be the quadratic term, which describes the curvature of the function. Since that term is constant for a parabola, we can expect BC, and therefore C, to also converge to a constant. End of story.

So C is not zero as I had claimed. What does that do to our convergence line of Figure 1? Does it make that wrong as well? Again, I stared at equations, manipulated them, and got all kinds of complex expressions that led nowhere.

I'm telling you this because I'm building up to a universal truth, which took me a lot of years to learn, and which is something for you to write down in large block letters, or perhaps chisel into stone: All Problems Are Easy.

When faced with a new problem, you have at least two choices. On the one hand, you can take the brute-force approach: dive in and start pounding it with every hammer-shaped tool in your toolbox. You can fill the air with equations and derivations, you can write great volumes of software, you can write volumes of specs and analyses, and you can occupy yourself for many years, if you so choose.

Or, you can sit down and think about the problem. Somewhere, lurking inside that huge, apparently monolithic, and seemingly overwhelming problem is a crack that leads directly to a simple solution. Think of the reactor vent duct in the Death Star, and you'll get the right image.

Personally, I always go for the second approach. Sometimes, it seems to take longer, because there may be long periods of apparent inactivity. But trust me, once you find that vent duct, the problem cracks wide open.

I gave this analogy to my major professor, circa 1964. He asked me, “But what are you going to do when you encounter a problem that doesn't have such a crack? What if it really is monolithic?” I replied, “I can't answer that question, because I've never found a problem that didn't.”

Thirty-seven years later, I still haven't. The current problem is a case in point.

The real proof

Once I saw the proof I needed, it was trivial. The easiest way to write the solution is to use the same slopes and things we used in the minimization problem. Let: (12) (13) (14) (15)

Each m has the form of a slope, and n the form of a slope of a slope, which is the same thing as a curvature. Thus, n captures the second derivative of the curve.

Now recall that the points P0 , P1 , and P2 are always equally spaced in x, because we derive P1 by bisection. Let the spacing between them be h. Then we can write: (16) (17) (18) (19)

Let's look at C using this new notation. From Equation 8, we can write: (20)

Since m, m1 , and n are all constants, reflecting either slope or curvature, we should put to rest, once and for all, the notion that C trends to zero.

What about our convergence line? This is also straightforward now. The parameters b and g are defined in Equation 7. Using them, Equations 16 and 18 give us: (21) (22)

The ratio of these two equations is: Which we can write: (23)

Now, here comes the sleight-of-hand. Because m, m0 , and/or m1 are so nearly equal, we must always be careful when subtracting them (see Equation 20). However, we can be quite sure that, in the limit as the interval x0 ..x2 gets smaller, these slopes all converge to the same value. Thus the ratio must converge to one. At this limit, Equation 23 becomes: or: (24)

Ironically, this is the same equation that we got by the assumption that C trended to zero. So the convergence line of Figure 1 doesn't change I was right to assume the solutions converge there; I just got there with the wrong reasoning.

How well does it work?

The proof of the pudding is in the eating. How well does this algorithm work on real problems? In my original 1994 article, I used the test function: (25) Figure 2: The first test case

The idea was to dream up a function as far removed from a parabola as possible. The function is plotted in Figure 2. As you can see, it also gives a pretty stringent test, since it has lots of wiggles in it over the full range.

Despite the clearly non-parabolic shape of f(x), the CMIT algorithm converged nicely, to a tolerance of 1.0 10-6 , in only six iterations. That's impressive, but it's only the beginning of the story.

The problem with the function of Equation 25 is that there's no analytical solution. When running tests, it's always better if you know what the answer is supposed to be.

So I tried a new function: (26) Figure 3: Another test case

The graph is shown in Figure 3. Admittedly, it looks more like a parabola, but it's still described by an infinite series. This function has a known solution: (27)

Setting a convergence criterion of 1.0 10-8 , I ran the test program. The results are shown in Table 1. Figure 4: The range of x0 ..x2 shrinks as we iterate

The values of x0 and x2 are plotted in Figure 4. This graph shows how the range is reduced as we iterate. The routine claimed to be converging in five iterations. Figure 5: Error in root estimate as we iterate

The error curve is shown in Figure 5. The performance of this algorithm is stunning, to say the least. Typically, we get an improvement of around four decimal digits per iteration.

Note that in Figure 4 the value of x2 pulls in more slowly than that of x0 . We've seen this kind of behavior before; it's normal. If the curve bent the other way, it would be x0 that lagged behind.

In this case, however, we don't really care if one limit moves faster than the other, or if the estimated root is close to one of the limits. In fact, for the root-finder (as opposed to the minimum-finder) we should prefer it that way; the closer one bound is to the true root, the better.

A mystery remains, however, in the sense that the error shown in Table 1 and Figure 5 is a whole lot smaller than the 10-8 that was the epsilon value I set. So why did the routine keep iterating?

When I looked at this, I found the only room for improvement I've found in this algorithm in 40 years of using it. The test for convergence was:

if (abs(xm – x0 ) < eps)="">

Typically, the value of xm is much, much closer to the true root than x0 , so the algorithm tends to take unneeded steps. To reduce this tendency, I changed the test by comparing xm to its previous value instead of to x0 :

if (abs(xm – xlast ) < eps)="">

I added, of course, the appropriate logic to set xlast . This cut the number of iterations from five to four.

Looking at Table 1, however, we see that even four iterations is overkill. To an accuracy much better than 10-8 , we could have stopped after three.

This tendency is inherent in such a rapidly converging algorithm. If each iteration is three to four orders of magnitude better than the previous one, comparing the last two estimates of the roots is always going to be pessimistic. The real cure is to know the algorithm, and use a looser error criterion, eps, than you really need, on the assumption that the algorithm is going to take at least one more iteration than it needs to. Despite this little quirk, which I think can't be cured (and doesn't need to be), the performance of the algorithm is truly impressive.

Just for kicks, I extended the range x0 ..x2 to 0..6. Because the exponential curve flattens so radically, becoming parallel to the x-axis, I expected this to give me a very poor initial parabola. It did indeed, and the change caused the algorithm to reject the first three parabolic estimates, just as it should have. Instead, it bisected twice before falling into its usual parabolic routine. This is certainly encouraging, and demonstrates that our sanity tests are working just fine.

Listing 1 The world's best root finder

`double iterate(double x0, double x2, double (*f)(double), double eps, int imax, nt & error){	double x1,y0,y1,y2,b,c,temp,y10,y20,y21,xm,ym;	double xmlast = x0;	error = NONE;	y0 = f(x0);	if (y0 ==0.0) return x0;	y2 = f(x2);	if (y2 ==0.0) return x2;	if(y2 * y0 > 0.0)	{		error = BAD_DATA;		return x0;	}	for (int i = 0; i < imax;="" i++)="" {="">1 = 0.5 * (x2 + x0);		y1 = f(x1);		if (y1 == 0.0) return x1;		if (abs (x1 - x0) < eps)="" return="">1;		if (y1 * y0 > 0.0)		{			temp = x0;			x0 = x2;			x2 = temp;			temp = y0;			y0 = y2;			y2 = temp;		}		y10 = y1 - y0;		y21 = y2 - y1;		y20 = y2 - y0;		if (y2 * y20 <2.0 *="">1 * y10)		{			x2 = x1;			y2 = y1;		}		else		{			b = (x1 - x0) / y10;  			c = (y10 -y21) / (y21 * y20); 			xm = x0 - b * y0 * (1.0-c * y1);			ym = f(xm);			if (ym == 0.0) return xm;			if (abs (xm - xmlast) < eps)="" {="" return="">m;			}			xmlast = xm;			if (ym * y0 < 0.0)="" {="">2 =x m;				y2 = ym;			}			else			{				x0 = xm;				y0 = ym;				x2 = x1;				y2 = y1;			}		}	}	error = NO_CONVERGENCE;	return x1;}`

The code for the root finder is shown in Listing 1 and available at ftp://ftp.embedded.com/pub/2002/05crens.txt. As I said at the beginning, it's The World's Best Root Finder. Use it with confidence.

Jack W. Crenshaw is a senior software engineer at Spectrum-Astro. He is the author of Math Toolkit for Real-Time Programming, from CMP Books. He holds a PhD in physics from Auburn University. Jack enjoys contact and can be e-mailed at .