You may recall that Brent's method, which I was led to believe was the ultimate superlinear convergence method, uses a combination of Golden Section search and parabolic interpolation. All my effort in recent months has been aimed at making such a combination work. We discovered last month that not only does it not work (very well), it cannot work. Numerical Recipes says, "A typical ending configuration for Brent's method is that a and b [the two bracketing points] are 2 × x × tol apart, with x (the best abscissa) at the midpoint of a and b, and therefore fractionally accurate to ±tol" [where tol is the maximum allowable error in x].
I have never seen this happen, cannot make it happen, and am now prepared to state that it cannot happen. Recall that, in seeking the minimum of some function f(x), we must maintain a triplet of points such that:

x0 < x1 < x2, with y1 < y0 and y1 < y2

(where yi = f(xi)), so that a minimum is guaranteed to lie somewhere inside the interval.
By hook or by crook, we attempt to shrink the interval x0..x2, while also trying to guess the value of x1 that represents the true minimum. Nothing would be nicer than to end up with x1 not only at the minimum, but midway between x0 and x2, as Numerical Recipes implies. It simply doesn't happen. What we see instead is an increasing imbalance in the three points, with the ratio:

r = (x1 - x0)/(x2 - x0)

either getting closer to zero, or closer to one.
From the get-go, my main beef with Brent's method is that it has a tendency to hang on to "outlier" points, leaving, say, x2 unchanged over many iterations. As I showed last month in my discussion of root-finding and the secant method, this tendency is hardly surprising. In general, one converges on a solution from one direction or the other (as opposed to alternating directions). This leaves one of the bracketing points fixed while the other is pulled closer to the solution. In the case of my standard test function (Equation 3), the preferred direction just happens to be from x2 toward x0, leaving x2 unchanged. Since x1 is presumably not changing much (if it is, we're not converging) and since x0 < x1, there is no possible way the interval x0..x2 can shrink appreciably until x2 changes. That's precisely the reason my code of two months ago took so long to converge. I was (deliberately) using a simple-minded convergence criterion, based only on this interval. My experience, and that of others who have tried the algorithm, has been that one of the end points, in my case x2, is only changed when a bisection occurs.
By definition, a bisection can, at best, reduce the interval by a factor of two. That is, its reduction is linear. On the other hand, x1 is presumably converging superlinearly, so the imbalance ratio r can only get worse and worse.
Both Brent and I have worried about this, and used the ratio to trigger bisections. But, from the discussion so far, we see that such bisections can only alleviate the problem, not fix it. The perfectly balanced triplet described by Recipes simply doesn't happen, in my experience.
As any recovering alcoholic can tell you, recognizing that one has a problem is the first step to fixing it. Considering the reputation and widespread use of Brent's method, it took a huge paradigm shift on my part to recognize that the combination of parabolic interpolation and bisection is Not a Good Idea. But once that recognition occurs, the obvious next question is, “What can I use instead?” As I outlined in my last column, and have proven out using a Mathcad model, a different approach seems to work very well indeed.
Since that last column, I've been thinking about this issue a lot, as you might imagine. Some months ago I pointed out that we never actually get to see the entire function, f(x), in all its glory. All we can ever see are discrete points on the curve, obtained by probing the region x0..x2 according to some strategy or strategies. Although the speed and reliability of convergence may vary radically, depending on our choice of strategy, we really don't have to justify or defend the way we probe the function. If we could reliably count on extrasensory perception, lighting votive candles, prayer, or any other esoteric means to nail the minimum, we would be justified in doing so (limited only by concern as to the final destination of our souls!). Our only purpose in applying some systematic strategy is to minimize the number of probings we must make.
What we have now learned is that bisection is not adequate for our needs. In its place, I came up with a strategy I call reflection (readers familiar with the Simplex method will recognize the term). Having used the parabolic interpolation to take a step, Δx, in some direction, we either step further along that direction, or back in the opposite direction, depending on whether or not we improved the situation.
I know what you're thinking. You're thinking, “Aw, man, not back to stepping again.” We've already dealt with the idea of stepping along a function to find the initial triplet, and we needn't go there again.
This is different. Why is it different? Because the step Δx is not fixed; it's determined and predicted by the parabolic interpolation. Because that interpolation leads to superlinear convergence, we can expect Δx to decrease very rapidly in size. Further, by "overstepping," if you will, we run at least a fighting chance of stepping beyond the true minimum.
This possibility is vitally important. Recall my comparison from last month between the secant method for root-finding, and parabolic interpolation. Near the solution, both methods tend to sneak up on the solution, always in one direction only. If one guess gets us, say, 95% of the way to the solution, the next one gets us 95% of what remains, the next, 95% of that, and so on. The good news is that we are converging systematically and reliably to the true solution. The bad news is that we never actually get there; we can only approach it asymptotically. Worse yet, because we never overshoot, we never get a chance to move in that outlying end point.
Overstepping solves that problem. If we overshoot the minimum, we end up with a new triplet with that troublesome end point moved in. We have, in fact, the very condition that Numerical Recipes promised we'd have: three equally spaced points, with the true minimum close to the midpoint.
My new strategy can be summarized as follows:
- Use parabolic interpolation to estimate the true minimum.
- Move to that point, as a step Δx from the current x1 , and evaluate f(x).
- Did that step give us a better minimum? If so, take a second step of Δx .
- If not, take a step Δx from the previous x1 , in the opposite direction.
Either way, we will have moved, after the interpolation, not once but twice (burning two function evaluations in the process). The advantage is that we are almost assured of radically reducing the x0..x2 interval. In short, we at least have a chance of keeping the triplet balanced.
It should go without saying that this new approach is not without pitfalls of its own. For example, what happens if, by some miracle, we find ourselves exactly at the predicted minimum, and the interpolator thinks this is correct? It will give us a Δx of zero, and we can hardly expect a second step of zero to change anything.
You might be tempted to shrug off this case, and say, “Well, if we're at the minimum, we've converged, so what's the big deal?” The big deal is that we cannot be sure that we have converged.
The more I use the test function of Equation 3, the more I'm inclined to pat myself on the back for having chosen one so diabolical. Simple as it may seem, it's been nice enough to bless me with virtually everything that can go wrong in a minimization. One such gotcha has been the tendency of Brent's method to hold onto x2; that tendency showed up very early using our test function. The other is the fact that, over the interval 0..1, the function happens to have the characteristic:

f(0) = f(1)
You may recall that the formula for the parabolic interpolation yields:

xmin = (x0 + x2 - m/c)/2

where m is the slope of the chord from (x0, y0) to (x2, y2), and c is the curvature term. If y2 = y0, it really doesn't matter what c is; m will be zero, and therefore:

xmin = (x0 + x2)/2
In short, xmin will be the simple midpoint between x0 and x2 . Now, suppose that I had listened to you guys, and eschewed the scheme of stepping along the function to find a triplet. I'd start with two end points, x0 and x2 , and assume that a minimum is hiding somewhere in between.
Since we only have two points so far, we're going to need a third. What better way to get that one than to bisect the x0 ..x2 interval? It's the obvious choice, and in fact it's the way I started out, lo these many months ago. For functions that don't happen to satisfy Equation 4, that would be a fine choice, and we would soon have a parabolic interpolant, and be on our way.
Unfortunately, this approach doesn't work with the very function we're using for tests. Because of Equation 4, we find:

xmin = (x0 + x2)/2 = x1

so the recommended step Δx is zero, and we're stuck before we've even begun.
What does this tell us? That the method of reflections doesn't work? Surely not; we haven't even gotten to that point yet.
Does it tell us that parabolic interpolation doesn't work? Again, surely not. It's the best weapon we have in our arsenal. All the example really tells us is that we must be on the lookout for special cases and qualify our results carefully. Under most conditions, we should never encounter pathological cases and may never find a false minimum. However, a production-level minimizer must handle all cases, pathological or not, which simply means that we need to sprinkle a few more reasonableness tests throughout our code than we might like to. We must also be careful not to assume, too soon, that convergence has occurred. Numerical Recipes notes that Brent was careful not to accept convergence too quickly, and in that he was quite correct.
Before moving on to changing the code, I should mention that this is one of the main reasons I keep hanging onto five samples of the function, rather than merely three. Using five gives me at least two other estimates of the minimum: one from the "two-line" approach (merely extending straight lines from left and right) and one from the outer three of the five points. If several estimators all agree that I'm near the minimum, I'm much more easily persuaded that it's a true one. One can envision some kind of pathological function that can fool one estimator for the minimum. Fooling three or four is much less likely.
The convergence criterion
It's time now to start updating the code. Recall that the last code you saw had a very simple-minded criterion for convergence, based on the size of the overall interval, x0 ..x2 . That criterion was, in fact, entirely too simple-minded, and the performance of the algorithm showed it. Albert Einstein once said something like, “Simplify as much as possible, but no more” (easy for him to say). It's clear that, in this case, I went too far.
Though simple-minded, that criterion wasn't a mistake. I knew at the time that it was simple-minded, and considered it only a temporary placeholder for a better one. In fact, I've used a better one before, involving both x and y. I'll now restore that setup, using another object-like structure like the one for the data array.
The x part is easy; we need only relocate and formalize it. Add the following lines in the global area:
/* "Object" to support the convergence
** criterion. The x values are
** constant, initialized by
** minimize(). The y values
** change as new y's are computed.
*/

// global, relative tolerance
static double eps;

// tolerance in x
static double xeps;
Remove the local definition of xeps in function minimize() , and just before the loop in minimize() begins, add:
// initialize state and parameters
eps = Eps;
xeps = (X1 - x0)*sqrt(eps);
state = step(f, x0, x1, Steps);
(The last line was already there. I included it to get you oriented.)
The y part is a bit longer, because the y range changes dynamically as we encounter new values. The “object” data is:
// defining values for range
static double ymax = -1.0e20;
static double ymin = 1.0e20;
static double yeps;
We should adjust this data every time we evaluate a new function f(x) . The easiest way to do so is to encapsulate the function call and the adjustment of the parameters. The function in Listing 1 does this.
Listing 1: The function to adjust y convergence values

// Function to adjust the y convergence values
double test_y(double (*f)(double), double x)
{
    double Y = f(x);
    if (Y < ymin)
        ymin = Y;
    if (Y > ymax)
        ymax = Y;
    yeps = (ymax - ymin) * eps;
    return Y;
}
By the time we come out of the function step() with the triplet x0 , x1, x2 , the range in y should be at least as large as the spread between y1 and the larger of y0 and y2 . For now, we'll declare convergence when the estimated y is less than yeps from y1 .
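Pulled out of context, the test described above amounts to the following (a sketch with assumed names, not the column's actual code):

```cpp
#include <cmath>

// Sketch (assumed names) of the y-based convergence criterion: the running
// y-range observed so far sets the absolute tolerance yeps, and we declare
// convergence when the parabolic prediction Yguess lands within yeps of the
// actual function value Y.
bool y_converged(double Y, double Yguess, double ymin, double ymax, double eps)
{
    double yeps = (ymax - ymin) * eps;
    return std::fabs(Y - Yguess) < yeps;
}
```

Note that the tolerance is relative to the spread of y values seen, not to their magnitude, so the test behaves sensibly even when the minimum value of f is far from zero.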
We have three functions that compute an estimated y, but only one is hooked up right now, so it's okay to put the test inside para_fit() . The code is simple enough, and it controls the state returned by the function for the next step in the iteration:
if (fabs(Y - Yguess) < yeps)
    ...
Previously, I had code in function para_fit() to test whether or not the latest point was a new minimum value. If it wasn't, I forced a bisection step. I've now decided that this caution is neither necessary nor desirable; I was being too conservative. We'll use similar code later, for a different purpose, but for now the algorithm will perform parabolic interpolations only.
The results are: whooeee!!! Convergence to one part in 10⁸, in four iterations. Let's see you beat that, Dr. Brent.
That's four steps of parabolic interpolation, after the step process has located the triplet at 0.7, 0.8, and 0.9. Because I used a fairly fine mesh in the step process, it took an additional ten steps, for a grand total of 14.
Since many folks have criticized the use of the stepping process, let's reduce the number of intervals to three. The algorithm still converges, this time with only 11 function evaluations total: four for the stepping, seven for interpolation. That total is eight fewer than I got using Brent's method, with the same error criterion.
Why not reduce the intervals to the minimum value, two? Because the algorithm, as it's currently configured, will crash and burn, thanks to the condition that was outlined in Equation 9. The plain bare fact of the matter is that we're currently driving without a seatbelt or crash helmet. As currently configured, the algorithm has no protection against ill-configured data points or poor interpolations. We are also not using bisection at all, nor two of the three possible interpolation methods. We are just fortunate that the algorithm doesn't happen to hit any of the pathological conditions.
The reflection gambit
Fixing the safety problems is fairly easy, but let's not get sidetracked with those fixes now. Remember that the thrust of this column is to include the new maneuver, which involves reflecting Δx to force the bounding x-points inward, and make the algorithm let go of outliers.
What we have demonstrated here is simply that interpolation using only successive parabolic fits works, most of the time. We could have predicted that without making the run. But even though we are now getting fast convergence, the outlier problem remains. After convergence, according to our new criterion, to the correct value of x = 0.793701, the next point is 0.797126, nearly 0.004 away. As I've noted before, the right-hand end point simply resists being pulled in. The new strategy is supposed to fix that.
Fortunately, we already have the algorithm that I previously called flip . It takes only a minor change to the logic to make flip support the new strategy as well.
Here's how the algorithm goes: function para_fit() and two other functions compute not only the best estimate of xmin , but also the corresponding y value. This estimated value, ymin , must necessarily be less than the current best value (unless xmin = x [index]). Of course, the true value may or may not be less.
We can think of the value xmin as "recommending" a step Δx from the current best minimum, which is x[index], to the new one. Regardless of whether Δx is positive or negative, we expect the slope of the curve at this point to be downward. If it is indeed downward (that is, if the new y is less than y[index]), we are going in the right direction. Our new strategy says that we should take yet another step in that direction (assuming, of course, that it wouldn't cause us to violate the range of the search).
If, on the other hand, the new y is larger than y[index], the math is telling us that our conjecture of a downward slope was wrong, and we stepped in the wrong direction. The easy fix is to "flip," which is to say, back up and step from x[index] in the opposite direction (again, respecting range constraints). We've already got the code for flip. I'll call the other maneuver the "two-step."
I should note that using the parabolic fit in this way absolutely guarantees that it will never be used twice in succession; the parabolic fit will alternate with steps of either flip() or twostep() . This means that we should expect to have to use more function evaluations than with pure parabolic interpolation. The upside is that we get a more reliable convergence, with an end condition in which the minimum is roughly centered in the x0 ..x2 interval.
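That alternation can be captured in a toy state table. The state names here are my own; the column's actual state machine may differ:

```cpp
// Toy version (my naming) of the alternation rule: a parabolic fit is always
// followed by exactly one flip or two-step move, never by another fit.
enum STATE { PARA_FIT, FLIP, TWO_STEP, CONVERGED };

STATE next_state(STATE current, bool improved, bool converged)
{
    if (converged) return CONVERGED;
    if (current == PARA_FIT)
        return improved ? TWO_STEP : FLIP;  // reflect one way or the other
    return PARA_FIT;                        // flip/two-step hand control back
}
```

Reading the table off: every other move is a fit, so each iteration of the converged loop costs two function evaluations rather than one.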
With this new strategy, convergence takes 17 function evaluations: four in the step process, seven for parabolic interpolation, and six for flip/two-step moves. We've used six more evaluations than before (but still fewer than Brent's method). But you should see the difference in the configuration of the points. Previously, the range x0..x2 after convergence was 0.004. Now it's less than 0.000001: an improvement by a factor of more than 4,000! We could, in fact, now use the x-range in the error criterion, which I intend to do (supplementing the test on y) before we're done. This is something that we've shown to be unworkable using either pure parabolic interpolation, or a combination of parabolic interpolation and bisection.
Is it better?
At first glance, you might be thinking that 17 evaluations is more than 11, so adding the flip/step maneuvers seems to add function evaluations with no apparent improvement in accuracy. That's true, but remember that the pure-parabolic approach is far too dangerous for production use. Using it, we're running without seatbelts. Because it results in such radical imbalance between the triplet points, we run a very real risk of losing the minimum altogether, thanks to numerical roundoff error. The fact that we got away with it in one test case only proves that sometimes one can get lucky.
In any real production routine, we're going to absolutely have to keep the data points from coalescing. Brent knew that also. His algorithm tests for such conditions, and forces bisection steps to keep the problem under control. In Brent's method, we can expect to get a few bisection steps before convergence. That runs the function call count back up to around 17, or maybe more (I got 19 in my tests of the method).
I am going to have to add similar tests to my code, just to be safe. In fact, I'm going to retain the bisection method, if for no other reason than to give us a trapdoor for the Equation 9 condition. However, in this case we shouldn't expect to see much, if any, effect in terms of increased function evaluations. The flip/two-step strategy naturally keeps the three triplet points balanced, so we are far less likely to need the safety hatches.

At this point, I consider that we've got an algorithm that works very nicely indeed. We still have to add all those safety gadgets I mentioned, plus I want to sneak back in the algorithms that use all five data points (that's why we retained them, remember). Nevertheless, I think we've finally got an algorithm that gives reliable convergence without the wild imbalance that we've been seeing up until now. In Listing 2, I've included a truncated version of the algorithm, with the safety belts and unused stratagems snipped out, and the declarations and support functions left off. Though still not production quality, I think you will find that it works quite well for most problems. The code is posted at ftp://ftp.embedded.com/pub/2001/Mintest2.cpp. Feel free to exercise it and see for yourself.
Jack W. Crenshaw is vice president of Crenshaw Technologies in the Tampa Bay area, and a specialist in the application of advanced methods to the solution of real-world, embedded-systems problems. He is the author of Math Toolkit for Real-Time Programming, from CMP Books. He holds a Ph.D. in physics from Auburn University. Jack enjoys contact and can be reached via e-mail at .
Listing 2: A simplified version

// do a parabolic fit of the three points centered at index
STATE para_fit(double (*f)(double), double &dx)
{
    assert(npoints >= 3);
    assert((index > 0) && (index < npoints-1));
    double X, Y, Yguess;
    double dx0 = x[index] - x[index-1];
    double dx1 = x[index+1] - x[index];
    double span = dx0 + dx1;
    double m0 = (y[index] - y[index-1])/dx0;
    double m1 = (y[index+1] - y[index])/dx1;
    double m  = (y[index+1] - y[index-1])/span;
    double c  = (m1 - m0)/span;
    double oldy = y[index];
    // compute the estimated minimum
    X = (x[index-1] + x[index+1] - m/c)/2.0;
    dx = X - x[index];
    Yguess = y[index-1] - c*(X-x[index-1])*(X-x[index-1]);
    Y = test_y(f, X);
    // have we converged?
    if (fabs(Y - Yguess) < yeps)
        ...
    // otherwise reflect one way or the other
    if (Y > oldy)
        ...
}

// Flip the step to the opposite side of index
STATE flip(double (*f)(double), double dx)
{
    double X = x[index] - 2*dx;
    double Y = test_y(f, X);
    ...
}

// Take another dx step in the same direction
STATE twostep(double (*f)(double), double dx)
{
    double X = x[index] + dx;
    double Y = test_y(f, X);
    ...
}

/* Find the minimum of a function using a second-order interpolation */
double minimize(double (*f)(double), double x0, double X1, long Steps, double Eps)
{
    // local variables
    ...
    // initialize state and parameters
    eps = Eps;
    xeps = (X1 - x0) * sqrt(eps);
    state = step(f, x0, X1, Steps);
    // outer loop of state machine
    while (iter > 0)
    {
        switch (state)
        {
        case ...: return x0;                 // converged
        case ...: state = bisect(f);          break;
        case ...: state = para_fit(f, dx);    break;
        case ...: state = flip(f, dx);        break;
        case ...: state = twostep(f, dx);     break;
        }
    }
    cout << "minimize failed to converge" << endl;
}