Moving Right Along -

Moving Right Along


To read original PDF of the print article, click here.

Programmer's Toolbox

October 2000

Moving Right Along

Jack W. Crenshaw

Jack reaches a temporary solution to the problem of minimization. Optimizations will come later.

When last seen, at the close of last month's episode, our hero was busily hammering away in his workshop, constructing software for his Zappo-Ray, portable anti-Brent function minimizer. His intention was to put together a set of snap-together modules that were both simple enough to read and understand, and effective enough to handle even tough extraterrestrial and pathological functions. The last module constructed was the stub version of the parabolic fit shown in Listing 1. (Incidentally, this listing corrects two errors that crept into the version presented last month. Bonus question for extra credit: what were the errors?)

Listing 1: Functions for parabolic fit
// estimate a minimum via parabolic fitdouble para_fit(double x0, double x1, double x2,		       double y0, double y1, double y2){  double m0 = (y1 - y0)/(x1 - x0);  double m1 = (y2 - y1)/(x2 - x1);  double c  = (m1 - m0)/(x2 - x0);  return (x0 + x1 - m0/c)/2.0;}// shrink an interval using parabolic fitvoid para_shrink(double &x0, double &x1, double &x2,               double &y0, double &y1, double &y2){  double x = para_fit(x0, x1, x2, y0, y1, y2);  double y = f(x);  if(x > x1){    swap(x0, x2);    swap(y0, y2);  }  if((y > y1) && (x2 > x0)){    x0 = x;    y0 = y;  }  else{    x2 = x1;    y2 = y1;    x1 = x;    y1 = y;  }  swap(x0, x2);  swap(y0, y2);}

Note the last two lines of para_shrink(), which swap the two endpoints of the bracketing region. These were put there to match the same two lines in gold_shrink() . They have to be there in gold_shrink() , to force it to bisect the regions on the left and right side on alternate calls. They don't really have to be there for para_shrink(), but it seemed a good idea to leave the output state in a condition similar to that which gold_shrink() would do.

At the close of last month's session, we plotted the three points being maintained by the minimizer, and noted the same bothersome effect I had observed earlier: the parabolic fit, if left unaided, tends to hang on to distant endpoints a lot longer than we would wish.

In the case of bisection methods, the algorithm has no choice but to pull those endpoints closer together. That's what bisection is all about. Either the new point is high enough to use as a new bracketing point, or it's low enough to bring the opposite point in. Either way, the endpoints move inexorably closer together. In fact, it's the distance between endpoints, rather than the stability of the midpoint, that we use as the criterion for deciding when we've converged. The method of parabolic fit doesn't work the same way, so twe can't guarantee that each endpoint will be alternately moved.

This tendency to hang onto distant points leads to two separate problems: First, the solution isn't very satisfactory because we're fitting a curve through points that are not well-conditioned (two points close together, third very far away). Second, we can't use the distance between bracketing points in our halt criterion. No doubt this problem is the reason Brent came up with a complex and inscrutable algorithm for deciding when the solution was good enough.

Our challenge this month is to come up with the logic to keep the parabolic fit under control, and to force the use of a bisection method to help ensure a stable and smooth descent to the minimum.

Where should we put this logic? Well, we can always put it in the driver function, external to function para_shrink() . I think it better to place it within para_shrink() itself. That is, we allow the function itself to decide whether the minimum it's recommending is acceptable, and have it return a Boolean depending on the result. To begin, define the logical variables:

#define FALSE 0#define TRUE (!FALSE)

(I've put these definitions in my standard library header, jmath.h). Then alter para_shrink() to return a Boolean. For starters, we can just force it to always return TRUE .

Alternating methods
We seem to have established that letting the parabolic method run open loop is not very satisfying, since it tends not to pull in the endpoints. The next obvious approach might be to alternate between parabolic and bisection methods. We can do that by simply putting a counter in para_shrink() , and having it return FALSE every other pass. Not surprisingly, such an approach is going to take more function evaluations, but let's not worry about that too much for now. We'll be thinking harder about optimizing after we better understand the behavior.

At the moment, the big question is: does this approach fix the parabolic fit's tendency to hang onto outlier points? Well, it might have, if I'd done it properly. In fact, though, it didn't, because I screwed up. The parabolic fit ended up giving me points that failed to satisfy our fundamental requirement, that the middle point is always lowest.

The problem is still in that pesky statement, which was:

if(y > y1)

last month, and in Listing 1:

if((y > y1) && (x2 > x0))

Turns out, neither test is correct. What I was trying to do was to swap P0 and P2 if x turned out to be between x1 and x2. That's because x is guaranteed to be between x0 and x1 for the golden section, and I wanted it to look the same way after a parabolic fit.

However, I forgot that, because of all the swapping, sometimes x0 is larger than x2, and sometimes not. The correct test is:

if((y > y1)*(x2 > x0) > 0)

(Recognizing that this can be a risky test if x0 and x2 are drawing closer together, as we hope they do).

So I can fix the test. Do we now get performance where the endpoints always move inward? No, we don't. Figure 1 shows the case where I alternate between golden and parabolic methods. In Figure 2, I've set the frequency to allow two parabolic fits, then a forced golden section. It doesn't seem to matter; the problem of endpoints stuck at the same values persists.

Click on image to enlarge.

Click on image to enlarge.

To swap or not to swap
We don't have to look very far to see the problem: if you examine the code for gold_shrink() , you'll see that it always does just one bisection, and that one is always between x0 and x1. In practice, the whole point of bisections is to pull both endpoints in, but someone got tricky and hoped to save a little code by only doing one bisection per pass. That someone was me. My only excuse is that I had a little encouragement from both Brent and the Numerical Recipes authors, who used the same approach.

That approach limits the number of function evaluations per pass through the outer loop to one, which may make it nice for counting function evaluations, but has little to offer otherwise. We save duplicating a few lines of code, which makes the routine that much shorter and more compact. On the down side, we have all those stupid swaps, which not only confuse the heck out of me (as my errors in the swap test fully attest), but eat CPU cycles as well. A side effect is that we can't be sure whether x is increasing to the left or the right.

In retrospect, I believe that someone (again, me; who else?) got entirely too tricky for his own good. Given a choice between tight and tricky, and large but simple, I'll take simple every time. So let's begin the process of simplification by writing gold_shrink() the way it should have been in the first place, and have it reduce both sides of the interval each call. We'll require, and assume, that:

x2 > x0

throughout the process (except when they're equal, of course, in which case we're done). The new version is shown in Listing 2.

Listing 2: Balanced version of golden section
void gold_shrink(double &x0, double &x1, double &x2,		        double &y0, double &y1, double &y2){  double x = gold(x0, x1);  double y = f(x);  if(y > y1){    x0 = x;    y0 = y;  }  else{    x2 = x1;    y2 = y1;    x1 = x;    y1 = y;  }  x = gold(x2, x1);  y = f(x);  if(y > y1){    x2 = x;    y2 = y;  }  else{    x0 = x1;    y0 = y1;    x1 = x;    y1 = y;  }}

Function para_shrink() can now be simplified a bit, since we know the order of the x's. We can also eliminate all those silly swaps. The new code for this function is shown in Listing 3. The corresponding performance is shown in Figure 3.

Click on image to enlarge.

Listing 3: Modified version of para_shrink
int para_shrink(double &x0, double &x1, double &x2,              double &y0, double &y1, double &y2){  double too_close;  #define MAX_PARA 4  static int count = MAX_PARA;  if(-count == 0){    count = MAX_PARA;    return FALSE;  }  double x = para_fit(x0, x1, x2, y0, y1, y2);  too_close = (x2 - x0)/20;  if(max(x-x0, x2-x) < too_close){    count = MAX_PARA;    return FALSE;  }  double y = f(x);  if(x < x1){    if(y > y1){      x0 = x;      y0 = y;    }    else{      x2 = x1;      y2 = y1;      x1 = x;      y1 = y;    }  }  else{    if(y > y1){      x2 = x;      y2 = y;    }    else{      x0 = x1;      y0 = y1;      x1 = x;      y1 = y;    }  }  return TRUE;}

Now that's more like it! True, we still have variables that seem to stick at one value. But look how dramatically the others are changing as we go. After only eight passes, all three errors are less than 0.001. Compare that to Figure 1, where it takes 12 passes to accomplish the same result, or Figure 2, which takes 10. Admittedly, we're now evaluating the function twice for each bisection pass, but even so, the difference is impressive. I don't mind a value getting stuck at, say, 1.0, if the other two are rapidly squeezing up next to it. That's the best we can really hope for in such a method.

Click on image to enlarge.

Click on image to enlarge.

One thing we still must check on is the suitability of the estimates coming out of the parabolic fit. If, for example, it gave a value for x1 that's, say, almost equal to x0, we can expect that the next parabolic fit will be poor. I've flagged this as a potential problem in the past, and Brent worried about it too. However, both Brent's original Algol code and Press et al.'s C translation only reject the estimate if it's very close (within an epsilon value on the order of the precision in the solution) of one of the bracketing points. I'm going to be much more aggressive on this one, and reject it if it's within 5% of the total current interval between x0 and x2. The structure of para_shrink() allows us to reject, simply by returning a false value for the return Boolean. The code fragment is:

  if(max(x-x0, x2-x) < too_close){    count = MAX_PARA;    return FALSE;  }

I'd also like to change the criterion for forcing a bisection step. Currently, we just count off parabolic steps, and then force a bisection whether the method needs it or not. For the test case, it has always been needed in the sense that one of the boundary points seems stuck. But this may not always be true. When the boundary points are not stuck, it's foolish to abandon a method that's working. A better method would be to keep a separate count for each boundary point, and kick out when one appears to be stuck. Listing 4 shows function para_shrink() after this change.

Listing 4: Counting each endpoint
int para_shrink(double &x0, double &x1, double &x2,              double &y0, double &y1, double &y2){  double too_close;  #define MAX_PARA 2  static int count0 = 0;  static int count2 = 0;  if(count0 == MAX_PARA){    count0 = 0;    return FALSE;  }  if(count2 == MAX_PARA){    count2 = 0;    return FALSE;  }  double x = para_fit(x0, x1, x2, y0, y1, y2);  too_close = (x2 - x0)/20;  if(max(x-x0, x2-x) < too_close){    return FALSE;  }  ++count0;  ++count2;  double y = f(x);  if(x < x1){    if(y > y1){      x0 = x;      y0 = y;      count0 = 0;    }    else{      x2 = x1;      y2 = y1;      x1 = x;      y1 = y;      count2 = 0;    }   	}   	else{      if(y > y1){        x2 = x;        y2 = y;        count2 = 0;      }      else{        x0 = x1;        y0 = y1;        x1 = x;        y1 = y;        count0 = 0;      }    }    return TRUE;

Knowing when to quit
I think this header is particularly apt, both for this month's session, and also for the study of minimizing a function. I'm ready to quit, aren't you?

At this point, we almost have a serviceable function, in that it takes a mixture of parabolic and golden section steps, and it chooses between the two based on reasonably intelligent, rather than pre-programmed, rules. The one thing still lacking to make the function production-quality is a good stopping criterion. Recall that, for bisection-only case, the criterion is simple: we stop when the interval between x0 and x2 is reduced sufficiently far. Thanks to the forced changes in both boundaries, we could still use this criterion, but in fact, the parabolic fits seem to be homing in considerably faster than that last boundary point gets pulled in. So we'd be taking a few too many steps.

I have what I think are some super ideas for a robust halting criterion, but they would require too many changes to our code to include them this month. Brent used the change, not in two successive parabolic estimates, but in the estimates two cycles apart. Press et al. suggest only that “the reason … seems essentially heuristic.” I'll go Brent one step better, and require that we have three parabolic estimates within an epsilon error. The code is collected into a general-purpose function in Listing 5. A sample driver program, complete with test function, is shown in Listing 6.

Listing 5: A general-purpose minimizer
double para_search(double (*f)(double), double &x0, double &x2, double Eps){  #define MAX_ITER 30  double eps = Eps * (abs(x2-x0));  // initial last values make sure we try at least twice  double last1 = 1e20;  double last2 = -1e20;  double x1 = gold(x0, x2);  double y0 = f(x0);  double y1 = f(x1);  double y2 = f(x2);  for(i=0; i< MAX_ITER; i++){    if(!para_shrink(x0, x1, x2, y0, y1, y2))         gold_shrink(x0, x1, x2, y0, y1, y2);    if((x2-x0) < eps)      break;    if(max(abs(x1-last1), abs(x1-last2)) < eps)      break;    last2 = last1;    last1 = x1;  }  if(i == MAX_ITER)  cout << "Para_search: No convergence after " << MAX_ITER << " trials." << endl;   	return x1;}
Listing 6: A test driver
#include <iostream.h>#include <stdlib.h>#include <math.h>#include <constant.h>#include <jmath.h>/* NOTE:  Files constant.* and jmath.* define pi, the Golden ratio,* and also define abs as a macro rather than a function.  If you* don't do this, better change calls to abs to fabs.*/double f(double x){    return cos(2*pi*x*x*x);}void main(void){  double eps = 1.0e-7;  double x0 = 0;  double x2 = 1;  double x = para_search(f, x0, x2, eps);  cout << x << endl;}

For your interest, the method converges to 1-7 accuracy in 17 iterations: five bisections and 12 parabolic fits. It uses a total of 25 function evaluations. I know, I know, that's more than Brent's method. That is no doubt because each golden section makes two evaluations, not just one. At this point, it's a price I'm prepared to pay to get rock-solid robustness.

I still believe we can improve on the algorithm some more. For example, I want to try my ideas for a better halting criterion. However, I won't be doing that next month. I want to give both you and I some time to wring out the algorithm. Have at it, try it on as many functions as you can, and see if you can break it. Meantime, I think we could all use a rest from Brent's method and its cousins, so next month I'll be starting a new topic.

See you then.

Jack W. Crenshaw a senior principal design engineer at Alliant Tech Systems Inc. in Clearwater, FL. He did much early work in the space program and has developed numerous analysis and real-time programs. He is the author of Math Toolkit for Real-time Programming (CMP Books, 2000). He holds a PhD in physics from Auburn University. Crenshaw enjoys contact and can be reached via e-mail at .


Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.