Moving Right Along
To read original PDF of the print article, click here. Programmer's ToolboxOctober 2000 Moving Right Along
Jack W. Crenshaw


Jack reaches a temporary solution to the problem of minimization. Optimizations will come later.
When last seen, at the close of last month's episode, our hero was busily hammering away in his workshop, constructing software for his ZappoRay, portable antiBrent function minimizer. His intention was to put together a set of snaptogether modules that were both simple enough to read and understand, and effective enough to handle even tough extraterrestrial and pathological functions. The last module constructed was the stub version of the parabolic fit shown in Listing 1. (Incidentally, this listing corrects two errors that crept into the version presented last month. Bonus question for extra credit: what were the errors?)
Note the last two lines of para_shrink(), which swap the two endpoints of the bracketing region. These were put there to match the same two lines in gold_shrink() . They have to be there in gold_shrink() , to force it to bisect the regions on the left and right side on alternate calls. They don't really have to be there for para_shrink(), but it seemed a good idea to leave the output state in a condition similar to that which gold_shrink() would do. At the close of last month's session, we plotted the three points being maintained by the minimizer, and noted the same bothersome effect I had observed earlier: the parabolic fit, if left unaided, tends to hang on to distant endpoints a lot longer than we would wish. In the case of bisection methods, the algorithm has no choice but to pull those endpoints closer together. That's what bisection is all about. Either the new point is high enough to use as a new bracketing point, or it's low enough to bring the opposite point in. Either way, the endpoints move inexorably closer together. In fact, it's the distance between endpoints, rather than the stability of the midpoint, that we use as the criterion for deciding when we've converged. The method of parabolic fit doesn't work the same way, so twe can't guarantee that each endpoint will be alternately moved. This tendency to hang onto distant points leads to two separate problems: First, the solution isn't very satisfactory because we're fitting a curve through points that are not wellconditioned (two points close together, third very far away). Second, we can't use the distance between bracketing points in our halt criterion. No doubt this problem is the reason Brent came up with a complex and inscrutable algorithm for deciding when the solution was good enough. Our challenge this month is to come up with the logic to keep the parabolic fit under control, and to force the use of a bisection method to help ensure a stable and smooth descent to the minimum. Where should we put this logic? Well, we can always put it in the driver function, external to function para_shrink() . I think it better to place it within para_shrink() itself. That is, we allow the function itself to decide whether the minimum it's recommending is acceptable, and have it return a Boolean depending on the result. To begin, define the logical variables: #define FALSE 0#define TRUE (!FALSE) (I've put these definitions in my standard library header, jmath.h). Then alter para_shrink() to return a Boolean. For starters, we can just force it to always return TRUE . Alternating methods At the moment, the big question is: does this approach fix the parabolic fit's tendency to hang onto outlier points? Well, it might have, if I'd done it properly. In fact, though, it didn't, because I screwed up. The parabolic fit ended up giving me points that failed to satisfy our fundamental requirement, that the middle point is always lowest. The problem is still in that pesky statement, which was: if(y > y1) last month, and in Listing 1: if((y > y1) && (x2 > x0)) Turns out, neither test is correct. What I was trying to do was to swap P0 and P2 if x turned out to be between x1 and x2. That's because x is guaranteed to be between x0 and x1 for the golden section, and I wanted it to look the same way after a parabolic fit. However, I forgot that, because of all the swapping, sometimes x0 is larger than x2, and sometimes not. The correct test is: if((y > y1)*(x2 > x0) > 0) (Recognizing that this can be a risky test if x0 and x2 are drawing closer together, as we hope they do). So I can fix the test. Do we now get performance where the endpoints always move inward? No, we don't. Figure 1 shows the case where I alternate between golden and parabolic methods. In Figure 2, I've set the frequency to allow two parabolic fits, then a forced golden section. It doesn't seem to matter; the problem of endpoints stuck at the same values persists. Click on image to enlarge. Click on image to enlarge. To swap or not to swap That approach limits the number of function evaluations per pass through the outer loop to one, which may make it nice for counting function evaluations, but has little to offer otherwise. We save duplicating a few lines of code, which makes the routine that much shorter and more compact. On the down side, we have all those stupid swaps, which not only confuse the heck out of me (as my errors in the swap test fully attest), but eat CPU cycles as well. A side effect is that we can't be sure whether x is increasing to the left or the right. In retrospect, I believe that someone (again, me; who else?) got entirely too tricky for his own good. Given a choice between tight and tricky, and large but simple, I'll take simple every time. So let's begin the process of simplification by writing gold_shrink() the way it should have been in the first place, and have it reduce both sides of the interval each call. We'll require, and assume, that: x2 > x0 throughout the process (except when they're equal, of course, in which case we're done). The new version is shown in Listing 2.
Function para_shrink() can now be simplified a bit, since we know the order of the x's. We can also eliminate all those silly swaps. The new code for this function is shown in Listing 3. The corresponding performance is shown in Figure 3. Click on image to enlarge.
Now that's more like it! True, we still have variables that seem to stick at one value. But look how dramatically the others are changing as we go. After only eight passes, all three errors are less than 0.001. Compare that to Figure 1, where it takes 12 passes to accomplish the same result, or Figure 2, which takes 10. Admittedly, we're now evaluating the function twice for each bisection pass, but even so, the difference is impressive. I don't mind a value getting stuck at, say, 1.0, if the other two are rapidly squeezing up next to it. That's the best we can really hope for in such a method.
Click on image to enlarge. Click on image to enlarge. One thing we still must check on is the suitability of the estimates coming out of the parabolic fit. If, for example, it gave a value for x1 that's, say, almost equal to x0, we can expect that the next parabolic fit will be poor. I've flagged this as a potential problem in the past, and Brent worried about it too. However, both Brent's original Algol code and Press et al.'s C translation only reject the estimate if it's very close (within an epsilon value on the order of the precision in the solution) of one of the bracketing points. I'm going to be much more aggressive on this one, and reject it if it's within 5% of the total current interval between x0 and x2. The structure of para_shrink() allows us to reject, simply by returning a false value for the return Boolean. The code fragment is: if(max(xx0, x2x) < too_close){ count = MAX_PARA; return FALSE; } I'd also like to change the criterion for forcing a bisection step. Currently, we just count off parabolic steps, and then force a bisection whether the method needs it or not. For the test case, it has always been needed in the sense that one of the boundary points seems stuck. But this may not always be true. When the boundary points are not stuck, it's foolish to abandon a method that's working. A better method would be to keep a separate count for each boundary point, and kick out when one appears to be stuck. Listing 4 shows function para_shrink() after this change.
Knowing when to quit At this point, we almost have a serviceable function, in that it takes a mixture of parabolic and golden section steps, and it chooses between the two based on reasonably intelligent, rather than preprogrammed, rules. The one thing still lacking to make the function productionquality is a good stopping criterion. Recall that, for bisectiononly case, the criterion is simple: we stop when the interval between x0 and x2 is reduced sufficiently far. Thanks to the forced changes in both boundaries, we could still use this criterion, but in fact, the parabolic fits seem to be homing in considerably faster than that last boundary point gets pulled in. So we'd be taking a few too many steps. I have what I think are some super ideas for a robust halting criterion, but they would require too many changes to our code to include them this month. Brent used the change, not in two successive parabolic estimates, but in the estimates two cycles apart. Press et al. suggest only that “the reason … seems essentially heuristic.” I'll go Brent one step better, and require that we have three parabolic estimates within an epsilon error. The code is collected into a generalpurpose function in Listing 5. A sample driver program, complete with test function, is shown in Listing 6.
For your interest, the method converges to 17 accuracy in 17 iterations: five bisections and 12 parabolic fits. It uses a total of 25 function evaluations. I know, I know, that's more than Brent's method. That is no doubt because each golden section makes two evaluations, not just one. At this point, it's a price I'm prepared to pay to get rocksolid robustness. I still believe we can improve on the algorithm some more. For example, I want to try my ideas for a better halting criterion. However, I won't be doing that next month. I want to give both you and I some time to wring out the algorithm. Have at it, try it on as many functions as you can, and see if you can break it. Meantime, I think we could all use a rest from Brent's method and its cousins, so next month I'll be starting a new topic. See you then. Jack W. Crenshaw a senior principal design engineer at Alliant Tech Systems Inc. in Clearwater, FL. He did much early work in the space program and has developed numerous analysis and realtime programs. He is the author of Math Toolkit for Realtime Programming (CMP Books, 2000). He holds a PhD in physics from Auburn University. Crenshaw enjoys contact and can be reached via email at . Figures
