 Worth the Wait - Embedded.com

# Worth the Wait

It's taken a long time and there's still some tweaking to do, but the basic minimizer is done. Hopefully, the result is worth the wait.

In case you just arrived on the scene, we're trying to develop a function that locates the minimum of some math function:

(1)

y = ƒ(x)

Most algorithms that do this require, as a starting point, three points that comprise a “triplet,” such that:

(2)

ƒ(x0 ) > ƒ(x1 ) < ƒ(x2 )

The algorithm I've been showing starts by stepping along the function until it finds this triplet condition. Many readers have asked me why I do this; they see stepping as unnecessary and time-consuming. Here's why. The middle point is critically important to the process. The algorithms work by either bisecting segments or performing parabolic fits to the three points. If we don't have the triplet to begin with, the algorithms can't function.

It would be nice if we had a minimizer so robust that all we needed to tell it would be something like:

(3)

x = minimize(f); (3)

But that's not only asking an awful lot of the minimizer, it leaves us prey to too many functions with more than one minimum. We need to narrow the range.

The situation is entirely analogous to that of finding the root of a function (value for which ƒ(x) = 0). All known root finders start by interpolating between two points straddling zero. If we start with two such points, the solver knows that ƒ(x) must have a sign change somewhere in-between. So a call like:

x = findroot(f, x0, x1)

can be made to absolutely guarantee a solution, given reasonable properties for ƒ(x) .

What do we do if we don't know good values or x0 and x1 ? we must do very much the same thing as in our minimizer; we must step along ƒ(x) until the sign changes.

Most root finders assume you can guess two starting values on your own, so they tend to leave the stepping part out. Often, the user can start with points sufficiently far away so that the sign of the function can be guaranteed. unfortunately, finding the starting triplet for a minimizer is not so easy.

The minimizer absolutely needs three values of x satisfying the conditions of Equation 2. What's more, three points satisfying this condition are considerably harder to find than two of opposite sign.

In short, the guts of the minimizer need three data points to get started. But how the heck do I get that middle starting point? Some implementers will offer only a function that requires the triplet as starting values, and in effect say, “you're on your own, kid….”

But for a general-purpose minimizer, this approach seems to be putting all the hard work off onto the programmer who calls it.

Others, such as Press et al in Numerical Recipes (Cambridge University Press, 1992), give the function that needs a starting triplet, but add another external function that performs a step process to kick the main function off. That way, you can call the triplet function if you know how, or the kick-starter if you don't. All these helper functions use the same stepping approach I do.

I've tried following the Numerical Recipes approach, but it seemed to me that for almost any real-world problem, I'm going to need the help of a stepper, or some other driver, to establish the triplet. In the end, I decided to include it in the minimizer proper.

Okay, but why the stepping? Surely there's a more efficient way. Actually, there isn't. At first, I tried simply bisecting the starting interval. That worked just dandy on my main test function:

(4)

ƒ(x) = cos(2πx3 )

It crashed and burned on my very next test case:

(5)

g (x ) = sin(2πx3 )

After a few trials, I convinced myself that the only truly robust approach was to step along the interval until a triplet is found. That's why my algorithm has a stepper.

If you know for a fact that bisecting the x0 ..x2 interval will work, set n = 2 and you're home free, with very little overhead. If you're not sure where the minimum lies, make n large and you're sure to find it, albeit with more overhead.

Software engineering

I'd like to say something about software engineering. Most designs don't just evolve, they have to be planned and coaxed into existence. When I'm doing a design for software for my own use or for my company, I have a tendency to document my thought processes, just as though I were writing one of these columns. In my documents, I discuss the requirements for the project, my plan for a solution, and so on. Then I paste in copies of the code I generate, as I go. The document serves the function of an informal Unit Development Folder (UDF).

One thing I learned early on: designing is different than coding. It requires different thought processes. The best advise I can give is: when designing, get up and walk away from the computer. However, I don't usually follow the same practice when I'm writing software for this column, and I guess you can see why. When I'm writing a column, the column is the UDF. I tend to keep both the compiler and the document open together, and bounce back and forth between them. I don't write up separate documents because I have to explain to you, anyhow, what my plan is. I can't very well walk away from the computer, since I need it to tell you what I'm thinking. If I'm developing a certain module, you typically get to see the whole sequence, from the first stubbed-out module to the final, cleaned-up one. If you feel that this is stuff you'd rather not see please tell me, because I can assure you, it's a lot easier to show you the finished product alone. From the e-mail I get, though, most readers seem to enjoy watching the development process unfold.

Until this current effort, my scheme has worked pretty well, I think. When I was doing things like vector arithmetic modules, or digital filters, the code was simply not complex enough to warrant a more formal approach. I think it's safe to say that, in the case of this minimizer, I have hit the wall, in terms of the level of complexity that it's possible to process in this design-at-the-keyboard way.

So what shall we do about it? Actually, I've already done it. I realized that it's time to push away, get out of the chair, and walk to another room to do some thinking. And also to start writing separate analyses and specs. In short, run a UDF, separate from the text of this column. I've been doing a lot of that this month, and you'll see the results shortly. Frankly, I've been surprised at how many things got changed, and mostly simplified, by the simple process of walking away from the computer.

Plan Z

Enough talk, let's do. I've learned my lesson, done my analysis and my design homework. So where do we stand? Did I get anything out of that new effort?

You betcha. But maybe I should first mention the things that haven't changed. I'm still committed to a five-point data structure:

double x, y;

along with the functions to manage it. Every time we compute new data, it gets put into this structure, and every time we perform a computation, it's based upon it. This turns out to reduce data motion and parameter passing to a minimum, while still maintaining data integrity. I'm also committed to the state machine as the overall driver for the algorithm. Finally, as I've discussed, I'm committed to the idea of using a stepper function to locate the initial triplet. If you don't like that part, don't worry. It's easy enough to remove it, and pass three x-values to the minimizer. But in its latest incarnation, the stepper has a low enough overhead that I see no reason to separate it out as a separate driver program.

As I was doing my long-overdue analysis, one of my first shocks came when I realized that function step() could become a gutless wonder. Recall that, at one time, it was a state machine nested inside the larger state machine. Even at its simplest, I thought I'd need at least two states: one for riding up any possible increasing part of the function, the other for going down the decreasing part. To refresh your memory, this is because I wanted to be able to handle functions like the g (x ) of Equation 5. This function increases before it starts decreasing to its minimum. A literal interpretation of “minimum” might say, “hey, as soon as I see the function increasing, I'm done,” and return x = 0 as the minimum. I wanted something a little more robust, so I made the decision early on to skip over initial upwards motions.

No one was more shocked than I when I realized that step() can be reduced to the pseudocode:

repeat
take a step
until have_triplet()

Okay, maybe it's not quite that simple. We still must count the steps and decide what to do when we reach the end of the range. But the basic logic is just there. My version of function step() does work properly in the case of monotonically increasing or decreasing functions-something Brent's method can't quite do-but in most cases it just follows the pseudocode above. It's shown in Listing 1. I've made a few minor improvements. Mainly, they're there to avoid roundoff errors that can occur when adding a lot of little Dx 's.

Listing 1: The step function

/*
* Step along the function, looking for a local minimum. To support functions
* that start off increasing, we allow for the possibility of an initial rise.
* This function works with the outer state machine, and returns the next
* state. The returned state depends upon whether or not an internal minimum
* (triplet) was found.
*/

STATE step(double (*f)(double), double xL, double xR, long steps)
{
assert(steps > 1);

// local variables
long n = 0; // step counter
double range = xR-xL;
double X, Y; // local data point
int flag;
for (n=0; n<=steps; n++)
{
X = xL + (double(n)/steps) * range;
Y = f(X);
flag = append(X, Y);
if (flag)
return HAVE_TRIPLET;
}
// we get here if function is monotonic
if (f(xR) < f(xL))
return RIGHT_EDGE;
else
return LEFT_EDGE;
}

Function step() returns a state to the state machine that tells it whether to start iterating or not. If step() gets to the end of the loop without finding a triplet, the function must be monotonic, either increasing or decreasing. We tell the state machine so, and let it return the correct value.

For the record, the awkward code involving variable flag is absolutely necessary, in Borland C++. Putting the call to append() inside the if-test gave a compiler bug.

Managing the data

You'll note that step() calls a new function, append() , which not only does all the work of managing the data structure, but also returns a Boolean to flag the triplet. I guess this calls for yet more explanation.

Once I decided to create the data structure, I always intended to treat it as much like a C++ object as possible, without actually making it one. Two integers, npoints and index , capture the current state of the array(s). As their names imply, npoints contains the current number of valid points in the array, while index points to the current minimum.

As I was organizing the functions that operate on this structure, I realized that I needed perhaps three levels of functions. The lowest level simply manipulates the data as commanded, without making any logical decisions. I've sprinkled a number of assertions through the code to protect me during the development phase, but otherwise the lowest-level routines don't do much logic.

My first plan was to make them do none at all. They would be completely dumb, doing whatever they were told, however bizarre. For example, the following function does nothing but stick a point into the specified location:

void replace(double X, double Y,
int k)
{
x[k] = X;
y[k] = Y;
}

In general, the lowest-tier functions don't return any data or otherwise interact with the caller. They just do what they're told.

To support shifting the data through the shift register, I wrote two low-level routines called push_left() and push_right() . The names pretty much tell the story. Each one shoves the data beginning at location k to one side or the other, then inserts the new data in its place.

In keeping with the low-level nature of these routines, my first inclination was to not have them muck around with the indices, npoints and index . I thought that would be better done at the next level up. The more I thought about it though, the less rational this seemed. The pointer, index , is supposed to point to the current minimum. It seems crazy to leave any operation with this thing pointing to the wrong place. So push_left() and push_right() ended up adjusting the pointers as necessary to keep things consistent. These two routines are shown in Listing 2.

Listing 2: The push routines

// shove data points left and insert a new one at k
void push_left(double X, double Y, int k){
assert((k >= 0) && (k < npoints));
// shift points to the left of k
for (int i = 0; i< k; i++)
{
x[i] = x[i+1];
y[i] = y[i+1];
}
// insert new point
x[k] = X;
y[k] = Y;
if (index <= k)
index = max(0, index-1);
}

// shove data points right and insert a new one at k
void push_right(double X, double Y, int k){
assert((k >= 0) && (k < npoints));
// shift points to the right of k
int maxi = min(npoints, 4);
for(int i = maxi; i>k; i–){
x[i] = x[i-1];
y[i] = y[i-1];
}
// insert new point
x[k] = X;
y[k] = Y;
npoints = maxi+1;
if (index >= k)
index = min(4, index+1);
}

Armed with push_left() , I had no problem writing append() . This function is the first of the second-tier functions. I made the decision to let functions at this level return a Boolean to signal its result. In this case, append() returns true when it finds a triplet.

There's an important point to make about this function, and I've been entirely too cavalier about it so far. You'll recall that I've been quite a stickler about managing the data points so that the condition of Equation 2 is maintained. If we ever lose this condition, all bets are off.

However, I wasn't so careful about that part during the stepping procedure. Though it's admittedly an unlikely occurrence, one can envision a pathological function that keeps a constant value for quite a time, before turning south. In such a case, we could conceivably end up with a set of five data points in the array, all with equal y-values.

The same kind of situation crops up later, when we begin inserting new points arising from estimations of the minimum. Again, it's very unlikely, but it's possible for the new point to be exactly equal to the adjacent one. That's unacceptable.

After some thought on the matter, I decided that the best recourse is to go ahead and accept the new point, but to overlay it on top of the previous one, rather than shifting it out. Thus, even during the stepping process, only unique y-values appear in the array. My choice to use, rather than ignore, the new data stems from the fact that it's likely to be closer to the minimum once it's found. I test for similar crazy cases throughout the algorithm, and I believe this makes it pretty doggoned robust. the function append() is shown in Listing 3.

Listing 3: append()

/*
* Insert a point at end of list. Shift left if list is full. Adjust pointers
* as needed. Return TRUE if last point forms a triplet.
*/
int append(double X, double Y)
{

// maybe y's are equal
// if so, just overlay old point
if ((npoints > 0) && (Y == y[npoints-1]))
{
replace(X, Y, npoints-1);
return FALSE;
}

// “mainstream” flow
if (npoints < 5)
replace(X, Y, npoints++);
else
push_left(X, Y, 4);

// manage index
if(npoints > 1)
if(Y < y[index])
index = npoints – 1;
if((npoints > 2) && (index == npoints-2))
if(Y > y[index])
return TRUE;

return FALSE;
}

Inserting

Once we get into the meat of the iterative process, we're going to be calculating new values of x and y , and these points need to be inserted in the proper place in the data structure. What's more, the new y could be a new minimum, so we must adjust the index accordingly. (You may recall that I previously had a separate data point, defined by xmin and ymin , to record this minimum. I no longer see a need for that data; the array holds the minimum nicely.)

Like append() , functions insert_left() and insert_right() test for equal values. They must also work regardless of how many data points are in the array (behavior is different between a full array and one that isn't).

Any way you cut it, the code for these routines is tricky, and it's also key. I freely admit that they've been a sticking spot for me for some time. The only saving grace is that we can be sure that we'll never insert except in the segment adjacent to the current minimum. We can be assured of that because I've established it as a condition. If the parabolic fit, for example, estimates a minimum outside of the center triplet, I simply ignore the point and force a bisection instead (this is the only case where new data is not accepted). The code for insertion is shown in Listing 4.

Listing 4: Insertion

/*     * Insert a new point to the left of x[index]
* Assume that the point lies in the segment.
* Return TRUE if the new Y is a new minimum.
*/
int insert_left(double X, double Y)
{
assert(npoints > 2);
assert((x[index-1]< X) && (X < x[index]));
assert((0 < index) && (index < npoints-1));         // don't insert if y-values are equal
if (Y == y[index-1])
{
replace(X, Y, index-1);
return FALSE;
}
if( Y == y[index])
{
replace(X, Y, index);
return FALSE;
}
last_min = y[index];

// see if buffer is full
if(npoints == 5)
{    if((index == 3)|| ((index == 2) && (Y > y[index])))
push_left(X, Y, index-1);
else
push_right(X, Y, index);
}
else
push_right(X, Y, index);

if(Y < y[index])
{
– -index;
return TRUE;
}

return FALSE;
}

/*
* Insert a new point to the right of x[index]
* Assume that the point lies in the segment.
* Return TRUE if the new Y is a new minimum.
*/

int insert_right(double X, double Y){
assert(npoints > 2);
assert((x[index]< X) && (X < x[index+1]));
assert((0 < index) && (index < npoints-1));         // don't insert if y-values are equal
if (Y == y[index])
{

replace(X, Y, index);
return FALSE;
}

if Y == y[index+1])
{
replace(X, Y, index+1);
return FALSE;
}
last_min = y[index];
// see if buffer is full
if (npoints == 5)
{    if((index == 1)|| ((index == 2) && (Y > y[index])))
push_right(X, Y, index+1);
else
push_left(X, Y, index);
}
else
push_left(X, Y, index);

if(Y < y[index])
{
++ index;
return TRUE;
}

return FALSE;
}

It took a lot of thinking to arrive at the logic involved. I challenge anyone to try to write these functions at the keyboard. They are definite candidates for serious analysis and optimization. However, I've wrung them out thoroughly, and have been unable to break them.

One pleasant feature of using the data array is that the two insertion routines need very little in the way of passed parameters. They always insert to the left or right of the current minimum.

When we're inserting new data, we're very much interested in whether or not we've found a new minimum. The insertion routines return a Boolean true when the point was, indeed, a new minimum. This relieves quite a load off the calling routines.

When we do a bisection, we know which side of the current minimum will be bisected. However, the parabolic and other interpolations can come out on either side. Fortunately, armed with insert_left() and insert_right() , it's easy to write a generic insertion routine. To save space, we're posting this function functions on ESP 's Web Site, along with those for bisection and interpolation.

Bisection

Last time you saw the bisection routines, they were radically different than their new guise. Before, I had two functions called bisect_left() and bisect_right() , but all they did was to compute the new x -value. The logic was kept down in the body of the state machine, and it was pretty busy. These functions have had a major overhaul, so the logic is now inside the functions, and the state machine only calls them.

Together with step() , these two functions represent top-tier functions. Instead of returning Boolean variables, they return new state values for the state machine. Thanks to all the help they get from the strong supporting cast, these functions are much simpler than before.

You may recall my concern-richly justified, as it turns out-in previous columns, with the problem of Brent's algorithm hanging onto old data points well away from the minimum. You might also recall that I've resigned myself to the fact that I can't avoid this effect completely, because parabolic interpolation converges faster than bisection, no matter what else we do. However, in an attempt to minimize it, function bisect() tests the width of the interval on either side of the current minimum, and bisects that one. It can't eliminate the problem entirely, but it helps.

Parabolic interpolation

Thanks to all the new subroutines (and the analysis that went into them), things are beginning to click pretty well. The parabolic interpolation routines (I've provided two) are now straightforward, quite unlike what I was faced with before. I also went ahead and generated the two-line fit and a function called flip() . This last function arose because I came up with a condition, in my tests, where I could see that the interpolation estimated a point on the wrong side of the current minimum. That occurrence shows up very apparently when you compare the expected y -value of the minimum with the actual value for the same x . My tests showed that we could speed convergence by probing across the minimum when this happened.

The estimation functions are unique in that they not only estimate a value for x at the minimum, but they also predict that expected y -value. It seems only reasonable that using this information should help us decide the next move more wisely. So far, however, I'm only computing the expected values, and not really using them for anything.

I did try doing the flip-trick by testing within function para_fit() . However, I didn't see that it made any difference in convergence. This is an area that bears looking into more carefully.

The driving function, the state machine, is shown in Listing 5. Other than moving logic from the case statement to the called functions, it's pretty much the same.

Listing 5: The state machine

/* Find the minimum of a function
* using a second-order interpolation
*/

double minimize(double (*f)(double), double X0, double X1, long Steps, double eps){

// local variables

long iter = 60;
double dx;
double xeps = sqrt(eps);

// qualify inputs
if(X1 <= X0){
cout << "minimize: Search range must be finite and positiven";
return 0;
}

if(Steps < 2){
cout << "minimize: Must have at least two intervalsn";
return 0;
}
state = step(f, X0, X1, Steps);

// outer loop of state machine
while(–iter > 0){
switch(state){
case LEFT_EDGE:
return X0;
case RIGHT_EDGE:
return X1;
case HAVE_TRIPLET:
k = 0;
state = PARA_FIT;
break;
case BISECT:
state = bisect(f);
break;
case PARA_FIT:
state = para_fit(f, dx);
break;
case FLIP:
state = flip(f, dx);
break;
case DONE:
return x[index];
}
show();
if((x[index+1] – x[index-1]) < xeps)
}
cout << "minimize failed to converge" << endl;
return x[index];
}

Results

For the first time, we now have a state-machine version of the minimizer, with all functions in place. Some bells and whistles remain to be added, and some tuning to be done, but for the most part, it's all there. Mostly, the tuning has to do with deciding to do the optimal thing based upon the result of the previous computation.

Does it work? You bet. Does it work well? Pretty well. Using bisection alone, and based on the error criterion used (1.0e-4 in x), I got convergence in 42 iterations. That's a lot of iterations, but about what's expected for bisection.

The first time I enabled parabolic interpolation, I got convergence in 36 iterations, which didn't seem to me to be much of an improvement. Looking at the results after each iteration, however, it was easy to see what the problem was: the algorithm had long since converged; my primitive convergence criterion simply hadn't realized it yet.

Recall that in previous issues, I had a grandiose scheme for a convergence criterion that involved testing not only both the x – and y -intervals, but also comparing predicted to actual y -values. I had the data structures for those criteria in a previous incarnation of the minimizer, but have left them out, for now, in the current one. They'll go in next month.

Again, looking at the step-by-step convergence, I can see that “Brent's problem” is still with us, in the sense that the algorithm still tends to hang onto widely dispersed values on the “inactive” side of the true minimum. This definitely slows convergence. I had hoped that the flip operation would help, as it did in my Mathcad simulations, but I saw no evidence of this. Perhaps there's still a bug hiding in my flip algorithm.

Ironically enough, counting on parabolic interpolation, rather than bisection, makes the problem worse, not better. No doubt that's because of the outlier-point problem. I got my best results by forcing the algorithm to alternate between bisection and parabolic fit. Using that approach, the algorithm converged in 26 function evaluations.

What's that you say? Brent's method only needed 19? Ah, yes, but you forget that this algorithm also includes the step function that gets everything started. That takes a full 10 function evaluations, leaving only 16 for iteration. We're three ahead of Brent already, and I'm just getting started. There is, as I said, still a lot of fine-tuning to go.

I don't know about you, but I'm greatly encouraged with the results. The algorithm does work right now, it's extremely robust, and its performance is already pretty darned competitive.

Jack W. Crenshaw is a senior principal design engineer at Alliant Tech Systems. Much of his early work was in the space program, and he has developed many analysis and real-time programs. He holds a PhD in physics from Auburn University. Jack enjoys contact and can be reached via e-mail at .