In case you just arrived on the scene, we're trying to develop a function that locates the minimum of some math function:
(1)
y = ƒ(x)
Most algorithms that do this require, as a starting point, three points that comprise a "triplet," such that:
(2)
ƒ(x0) > ƒ(x1) < ƒ(x2)
The algorithm I've been showing starts by stepping along the function until it finds this triplet condition. Many readers have asked me why I do this; they see stepping as unnecessary and time-consuming. Here's why. The middle point is critically important to the process. The algorithms work by either bisecting segments or performing parabolic fits to the three points. If we don't have the triplet to begin with, the algorithms can't function.
It would be nice if we had a minimizer so robust that all we needed to tell it would be something like:
(3)
x = minimize(f); (3)
But that's not only asking an awful lot of the minimizer, it leaves us prey to too many functions with more than one minimum. We need to narrow the range.
The situation is entirely analogous to that of finding the root of a function (value for which ƒ(x) = 0). All known root finders start by interpolating between two points straddling zero. If we start with two such points, the solver knows that ƒ(x) must have a sign change somewhere in-between. So a call like:
x = findroot(f, x0, x1)
can be made to absolutely guarantee a solution, given reasonable properties for ƒ(x).
What do we do if we don't know good values or x0 and x1? we must do very much the same thing as in our minimizer; we must step along ƒ(x) until the sign changes.
Most root finders assume you can guess two starting values on your own, so they tend to leave the stepping part out. Often, the user can start with points sufficiently far away so that the sign of the function can be guaranteed. unfortunately, finding the starting triplet for a minimizer is not so easy.
The minimizer absolutely needs three values of x satisfying the conditions of Equation 2. What's more, three points satisfying this condition are considerably harder to find than two of opposite sign.
In short, the guts of the minimizer need three data points to get started. But how the heck do I get that middle starting point? Some implementers will offer only a function that requires the triplet as starting values, and in effect say, "you're on your own, kid...."
But for a general-purpose minimizer, this approach seems to be putting all the hard work off onto the programmer who calls it.
Others, such as Press et al in Numerical Recipes (Cambridge University Press, 1992), give the function that needs a starting triplet, but add another external function that performs a step process to kick the main function off. That way, you can call the triplet function if you know how, or the kick-starter if you don't. All these helper functions use the same stepping approach I do.
I've tried following the Numerical Recipes approach, but it seemed to me that for almost any real-world problem, I'm going to need the help of a stepper, or some other driver, to establish the triplet. In the end, I decided to include it in the minimizer proper.
Okay, but why the stepping? Surely there's a more efficient way. Actually, there isn't. At first, I tried simply bisecting the starting interval. That worked just dandy on my main test function:
(4)
ƒ(x) = cos(2πx3)
It crashed and burned on my very next test case:
(5)
g(x) = sin(2πx3)
After a few trials, I convinced myself that the only truly robust approach was to step along the interval until a triplet is found. That's why my algorithm has a stepper.
If you know for a fact that bisecting the x0..x2 interval will work, set n = 2 and you're home free, with very little overhead. If you're not sure where the minimum lies, make n large and you're sure to find it, albeit with more overhead.
Software engineering
I'd like to say something about software engineering. Most designs don't just evolve, they have to be planned and coaxed into existence. When I'm doing a design for software for my own use or for my company, I have a tendency to document my thought processes, just as though I were writing one of these columns. In my documents, I discuss the requirements for the project, my plan for a solution, and so on. Then I paste in copies of the code I generate, as I go. The document serves the function of an informal Unit Development Folder (UDF).
One thing I learned early on: designing is different than coding. It requires different thought processes. The best advise I can give is: when designing, get up and walk away from the computer. However, I don't usually follow the same practice when I'm writing software for this column, and I guess you can see why. When I'm writing a column, the column is the UDF. I tend to keep both the compiler and the document open together, and bounce back and forth between them. I don't write up separate documents because I have to explain to you, anyhow, what my plan is. I can't very well walk away from the computer, since I need it to tell you what I'm thinking. If I'm developing a certain module, you typically get to see the whole sequence, from the first stubbed-out module to the final, cleaned-up one. If you feel that this is stuff you'd rather not see please tell me, because I can assure you, it's a lot easier to show you the finished product alone. From the e-mail I get, though, most readers seem to enjoy watching the development process unfold.
Until this current effort, my scheme has worked pretty well, I think. When I was doing things like vector arithmetic modules, or digital filters, the code was simply not complex enough to warrant a more formal approach. I think it's safe to say that, in the case of this minimizer, I have hit the wall, in terms of the level of complexity that it's possible to process in this design-at-the-keyboard way.
So what shall we do about it? Actually, I've already done it. I realized that it's time to push away, get out of the chair, and walk to another room to do some thinking. And also to start writing separate analyses and specs. In short, run a UDF, separate from the text of this column. I've been doing a lot of that this month, and you'll see the results shortly. Frankly, I've been surprised at how many things got changed, and mostly simplified, by the simple process of walking away from the computer.
Plan Z
Enough talk, let's do. I've learned my lesson, done my analysis and my design homework. So where do we stand? Did I get anything out of that new effort?
You betcha. But maybe I should first mention the things that haven't changed. I'm still committed to a five-point data structure:
double x[5], y[5];
along with the functions to manage it. Every time we compute new data, it gets put into this structure, and every time we perform a computation, it's based upon it. This turns out to reduce data motion and parameter passing to a minimum, while still maintaining data integrity. I'm also committed to the state machine as the overall driver for the algorithm. Finally, as I've discussed, I'm committed to the idea of using a stepper function to locate the initial triplet. If you don't like that part, don't worry. It's easy enough to remove it, and pass three x-values to the minimizer. But in its latest incarnation, the stepper has a low enough overhead that I see no reason to separate it out as a separate driver program.
As I was doing my long-overdue analysis, one of my first shocks came when I realized that function step() could become a gutless wonder. Recall that, at one time, it was a state machine nested inside the larger state machine. Even at its simplest, I thought I'd need at least two states: one for riding up any possible increasing part of the function, the other for going down the decreasing part. To refresh your memory, this is because I wanted to be able to handle functions like the g(x) of Equation 5. This function increases before it starts decreasing to its minimum. A literal interpretation of "minimum" might say, "hey, as soon as I see the function increasing, I'm done," and return x = 0 as the minimum. I wanted something a little more robust, so I made the decision early on to skip over initial upwards motions.
No one was more shocked than I when I realized that step() can be reduced to the pseudocode:
repeat
take a step
until have_triplet()
Okay, maybe it's not quite that simple. We still must count the steps and decide what to do when we reach the end of the range. But the basic logic is just there. My version of function step() does work properly in the case of monotonically increasing or decreasing functions-something Brent's method can't quite do-but in most cases it just follows the pseudocode above. It's shown in Listing 1. I've made a few minor improvements. Mainly, they're there to avoid roundoff errors that can occur when adding a lot of little Dx's.
Listing 1: The step function
/*
* Step along the function, looking for a local minimum. To support functions
* that start off increasing, we allow for the possibility of an initial rise.
* This function works with the outer state machine, and returns the next
* state. The returned state depends upon whether or not an internal minimum
* (triplet) was found.
*/
STATE step(double (*f)(double), double xL, double xR, long steps)
{
assert(steps > 1);
// local variables
long n = 0; // step counter
double range = xR-xL;
double X, Y; // local data point
int flag;
for (n=0; n<=steps; n++)
{
X = xL + (double(n)/steps) * range;
Y = f(X);
flag = append(X, Y);
if (flag)
return HAVE_TRIPLET;
}
// we get here if function is monotonic
if (f(xR) < f(xL))
return RIGHT_EDGE;
else
return LEFT_EDGE;
}
Function step() returns a state to the state machine that tells it whether to start iterating or not. If step() gets to the end of the loop without finding a triplet, the function must be monotonic, either increasing or decreasing. We tell the state machine so, and let it return the correct value.
For the record, the awkward code involving variable flag is absolutely necessary, in Borland C++. Putting the call to append() inside the if-test gave a compiler bug.
Managing the data
You'll note that step() calls a new function, append(), which not only does all the work of managing the data structure, but also returns a Boolean to flag the triplet. I guess this calls for yet more explanation.
Once I decided to create the data structure, I always intended to treat it as much like a C++ object as possible, without actually making it one. Two integers, npoints and index, capture the current state of the array(s). As their names imply, npoints contains the current number of valid points in the array, while index points to the current minimum.
As I was organizing the functions that operate on this structure, I realized that I needed perhaps three levels of functions. The lowest level simply manipulates the data as commanded, without making any logical decisions. I've sprinkled a number of assertions through the code to protect me during the development phase, but otherwise the lowest-level routines don't do much logic.
My first plan was to make them do none at all. They would be completely dumb, doing whatever they were told, however bizarre. For example, the following function does nothing but stick a point into the specified location:
void replace(double X, double Y,
int k)
{
x[k] = X;
y[k] = Y;
}
In general, the lowest-tier functions don't return any data or otherwise interact with the caller. They just do what they're told.
To support shifting the data through the shift register, I wrote two low-level routines called push_left() and push_right(). The names pretty much tell the story. Each one shoves the data beginning at location k to one side or the other, then inserts the new data in its place.
In keeping with the low-level nature of these routines, my first inclination was to not have them muck around with the indices, npoints and index. I thought that would be better done at the next level up. The more I thought about it though, the less rational this seemed. The pointer, index, is supposed to point to the current minimum. It seems crazy to leave any operation with this thing pointing to the wrong place. So push_left() and push_right() ended up adjusting the pointers as necessary to keep things consistent. These two routines are shown in Listing 2.
Listing 2: The push routines
// shove data points left and insert a new one at k
void push_left(double X, double Y, int k){
assert((k >= 0) && (k < npoints));
// shift points to the left of k
for (int i = 0; i< k; i++)
{
x[i] = x[i+1];
y[i] = y[i+1];
}
// insert new point
x[k] = X;
y[k] = Y;
if (index <= k)
index = max(0, index-1);
}
// shove data points right and insert a new one at k
void push_right(double X, double Y, int k){
assert((k >= 0) && (k < npoints));
// shift points to the right of k
int maxi = min(npoints, 4);
for(int i = maxi; i>k; i--){
x[i] = x[i-1];
y[i] = y[i-1];
}
// insert new point
x[k] = X;
y[k] = Y;
npoints = maxi+1;
if (index >= k)
index = min(4, index+1);
}
Armed with push_left(), I had no problem writing append(). This function is the first of the second-tier functions. I made the decision to let functions at this level return a Boolean to signal its result. In this case, append() returns true when it finds a triplet.
There's an important point to make about this function, and I've been entirely too cavalier about it so far. You'll recall that I've been quite a stickler about managing the data points so that the condition of Equation 2 is maintained. If we ever lose this condition, all bets are off.
However, I wasn't so careful about that part during the stepping procedure. Though it's admittedly an unlikely occurrence, one can envision a pathological function that keeps a constant value for quite a time, before turning south. In such a case, we could conceivably end up with a set of five data points in the array, all with equal y-values.
The same kind of situation crops up later, when we begin inserting new points arising from estimations of the minimum. Again, it's very unlikely, but it's possible for the new point to be exactly equal to the adjacent one. That's unacceptable.