More on Minimization
Jack Crenshaw
After a minimal hiatus, Dr. Crenshaw returns to continue his discussion of function minima and report on reader feedback about Excel navigation.
Sorry
I wasnt here for
last months column; I hope you didnt miss me too much. Actually, thats not true; I hope you
did
miss me, since that would mean I still have some value to the magazine. It would be more accurate to say that Im sorry if I left you hanging for a month in the midst of a discussion on seeking function minima, and I do apologize for it. Ive been writing this column now for nigh onto eight years, and I try hard to keep them coming, and to maintain continuity between monthly
issues. I havent missed many issues, but one is one too many. Sometimes, however, a schedule simply cant be met, and this was one of them.
One of the few advantages of missing an issue, other than the fact that I get to get a good nights sleep, is that I can get feedback on one column before its time to write the next one. Specifically, I got feedback on my comments regarding navigating around in Microsoft Excel.
Boy,
did
I get feedback! To say that Ive received e-mail
on the subject would be an understatement. To refresh your memory, I commented on how bothersome I found it to have to scroll way down a long Excel spreadsheet to highlight a certain block of cells. I wished out loud for a mechanism like Quattro Pro that allowed me to select a block by its bounding cell addresses.
The e-mail messages all gave me alternative approaches to scrolling. They tended to fall into three main groups. The first, and by far the most numerous, pointed out that one doesnt need
to hold the mouse button down while selecting the block. Instead, you can use shift-click to highlight the second corner cell, and that will highlight the whole block.
A typical message might have read: To highlight a block of cells, first click on the top left cell, then scroll to the bottom right, hold down the shift key, and click again.
Sorry, no cigar. The whole point of the comment was
not
to have to scroll, to highlight the block. Any solution with the word
scroll in it is no solution. You only gave me a way to avoid holding the mouse key down.
Interestingly, and perhaps significantly, it was the writers of these non-solutions that tended to be the nastiest, and took me to task the most earnestly for being dummy enough not to know how to use Excel effectively. In my defense, I can only say, if you folks use as many different tools as I have to, you probably dont know how to make them all sing either.
The second set of suggestions was far more helpful.
Several people pointed out that Excel does have a command called Go to in the Edit pull-down menu. What I didnt know was that this command can be used to highlight a block. To use it, first click on the top left cell in the normal manner. Then, click Edit/GoTo. When the destination prompt window pops up, first enter the address of the bottom corner of the range, then
before
pressing OK, hold down the shift key. This highlights the entire range.
I have to admit,
thats a neat trick. I suppose its even in the Excel manual somewhere. I can tell you with some confidence, it is
not
in the Excel help screens. Heres what Help has to say: A large range of cells: Click the first cell in the range, and then hold down SHIFT and click the last cell in the range. You can scroll to make the last cell visible.
Thats it. No mention of GoTos anywhere.
The third set of messages were few and far between but did seem to offer a true
solution. Ill let Jamie Cox speak for that group:
Excel is so feature-rich, I couldnt imagine that by this time any feature had been overlooked, and I was right. You can make a large selection using the Goto item in the Edit menu (shortcut Ctrl-G). You can type in something like A5:B500 and the range of cells is instantly selected. I had to go read the help to find this, but its there!
What you get out of application software is limited mostly by what you
expect. Expect a lot. Youll usually get it.
This is the key: the GoTo works for ranges as well as single cells. That is, then, by far the most straightforward solution. Thanks to all for bringing the technique to our attention. I say our advisedly since, judging by my inbox, the great majority of Excel users also dont know about the technique.
Also thanks, Jamie, for the good philosophical advice. Ill try to take it to heart. One thing Jamie didnt tell me,
though, is how to find that reference in the help files. I still cant find that one.
Scientist, et al.
As long as were on the subject of e-mail, I should also mention that I asked, some months ago, if anyone knew if Micromath was still alive and selling their curve-fit product, Scientist. They are indeed. Readers Wayne Smart, Alex McIlraith, and Scott Maley report their Web address:
www.micromath.com/scientist.html
.
Scott also adds, You might try two products offered by SPSS (
www.spss.com
): TableCurve 2D and TableCurve 3D. Reader Mark Bottomley writes, I have found that CurveExpert, a shareware tool has worked admirably for me. The cost is low, about $35 to register, and the interface is very simple (
www.ebicom.net/~dhyams/cvxpt.htm
). Allen Windorn concurs.
Carey E.
Bloodworth writes, Id like to mention a free little symbolic math package called MuPad lite. The lite version is available for free (for evaluation purposes, of course), while its big brother is a commercial product. (I havent seen the commercial version.)
A few other people have also pointed out alternatives to Matlab. Reader Steve Hicks from Oak Ridge National Lab (my old stomping grounds), recommends VisSim, from Visual Solutions. Robert E. Alvarez references a freeware
version of Matlab called Octave (
www.che.wisc.edu/octave/
).
Ted Truske mentions a program called Gauss, which is similar to Matlab in that it is a vector/matrix algebraic handler with interpreter-based programming and excellent graphics capabilities. Gauss is produced by Aptech Systems Inc. (
www.aptech.com
).
Reader Calvin Conrey writes, I did want to bring your attention to a cousin of MatLab called SciLab.
From what I understand, it is descended from the same original FORTRAN code as Matlab, but has been kept as free software by the efforts of a group in France. With the exception of add-ons like Simulink, it is very much a work alike to Matlab, and many Matlab m-files work with little or no modification. If youre on a budget or just learning the tools, SciLab is a great way to go. Its available for most Unix/Linux variants as well as Windows 95/98 (
www.rocq.inria.fr/scilab/scilab.html
).
Ive also been getting e-mail from Robert Ford of MathTools (
www.mathtools.com
). They make products that are compatible with Matlab. One is called Mideva, another Matcom. Robert has been extremely helpful and informative in offering to tell me more about their products. I blushingly admit that, lately, Ive been too busy to pay as much attention as the products deserve. But you need not be so limited.
Robert tells me that MathTools has recently started a series of Web forums for discussion of their products. You can reach them at
www.mathtools. com
.
Reader Gary Knott wrote to point out that his company, Civilized Software (
www.civilized.com
), has a Matlab-like product called MLAB. Ill be trying out a copy of that one soon.
Last but not least, Matt Calder wrote, I feel compelled to let you know about a
MATLAB alternative, SPLUS. SPLUS is a programming language/environment very similar to Matlab. It is made by Mathsoft [same company that sells Mathcad]. SPLUS is based on the S language originally created at Bell labs (usually attributed to John Chambers). SPLUS, like Matlab, is an honest-to-goodness programming language used from a command line or script file.
The language definitely has a statistical bias, with literally thousands of functions which inspire the same feeling of time to go back
to school in a statistician as Matlab does in an engineer. Unfortunately, SPLUS is about as strong in engineering-specific functionality as Matlab is in statistics (which is to say not very), but for general-purpose number crunching and graphics they are comparable.
Finally, Matt was among the several correspondents who pointed out a serious boo-boo in my last column. I had made a reference to the central limit theorem in discussing the existence of a root of a function between a
positive and negative value. Matt rightly pointed out that I got the wrong theorem; the central limit theorem has to do with the statistics of the sum of random variables. The theorem I had in mind was the mean value theorem. Sorry.
Great work, folks! Thanks for all the great information. Keep it coming.
On to minimization
When last seen, our hero was trying to devise a general method for finding the minimum of a function,
f(x)
.
Specifically, he was looking for the value of
x
for which
f(x)
is minimized.
To refresh your memory, we began with the most brute-force (and also the most reliable) of all minimizers, an exhaustive search of the specified space, to some degree of resolution. Any time we found a value of
f(x)
smaller than our previous best, we recorded the corresponding value
x
. This method, crude though it may be, does have one huge advantage, which is that it will distinguish between global and
local minima. Most other methods quit when they find any kind of minimum, so they will be satisfied with a local one.
Next, we tried to speed things up by reducing the range of our search. The idea is to search with a fairly large step size until we find what seems to be a minimum. Then we narrow our focus to a region near this point, and slowly refine our resolution until its acceptable (whatever that means). Finally, we recognized that once weve stepped past a minimum, its pointless
to go on (unless were hoping for an even better global one later), so we might as well kick out of the subroutine and declare victory.
The roundabout route
In my studies so far, Ive tended to divide the initial range up into a fairly small number of divisions, such as 10. The question as to how many divisions is one of many questions that remain to be answered before we move on. Well take the
questions one at a time, but first, Id like to inject a bit more rigor into the discussion than weve used so far. To do so, Im going to take the long way around, and first review how we find the root of a function. Thats because the two problems, though different, also have much in common, as do the methods used to solve them. Both the similarities and the differences will be instructive. I gave a detailed analysis of root-finding some years ago, which Ill summarize below
(send me e-mail if youd like to see the details
again).
1,2
Imagine that you have some function
g(x)
, and are seeking the value of
x
for which:
g(x) = 0
In the textbooks, youll find a whole plethora of methods for finding the root
x
, for which this equation is satisfied. Newtons method, the method of false position, and the secant method all come to
mind. Like so many things in textbooks, however, you will also find that for any of these methods to work very well, you must first know that a root exists and have a general idea where it is.
So how do we do that? One way is to have the root bracketed before we even begin. This is where we came in (or rather, went out) in my last column. If I have two values
x
0 and
x
1, such that:
g(x0)
<
0
g(x1) > 0
I can be pretty
darned sure that somewhere in between the two, the value of
g(x)
is equal to zero. How do I know this? Through the central, er, mean value theorem. Also through common sense. If the function
g(x)
is continuous and has a slope that is everywhere finite, how is it going to get from positive to negative values without passing through zero? Answer: it cant. It absolutely
must
pass through zero somewhere, as it crosses from positive to negative values. (Clearly, the same is true going from
negative to positive.) If we begin the process with
x0
and
x1
known, we can be pretty confident that we will find a root. The only challenges to the root-finding function are to find it quickly, and not screw up and lose the root somewhere along the way.
One simple method for doing this is very much akin to our search for minima. Its called the method of bisection, and it means pretty much what the name implies: we simply divide the region in half, and generate a third point (see
Figure 1
):

When we evaluate
g(x)
, we can get one of three possible outcomes:
g(x) = 0
. In this case, we have found the root and are finished
g(x) < 0
. In this case, we know that the root must be between
x
and
x1
. We replace
x0
with
x
, and we will have moved closer to a solution
g(x) > 0
. In this case (the one shown), we know that the root
must be between
x0
and
x
. We replace
x1
with
x
, and we will have moved closer to a solution
Either way, we cannot fail to improve our position. The method is slow but sure; each bisection takes us inexorably closer to the root.
Note that for this approach to work, we must always start with the root already bracketed. The method does not usually work otherwise. The obvious next question is, how do we get the root bracketed in the first place?
There are two possible ways.
First, if we know the nature of the function, we can start with a wide region, that is almost certain to give us bracketing values. That approach would work just fine for a function as well behaved as the one in
Figure 1
, but it may not for more tricky functions. In fact, the function may contain multiple roots. If, for example, its generated from a high-order polynomial, it could indeed have many roots.
In such cases, we must do some kind of initial search procedure,
which starts at some initial value of
x
, and begins stepping along the function with some specified step size. The procedure will do so until it detects either a zero (root), or a sign change. Once we get the sign change, we have the root bracketed, and can go into our bisection algorithm (or some other, perhaps faster one, such as the method of false position or the super-fast, second-order method I showed in Reference 2).
If there are problems to be encountered in seeking a root, you can be
sure theyll appear in the stepping phase. Everything depends on choosing the right step size. If its too small, and the function changes very slowly, we could burn a lot of computer time seeking the bracketing region. On the other hand, if the step is too large, we might step right past one or more roots (almost certainly an even number) without even noticing them. Usually, we use a fixed step size and hope its small enough. That hope is usually tempered by some advanced knowledge of the
nature of the function (for example, if we know that there could be places where the function just barely grazes the
x
-axis, we might choose a smaller step size to be sure we see it). Some implementations allow the software to decide for itself what step size to use, taking larger steps where the function is changing slowly and is far from the
x
-axis, and reducing the step size as
g(x)
approaches zero. I shouldnt have to tell you that any algorithm that can adjust its step size
brings along a whole new bag of potential problems, and a good algorithm must be careful not to increase the step size too aggressively.
Its important to note that we use either two or three points as the solution proceeds. During the stepping phase, we must retain the last two points, so we can test for a sign change. We may, in fact, encounter a sign change, in which case these two points become
x0
and
x1
for the iterative phase. During the process of iteration, we generate one more
point, by one algorithm or another, and discard one of the previous points.
Back to minima
Aside from the relatively minor problem of choosing the step size for the step phase of the root-finder, I think you can see that this method is robust. As long as the function in question meets the requirements, that is, it is continuous with a finite first derivative, there is no way were going to lose the root. Once weve got those two
pivotal points,
x0
and
x1
, located, finding the root is as sure as sunrise.
Can we say that our search procedure for minima has the same degree of robustness? Sadly, we cant.
Lets take another look at the final algorithm from the last column, which I reproduce in
Listing 1
. In this algorithm, we begin with a large interval, again bracketed by
x0
and
x1
, and sample the function within that range until we find a point thats
lower than the others. We end up with three points, the middle one being our best guess for the minimum, and then we repeat the process.
But can we really be 100% sure that the points at any given step really do contain a minimum? Actually, no, we cant. In fact, we cant even be sure that a minimum exists in the original range. The only reason we have to believe that is because I said so. I
could
be lying.
At the end of each stage, we have three points
x0
,
x
, and
x1
that certainly seem to show that we have bracketed the minimum. We would not have been kicked out of the search procedure unless
f(x)
was less than the function at the other two points.
However, note that we dont use that middle value,
x
, when we begin the next cycle; we only use the two bracketing values. Were now going to split the new region up into segments, and evaluate the function for each value of
x
on the new grid. But how can we be sure that one
of these new grid values will actually yield a minimum?
Answer: we cant. In most cases, this will be true, but there are no guarantees. The function may meet all our criteria for a continuous function, but it still might be perverse enough to yield us larger values of
f(x)
for every point that we sample in the next cycle. Because we dont reuse the known minimum, we run the risk of losing it.
Consider that the process of finding a root is basically a linear process. That is, as
we get closer and closer to the root in
Figure 1
, the curve should look more and more like a straight line. Thus linear interpolation, defined by two points, should give us a good guess for the next iteration.
Seeking a minimum, though, is a second-order process. As we get closer to the minimum, our curve begins to look more and more like a parabola. To define the parabola, we need three points, not two. Whats more, when we begin the next cycle of iteration, we should
be certain that we dont release that middle point, which presumably is closest to the minimum, until were absolutely sure that we have a better one.
The bottom line, then, is this: where the root-finder needs to begin its iteration with two points bracketing the root, and generate a third one as it runs, the minimum-finder is going to need
three
points, and generate at least a fourth one as it runs.
Accordingly, while the stepping algorithm used in Listing 1 may be a wonderful
way of carrying out the initial search for a minimum, its no good once that first minimum is found. We were able to use it repetitively for our test case, but that was mainly because our test case is not diabolical enough. We were fortunate enough to have been able to locate the minimum rather easily on each iteration. We cant be sure that will always be true for the method weve been using.
A robust method
If a simple
sampling procedure cant be trusted, lets find one that can be. The first step in doing that is to agree that the main, iterative part of the procedure
must
be given three points, not two. Whats more, they should be such that the value of
f(x)
for the middle value is less than those for the other two. We could conceivably use our stepping function to get us to that point. However, it turns out that for the test case function weve been using, which is:
we can simply bisect the original range as a way to get started. Well begin, then, with the three points:
x0 = 0.0 y0 = 1.0000000
x1 = 0.5 y1 = 0.7071068
x2 = 1.0 y2 = 1.0000000
The value of
y1
is still a long ways from the minimum value of 1.0, but its enough to meet our criteria.
Given these initial values, what do we do next to reduce the width of the interval for the next
stage? Well, bisection seemed to work pretty well for the root-finder; maybe it will work here as well. Lets see.
What well do is bisect the two half-intervals, as shown in
Figure 2
. This will give us two new points, for a total of five. To keep the nomeclature clean, it will help if we relabel the input points P0, P2, and P4 (representing the number of fourths in the original interval). The two new points well call P1 and P3. The nice part about this
approach is that we still end up, at least for now, with equally spaced values, as in the case of bisection for roots.
Lets first look at what happens when we evaluate
y1
and
y3
. We have four main possibilities:
Both
y1
and
y3
are bigger than
y2
. This case is easy: our middle value is still
x2
, and we pull in both
x0
and
x4
, replacing them with
x1
and
x3
, respectively. Our final interval is half what it was
before
Both
y1
and
y3
are smaller than
y2
. In that case, we can use either one of them as the new middle value. Since we preferentially seek the first root, well make
x
1 the new middle value, even if
y3
is smaller
y1
is smaller than
y2
, but
y3
is larger. We still use
x1
as the new middle value
y1
is larger than
y2
, but
y3
is smaller. This time, we let
x3
be the new middle value
All four of these cases result in new ranges that are half the size of the old one, and the intervals are equal.
In addition to these cases, we must consider the troublesome cases where the
y1
or
y2
values, or both, are exactly equal to
y2
. With floating-point numbers, exact equality is supposed to be impossible, but its easy to devise symmetric cases in which we can indeed get exact equality. These cases are troublesome because now we dont know in which interval the
minimum lies. We might be tempted to just pick one, but we might well be guessing wrong. For example, if we see that
y1 = y2
, We might guess that the minimum is between
x1
and
y2
, since the function seems to be going downwards between
x0
and
x1
. We could argue that the slope, which averages negative between
x0
and
x1
, must be going through zero and turning positive again, before it gets to
x2
.
We could argue that, but we could also be wrong. A
counter-example would be the case where a maximum, not a mininum, occurs between
x1
and
x2
.
If we want to preserve robustness, we should
never
replace an end point thats higher than
y2
with one thats equal. Down that path lies the road to lost solutions. We can only replace end points with values that remain higher than the smallest value of
y
.
The only reasonable solution is to throw out one or more of the equal-
y
values. We must do this in such a way
that, if possible, the range is still reduced. If, for example,
y1= y2
, but
y3 > y2
, we can replace
y4
by
y3
. We now no longer have a balanced range; the middle value no longer bisects that range, but this result is better than losing the solution, and its one we can live with.
What if
y1
,
y2
, and
y3
are all three exactly equal? That would be the nightmare case; in this situation we cannot replace either
x0
or
x4
, so our range remains
the same size as it was before. There is, however, still a glimmer of hope. If we dont replace
x2
, we will have accomplished nothing; the next iteration would produce exactly the same result. However, if we replace
x2
by
x1
(again seeking the leftmost minimum), we at least have a chance of getting new values of
y
, and a resulting reduction in interval size, on the next iteration. Eventually, it would seem that we absolutely must find a value of
x
for which the inner
three values of
y
are not still equal. If we do get such a case, the function must be truly pathological, and all bets are off anyhow.
We now have a set of rules that should give us a usable function. Its unfortunate that we cant keep the interval balanced around
x2
, but we cant. We cant even guarantee that the interval will always be reduced. Sorry. The function is shown in
Listing 2
, along with a main program that calls it iteratively.
For now, well use as a halting criterion a simple measure of the width of the input interval. Well be talking more about the best strategy for deciding when to quit in future issues.
Try the program in
Listing 2
and see how it works. Its by no means perfect yet, but were sneaking up on a truly robust algorithm. Well edge a little closer next month. See you then.
References
1. Crenshaw, Jack, Roots of a Function,
Embedded Systems Programming
, Aug. 1994, p. 13.
2. Crenshaw, Jack, Roots by Iteration,
Embedded Systems Programming
, Oct. 1994, p. 13.
Back
Jack W. Crenshaw a senior principal design engineer at Alliant Tech Systems Inc. in Clearwater, FL. He did much early work in the space program and has developed numerous analysis and real-time programs. He holds a PhD in physics from
Auburn University. Crenshaw enjoys contact and can be reached via e-mail at
jcrens@earthlink.net
.