CMP EMBEDDED.COM

Login | Register     Welcome Guest RFID World  esc india  TeardownTV
 


Programmer's Toolbox

More on Minimization

Jack Crenshaw

After a minimal hiatus, Dr. Crenshaw returns to continue his discussion of function minima and report on reader feedback about Excel navigation.

Sorry I wasn’t here for last month’s column; I hope you didn’t miss me too much. Actually, that’s not true; I hope you did miss me, since that would mean I still have some value to the magazine. It would be more accurate to say that I’m sorry if I left you hanging for a month in the midst of a discussion on seeking function minima, and I do apologize for it. I’ve been writing this column now for nigh onto eight years, and I try hard to keep them coming, and to maintain continuity between monthly issues. I haven’t missed many issues, but one is one too many. Sometimes, however, a schedule simply can’t be met, and this was one of them.

One of the few advantages of missing an issue, other than the fact that I get to get a good night’s sleep, is that I can get feedback on one column before it’s time to write the next one. Specifically, I got feedback on my comments regarding navigating around in Microsoft Excel.

Boy, did I get feedback! To say that I’ve received e-mail on the subject would be an understatement. To refresh your memory, I commented on how bothersome I found it to have to scroll way down a long Excel spreadsheet to highlight a certain block of cells. I wished out loud for a mechanism like Quattro Pro that allowed me to select a block by its bounding cell addresses.

The e-mail messages all gave me alternative approaches to scrolling. They tended to fall into three main groups. The first, and by far the most numerous, pointed out that one doesn’t need to hold the mouse button down while selecting the block. Instead, you can use shift-click to highlight the second corner cell, and that will highlight the whole block.

A typical message might have read: “To highlight a block of cells, first click on the top left cell, then scroll to the bottom right, hold down the shift key, and click again.”

Sorry, no cigar. The whole point of the comment was not to have to scroll, to highlight the block. Any solution with the word “scroll” in it is no solution. You only gave me a way to avoid holding the mouse key down.

Interestingly, and perhaps significantly, it was the writers of these non-solutions that tended to be the nastiest, and took me to task the most earnestly for being dummy enough not to know how to use Excel effectively. In my defense, I can only say, if you folks use as many different tools as I have to, you probably don’t know how to make them all sing either.

The second set of suggestions was far more helpful. Several people pointed out that Excel does have a command called “Go to” in the Edit pull-down menu. What I didn’t know was that this command can be used to highlight a block. To use it, first click on the top left cell in the normal manner. Then, click Edit/GoTo. When the destination prompt window pops up, first enter the address of the bottom corner of the range, then before pressing “OK,” hold down the shift key. This highlights the entire range.

I have to admit, that’s a neat trick. I suppose it’s even in the Excel manual somewhere. I can tell you with some confidence, it is not in the Excel help screens. Here’s what Help has to say: “A large range of cells: Click the first cell in the range, and then hold down SHIFT and click the last cell in the range. You can scroll to make the last cell visible.”

That’s it. No mention of GoTo’s anywhere.

The third set of messages were few and far between but did seem to offer a true solution. I’ll let Jamie Cox speak for that group:

“Excel is so feature-rich, I couldn’t imagine that by this time any feature had been overlooked, and I was right. You can make a large selection using the Goto item in the Edit menu (shortcut Ctrl-G). You can type in something like “A5:B500” and the range of cells is instantly selected. I had to go read the help to find this, but it’s there!

“What you get out of application software is limited mostly by what you expect. Expect a lot. You’ll usually get it.”

This is the key: the GoTo works for ranges as well as single cells. That is, then, by far the most straightforward solution. Thanks to all for bringing the technique to our attention. I say “our” advisedly since, judging by my inbox, the great majority of Excel users also don’t know about the technique.

Also thanks, Jamie, for the good philosophical advice. I’ll try to take it to heart. One thing Jamie didn’t tell me, though, is how to find that reference in the help files. I still can’t find that one.

Scientist, et al.

As long as we’re on the subject of e-mail, I should also mention that I asked, some months ago, if anyone knew if Micromath was still alive and selling their curve-fit product, Scientist. They are indeed. Readers Wayne Smart, Alex McIlraith, and Scott Maley report their Web address: www.micromath.com/scientist.html .

Scott also adds, “You might try two products offered by SPSS ( www.spss.com ): TableCurve 2D and TableCurve 3D.” Reader Mark Bottomley writes, “I have found that CurveExpert, a shareware tool has worked admirably for me. The cost is low, about $35 to register, and the interface is very simple ( www.ebicom.net/~dhyams/cvxpt.htm ).” Allen Windorn concurs.

Carey E. Bloodworth writes, “I’d like to mention a free little symbolic math package called MuPad lite. The lite version is available for free (for evaluation purposes, of course), while its big brother is a commercial product. (I haven’t seen the commercial version.)”

A few other people have also pointed out alternatives to Matlab. Reader Steve Hicks from Oak Ridge National Lab (my old stomping grounds), recommends VisSim, from Visual Solutions. Robert E. Alvarez references a freeware version of Matlab called Octave ( www.che.wisc.edu/octave/ ).

Ted Truske mentions a program called Gauss, which is “similar to Matlab in that it is a vector/matrix algebraic handler with interpreter-based programming and excellent graphics capabilities.” Gauss is produced by Aptech Systems Inc. ( www.aptech.com ).

Reader Calvin Conrey writes, “I did want to bring your attention to a cousin of MatLab called SciLab. From what I understand, it is descended from the same original FORTRAN code as Matlab, but has been kept as free software by the efforts of a group in France. With the exception of add-ons like Simulink, it is very much a work alike to Matlab, and many Matlab m-files work with little or no modification. If you’re on a budget or just learning the tools, SciLab is a great way to go. It’s available for most Unix/Linux variants as well as Windows 95/98 ( www.rocq.inria.fr/scilab/scilab.html ).”

I’ve also been getting e-mail from Robert Ford of MathTools ( www.mathtools.com ). They make products that are compatible with Matlab. One is called Mideva, another Matcom. Robert has been extremely helpful and informative in offering to tell me more about their products. I blushingly admit that, lately, I’ve been too busy to pay as much attention as the products deserve. But you need not be so limited.

Robert tells me that MathTools has recently started a series of Web forums for discussion of their products. You can reach them at www.mathtools. com .

Reader Gary Knott wrote to point out that his company, Civilized Software ( www.civilized.com ), has a Matlab-like product called MLAB. I’ll be trying out a copy of that one soon.

Last but not least, Matt Calder wrote, “I feel compelled to let you know about a MATLAB alternative, SPLUS. SPLUS is a programming language/environment very similar to Matlab. It is made by Mathsoft [same company that sells Mathcad]. SPLUS is based on the S language originally created at Bell labs (usually attributed to John Chambers). SPLUS, like Matlab, is an honest-to-goodness programming language used from a command line or script file.

“The language definitely has a statistical bias, with literally thousands of functions which inspire the same feeling of ‘time to go back to school’ in a statistician as Matlab does in an engineer. Unfortunately, SPLUS is about as strong in engineering-specific functionality as Matlab is in statistics (which is to say not very), but for general-purpose number crunching and graphics they are comparable.”

Finally, Matt was among the several correspondents who pointed out a serious boo-boo in my last column. I had made a reference to the “central limit theorem” in discussing the existence of a root of a function between a positive and negative value. Matt rightly pointed out that I got the wrong theorem; the central limit theorem has to do with the statistics of the sum of random variables. The theorem I had in mind was the mean value theorem. Sorry.

Great work, folks! Thanks for all the great information. Keep it coming.

On to minimization

When last seen, our hero was trying to devise a general method for finding the minimum of a function, f(x) . Specifically, he was looking for the value of x for which f(x) is minimized.

To refresh your memory, we began with the most brute-force (and also the most reliable) of all minimizers, an exhaustive search of the specified space, to some degree of resolution. Any time we found a value of f(x) smaller than our previous best, we recorded the corresponding value x . This method, crude though it may be, does have one huge advantage, which is that it will distinguish between global and local minima. Most other methods quit when they find any kind of minimum, so they will be satisfied with a local one.

Next, we tried to speed things up by reducing the range of our search. The idea is to search with a fairly large step size until we find what seems to be a minimum. Then we narrow our focus to a region near this point, and slowly refine our resolution until it’s acceptable (whatever that means). Finally, we recognized that once we’ve stepped past a minimum, it’s pointless to go on (unless we’re hoping for an even better global one later), so we might as well kick out of the subroutine and declare victory.

The roundabout route

In my studies so far, I’ve tended to divide the initial range up into a fairly small number of divisions, such as 10. The question as to how many divisions is one of many questions that remain to be answered before we move on. We’ll take the questions one at a time, but first, I’d like to inject a bit more rigor into the discussion than we’ve used so far. To do so, I’m going to take the long way around, and first review how we find the root of a function. That’s because the two problems, though different, also have much in common, as do the methods used to solve them. Both the similarities and the differences will be instructive. I gave a detailed analysis of root-finding some years ago, which I’ll summarize below (send me e-mail if you’d like to see the details again). 1,2

Imagine that you have some function g(x) , and are seeking the value of x for which:


g(x) = 0

In the textbooks, you’ll find a whole plethora of methods for finding the root x , for which this equation is satisfied. Newton’s method, the method of false position, and the secant method all come to mind. Like so many things in textbooks, however, you will also find that for any of these methods to work very well, you must first know that a root exists and have a general idea where it is.

So how do we do that? One way is to have the root bracketed before we even begin. This is where we came in (or rather, went out) in my last column. If I have two values x 0 and x 1, such that:


g(x0) 
<
0
g(x1) > 0

I can be pretty darned sure that somewhere in between the two, the value of g(x) is equal to zero. How do I know this? Through the central, er, mean value theorem. Also through common sense. If the function g(x) is continuous and has a slope that is everywhere finite, how is it going to get from positive to negative values without passing through zero? Answer: it can’t. It absolutely must pass through zero somewhere, as it crosses from positive to negative values. (Clearly, the same is true going from negative to positive.) If we begin the process with x0 and x1 known, we can be pretty confident that we will find a root. The only challenges to the root-finding function are to find it quickly, and not screw up and lose the root somewhere along the way.

One simple method for doing this is very much akin to our search for minima. It’s called the method of bisection, and it means pretty much what the name implies: we simply divide the region in half, and generate a third point (see Figure 1 ):

When we evaluate g(x) , we can get one of three possible outcomes:

g(x) = 0 . In this case, we have found the root and are finished

g(x) < 0 . In this case, we know that the root must be between x and x1 . We replace x0 with x , and we will have moved closer to a solution

g(x) > 0 . In this case (the one shown), we know that the root must be between x0 and x . We replace x1 with x , and we will have moved closer to a solution

Either way, we cannot fail to improve our position. The method is slow but sure; each bisection takes us inexorably closer to the root.

Note that for this approach to work, we must always start with the root already bracketed. The method does not usually work otherwise. The obvious next question is, how do we get the root bracketed in the first place?

There are two possible ways. First, if we know the nature of the function, we can start with a wide region, that is almost certain to give us bracketing values. That approach would work just fine for a function as well behaved as the one in Figure 1 , but it may not for more tricky functions. In fact, the function may contain multiple roots. If, for example, it’s generated from a high-order polynomial, it could indeed have many roots.

In such cases, we must do some kind of initial search procedure, which starts at some initial value of x , and begins stepping along the function with some specified step size. The procedure will do so until it detects either a zero (root), or a sign change. Once we get the sign change, we have the root bracketed, and can go into our bisection algorithm (or some other, perhaps faster one, such as the method of false position or the super-fast, second-order method I showed in Reference 2).

If there are problems to be encountered in seeking a root, you can be sure they’ll appear in the stepping phase. Everything depends on choosing the right step size. If it’s too small, and the function changes very slowly, we could burn a lot of computer time seeking the bracketing region. On the other hand, if the step is too large, we might step right past one or more roots (almost certainly an even number) without even noticing them. Usually, we use a fixed step size and hope it’s small enough. That hope is usually tempered by some advanced knowledge of the nature of the function (for example, if we know that there could be places where the function just barely grazes the x -axis, we might choose a smaller step size to be sure we see it). Some implementations allow the software to decide for itself what step size to use, taking larger steps where the function is changing slowly and is far from the x -axis, and reducing the step size as g(x) approaches zero. I shouldn’t have to tell you that any algorithm that can adjust its step size brings along a whole new bag of potential problems, and a good algorithm must be careful not to increase the step size too aggressively.

It’s important to note that we use either two or three points as the solution proceeds. During the stepping phase, we must retain the last two points, so we can test for a sign change. We may, in fact, encounter a sign change, in which case these two points become x0 and x1 for the iterative phase. During the process of iteration, we generate one more point, by one algorithm or another, and discard one of the previous points.

Back to minima

Aside from the relatively minor problem of choosing the step size for the step phase of the root-finder, I think you can see that this method is robust. As long as the function in question meets the requirements, that is, it is continuous with a finite first derivative, there is no way we’re going to lose the root. Once we’ve got those two pivotal points, x0 and x1 , located, finding the root is as sure as sunrise.

Can we say that our search procedure for minima has the same degree of robustness? Sadly, we can’t.

Let’s take another look at the final algorithm from the last column, which I reproduce in Listing 1 . In this algorithm, we begin with a large interval, again bracketed by x0 and x1 , and sample the function within that range until we find a point that’s lower than the others. We end up with three points, the middle one being our best guess for the minimum, and then we repeat the process.

But can we really be 100% sure that the points at any given step really do contain a minimum? Actually, no, we can’t. In fact, we can’t even be sure that a minimum exists in the original range. The only reason we have to believe that is because I said so. I could be lying.

At the end of each stage, we have three points— x0 , x , and x1 —that certainly seem to show that we have bracketed the minimum. We would not have been kicked out of the search procedure unless f(x) was less than the function at the other two points.

However, note that we don’t use that middle value, x , when we begin the next cycle; we only use the two bracketing values. We’re now going to split the new region up into segments, and evaluate the function for each value of x on the new grid. But how can we be sure that one of these new grid values will actually yield a minimum?

Answer: we can’t. In most cases, this will be true, but there are no guarantees. The function may meet all our criteria for a continuous function, but it still might be perverse enough to yield us larger values of f(x) for every point that we sample in the next cycle. Because we don’t reuse the known minimum, we run the risk of losing it.

Consider that the process of finding a root is basically a linear process. That is, as we get closer and closer to the root in Figure 1 , the curve should look more and more like a straight line. Thus linear interpolation, defined by two points, should give us a good guess for the next iteration.

Seeking a minimum, though, is a second-order process. As we get closer to the minimum, our curve begins to look more and more like a parabola. To define the parabola, we need three points, not two. What’s more, when we begin the next cycle of iteration, we should be certain that we don’t release that middle point, which presumably is closest to the minimum, until we’re absolutely sure that we have a better one.

The bottom line, then, is this: where the root-finder needs to begin its iteration with two points bracketing the root, and generate a third one as it runs, the minimum-finder is going to need three points, and generate at least a fourth one as it runs.

Accordingly, while the stepping algorithm used in Listing 1 may be a wonderful way of carrying out the initial search for a minimum, it’s no good once that first minimum is found. We were able to use it repetitively for our test case, but that was mainly because our test case is not diabolical enough. We were fortunate enough to have been able to locate the minimum rather easily on each iteration. We can’t be sure that will always be true for the method we’ve been using.

A robust method

If a simple sampling procedure can’t be trusted, let’s find one that can be. The first step in doing that is to agree that the main, iterative part of the procedure must be given three points, not two. What’s more, they should be such that the value of f(x) for the middle value is less than those for the other two. We could conceivably use our stepping function to get us to that point. However, it turns out that for the test case function we’ve been using, which is:

we can simply bisect the original range as a way to get started. We’ll begin, then, with the three points:

x0 = 0.0 y0 = 1.0000000
x1 = 0.5 y1 = 0.7071068
x2 = 1.0 y2 = 1.0000000

The value of y1 is still a long ways from the minimum value of –1.0, but it’s enough to meet our criteria.

Given these initial values, what do we do next to reduce the width of the interval for the next stage? Well, bisection seemed to work pretty well for the root-finder; maybe it will work here as well. Let’s see.

What we’ll do is bisect the two half-intervals, as shown in Figure 2 . This will give us two new points, for a total of five. To keep the nomeclature clean, it will help if we relabel the input points P0, P2, and P4 (representing the number of fourths in the original interval). The two new points we’ll call P1 and P3. The nice part about this approach is that we still end up, at least for now, with equally spaced values, as in the case of bisection for roots.

Let’s first look at what happens when we evaluate y1 and y3 . We have four main possibilities:

• Both y1 and y3 are bigger than y2 . This case is easy: our middle value is still x2 , and we pull in both x0 and x4 , replacing them with x1 and x3 , respectively. Our final interval is half what it was before

• Both y1 and y3 are smaller than y2 . In that case, we can use either one of them as the new middle value. Since we preferentially seek the first root, we’ll make x 1 the new middle value, even if y3 is smaller

y1 is smaller than y2 , but y3 is larger. We still use x1 as the new middle value

y1 is larger than y2 , but y3 is smaller. This time, we let x3 be the new middle value

All four of these cases result in new ranges that are half the size of the old one, and the intervals are equal.

In addition to these cases, we must consider the troublesome cases where the y1 or y2 values, or both, are exactly equal to y2 . With floating-point numbers, exact equality is supposed to be impossible, but it’s easy to devise symmetric cases in which we can indeed get exact equality. These cases are troublesome because now we don’t know in which interval the minimum lies. We might be tempted to just pick one, but we might well be guessing wrong. For example, if we see that y1 = y2 , We might guess that the minimum is between x1 and y2 , since the function seems to be going downwards between x0 and x1 . We could argue that the slope, which averages negative between x0 and x1 , must be going through zero and turning positive again, before it gets to x2 .

We could argue that, but we could also be wrong. A counter-example would be the case where a maximum, not a mininum, occurs between x1 and x2 .

If we want to preserve robustness, we should never replace an end point that’s higher than y2 with one that’s equal. Down that path lies the road to lost solutions. We can only replace end points with values that remain higher than the smallest value of y .

The only reasonable solution is to throw out one or more of the equal- y values. We must do this in such a way that, if possible, the range is still reduced. If, for example, y1= y2 , but y3 > y2 , we can replace y4 by y3 . We now no longer have a balanced range; the middle value no longer bisects that range, but this result is better than losing the solution, and it’s one we can live with.

What if y1 , y2 , and y3 are all three exactly equal? That would be the nightmare case; in this situation we cannot replace either x0 or x4 , so our range remains the same size as it was before. There is, however, still a glimmer of hope. If we don’t replace x2 , we will have accomplished nothing; the next iteration would produce exactly the same result. However, if we replace x2 by x1 (again seeking the leftmost minimum), we at least have a chance of getting new values of y , and a resulting reduction in interval size, on the next iteration. Eventually, it would seem that we absolutely must find a value of x for which the inner three values of y are not still equal. If we do get such a case, the function must be truly pathological, and all bets are off anyhow.

We now have a set of rules that should give us a usable function. It’s unfortunate that we can’t keep the interval balanced around x2 , but we can’t. We can’t even guarantee that the interval will always be reduced. Sorry. The function is shown in Listing 2 , along with a main program that calls it iteratively. For now, we’ll use as a halting criterion a simple measure of the width of the input interval. We’ll be talking more about the best strategy for deciding when to quit in future issues.

Try the program in Listing 2 and see how it works. It’s by no means perfect yet, but we’re sneaking up on a truly robust algorithm. We’ll edge a little closer next month. See you then.

References

1. Crenshaw, Jack, “Roots of a Function,” Embedded Systems Programming , Aug. 1994, p. 13.

2. Crenshaw, Jack, “Roots by Iteration,” Embedded Systems Programming , Oct. 1994, p. 13.
Back

Jack W. Crenshaw a senior principal design engineer at Alliant Tech Systems Inc. in Clearwater, FL. He did much early work in the space program and has developed numerous analysis and real-time programs. He holds a PhD in physics from Auburn University. Crenshaw enjoys contact and can be reached via e-mail at jcrens@earthlink.net .

FIGURES
Figure 1: Solution by bisection
Figure 2: Minimum by bisection
LISTINGS
Listing 1: Minimizing by range reduction
Listing 2: Minimization by bisection
WEB LINKS
Micromath Scientist: http://www.micromath.com/scientist.html
SPSS: http://www.spss.com
CurveExpert: http://www.ebicom.net/~dhyams/cvxpt.htm
Octave: http://www.che.wisc.edu/octave/
Aptech Systems' Gauss: http://www.aptech.com
Scilab: http://www-rocq.inria.fr/scilab/scilab.html
Mathtools: http://www.mathtools.com
Mathtools Web Forum: Mathtools Web Forum:
Civilized Software: http://www.civilized.com

Return to Table of Contents

Embedded.com Career Center
Ready to take that job and shove it?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS


 :