The Topic That Would Not Die

Programmer's Toolbox

September 2000

Jack W. Crenshaw

Locked in battle with his twin nemeses, Dell and minimization, Crenshaw pines for the good old days and gains some mathematical ground.

Boy, when I predicted that this series was going to be the longest-running ever, I wasn't just whistling Dixie, was I? I had no idea it would drag out as long as it has. I'm determined, however, to make a very large dent in the topic of minimization of a one-dimensional function this month. I want to get on with it, so I won't spend a lot of time philosophizing. I do, however, have a couple of thoughts that I need/want to share. I'll try to make them as brief as possible.

First, you may recall that in my last column, I ranted pretty loudly about my experience with Dell Computers. In the interest of fairness, I think I should offer a view from the other side. No, Dell didn't come back and address all of my complaints. No, they didn't offer to take the computer back or give me a new one. Dell's service record remains as rotten as ever; I haven't heard from them at all. I did hear from two other disgruntled Dell customers, whose stories are as bleak as my own. One has had three video boards replaced. When he asked Dell to give him a different brand of board, one that actually works, they refused. Their position: you ordered Brand X, you get Brand X. Apparently, Dell is content to keep replacing the Brand X board forever. Meanwhile, this reader is stuck with a $3,000 boat anchor.

Another reader, however, took me to task for being unfair to Dell. Considering that he had similar experiences with Gateway, I guess I have to agree. It's not that Dell is a bad company to deal with, relatively speaking. It's that they're all bad.

I've been pondering this matter in great detail, and I think I've figured out what's going on. See whether you agree.

Back in the '40s, when I was a kid, manufacturers built stuff to last. The metal was thick and strong, the parts over-engineered, and the designs intended to allow maintenance, essentially forever. My parents had an old GE refrigerator, the kind with the cylindrical evaporator on top. That sucker would freeze a milk bottle as solid as steel if you turned the thermostat far from “min.” Likewise, my wife and I had a portable Kenmore dishwasher that ran like a train for years. It was given to us used, and we handed it down to someone else. The only thing that stopped either one of those appliances was that the rubber gaskets rotted with age, and we couldn't find replacements. We had an old '46 Buick which was once sideswiped by a truck. The impact knocked about 100 pounds of caked mud out from under the car, but we were never able to find a single mark where the truck hit it. One night, in a drive-in theatre, a new Mercury sideswiped the same car. It ripped off his entire front end: grille, bumper, the works. We felt only the slightest bump, and again could find no damage.

Then someone noticed that people weren't buying Buicks, refrigerators, or dishwashers anymore. Most folks already had one, and the products were so good, they saw no reason to “upgrade.” In the '50s, some beancounter invented a new concept. It's called planned obsolescence.

Around that time, I worked for two companies, GE and IBM, whose consumer products groups employed what were arguably some of the best engineers who ever earned a degree. Their marching orders were simple: design the product so it will fail the day after the warranty runs out. At first, every once in a while they would screw up and actually produce a quality product. But over the years they got better, meaning that the products got worse. Today, thanks to computerized modeling tools such as Nastran and its kin, an engineer can design a part with a safety factor of 1.001, and just enough material in it so that it will barely hold together, assuming that it's never stressed. That approach is what passes for good engineering in consumer circles.

In fact, I contend that some products were never intended to work. I'm talking about the fairly cheap ones that come in blister packs, and look very much like the products you used when you were younger. The manufacturer counts on your buying it based on pleasant memories. It's not until you get it home and unwrap it that you discover it's a fake; it doesn't actually do anything. The vendor counts on your not deeming it worth the effort to return the junk. My favorite example is one of those circle-drawing compasses like we all used in grade school. The new ones look just like the old ones, but they won't hold a setting. What? You expected to actually draw a circle? You expected the stapler to actually drive staples? Silly you.

Nevertheless, until recently, manufacturers at least made some pretense of trying to keep their customers happy. If your product didn't work right, they'd try to make it good. They'd replace it, fix it for you, sometimes even offer you extra perks to keep you happy.

Now we move on to the concept of acceptable loss. Years ago, in the early days of personal computing, I got a game for my TRS-80 called Hammurabi, the ancestor of all the Sim City sorts of games. In Hammurabi, you're the king of Babylon, and it's your job to run the country so that it prospers, along with you. I reckon everyone who's ever played the game starts out resolving to be the world's most benevolent king. No starvation, no lack for the citizens of my country. I'm going to be remembered for keeping the populace safe, happy, and well fed.

It doesn't take long, playing such a game, to discover that if you take care of the people too well, they multiply like rabbits. Immigrants move in from outside to take advantage of your largesse. The people plow under good croplands and build houses and stores on them. Next thing you know, everyone's starving, the coffers are empty, your army is in rebellion, and you're lucky to escape alive. As hard a lesson as it is, to survive the game you must allow a certain percentage of your people to be sick, poverty-stricken, or dead. It's the only way the country can prosper.

So what does this have to do with Dell, Gateway, and other vendors? Simply that I believe we're entering a whole new era of planned disgruntlement. The grandchildren of the beancounters who invented planned obsolescence are now following in their ancestors' footsteps. What's more, they all know how to run what-if games using Microsoft Excel.

I can see it now. Someone runs a scenario that involves cutting the quality of a product. The simulation shows that a certain percentage of customers are going to get lemons, and making them happy is not going to be easy. The vendor runs a good risk of losing them permanently. But, wonder of wonders, the spreadsheet says we don't care. If the company is growing, as most are, it's gaining more customers than it's losing. What's more, the cost of paying decent wages to people who actually provide support can run higher than the cost of losing customers.

I'm willing to bet that lurking within a company like Dell, a whole department of beancounters does nothing but run such scenarios. Like Hammurabi, they optimize the tradeoff between lost customers and the cost of keeping them. They have optimized the cost of producing a quality product against the cost of lost sales. The spreadsheet program says, right there in black and white, that it's cheaper in the long run to let the irate customer remain irate. It's cheaper to let them walk away cursing than it is to spend time and energy trying to fix their problem. In the end, the goals of departments called Tech Support and Customer Service are not what we laypeople think they are. They're not there to give support. Like the compass that can't draw a circle, they only give the appearance of offering support. And their goal is to do so while spending the minimum amount of money and using the least amount of resources. End result: the company is paying minimum wage to an inadequate and understaffed department. The people who deal with the complaints have only one goal: get the customer off the phone so we can move on to the next victim, er, valued customer.

What about minimization?
In case you hadn't noticed, I seem to have a tiger by the tail in this area of minimization. It was never my intention to re-invent the wheel, or write a whole new algorithm. After all, mathematicians have been minimizing functions for quite a long time now, and they should be pretty good at it. I thought that all I had to do was to pick up a canned technique called Brent's method, explain it to you, and move on. Silly me.

Therein lies yet another story. (Have you noticed I have a lot of stories? That's the price of advanced years, I'm afraid.) Years ago, when I first started working with computers, we didn't have much access to other folks' canned algorithms. If you needed a sine or cosine routine, you dug around until you found someone else's, or you wrote your own. At the time, I was working on things I thought were pretty important (helping mankind walk on the Moon), and I was hoping someone else would provide the algorithms I needed. They didn't.

A few years later, I went to work for IBM. At last, I thought, I'm in the company where the action is. Surely the company that practically invented (or at least popularized) the “Giant Brain” would have researched the math algorithms needed to solve virtually any problem, and kept canned versions handy to give to their engineers.

If such a group existed at IBM, it's news to me. I think it's a case where the cobbler's children have no shoes. Far from getting the best software IBM had to offer, we IBM programmers got to be the beta testers for the rest of the company. We didn't even get the same quality that customers did. As usual, I ended up writing my own algorithms.

My next job was with Northrop Corporation, which had a small IBM mini (the 1130, my only favorite IBM computer). Ironically enough, that computer came with a document called the IBM Scientific Subroutine Package (SSP). IBM did have a collection of tools; it's just that I had to leave IBM to find it. Though written in the worst kind of spaghetti-code Fortran II, the algorithms were pretty good, and to this day, I still use many of them.

That was then, this is now. Nowadays, companies all over the world write packages for solving math problems in Fortran, C, and other languages. On the Internet, applied mathematicians in academia pass around algorithms for solving all manner of complex problems. Authors like Press et al. write books like Numerical Recipes [1], which must contain the best algorithms of each kind. Surely I can be forgiven for assuming that I'd be able to pick up one such algorithm and present it to you in the form of a robust and general-purpose minimum-finder. I thought that in Brent's method I'd found one.

Alas, it was not to be. Not only did I not like the method, I couldn't find the one thing I absolutely insist on in this column: a cogent explanation of how it works. Who could have guessed this turn of events?

Nevertheless, like it or not, I seem to be at that branch point in this study. I absolutely will not be deterred from giving you an algorithm this month. If it has to be one I've invented on the spot, that's what it has to be.

More on Brent and Press
Now that I've taken on, in recent months, Microsoft, Mathsoft, and Dell, can it be that I'm about to take on Press et al. as well? Yeah, I guess it can. You've heard me say, in the past, that I was not a great fan of the Numerical Recipes series. Popular as it is, I always get the feeling that I'm getting a canned subroutine, converted over from old Fortran code by people who didn't understand it all that well.

Don't get me wrong; the scope of coverage of Recipes is monumental. Press et al. give more subroutines to solve more problems than I am likely to write in several lifetimes. I've always stood in awe over the sheer volume of algorithms that are covered in the book. Whenever I want to learn more about certain problems, I still go to Recipes first. Still, one is left with that nagging question: how can one or a few people be experts in so many different areas? Obvious answer: they can't. When I want a truly authoritative explanation of an algorithm, I generally find that I must look elsewhere.

I thought I was alone in my reaction to Recipes. A reader pointed out that I was not, and referred me to the Web site: math.jpl.nasa.gov/nr/.

To spare you the suspense of looking it up, I'll quote the title: “Why not use Numerical Recipes?” and the first line, “We have found Numerical Recipes to be generally unreliable.”

Pretty strong words. Author W. van Snyder of NASA-JPL goes on to quote several specialists who share the feeling that Recipes gives one just enough information to get into trouble. One of the experts quoted is my hero, Lawrence F. Shampine. Shampine is arguably the world's leading authority on numerical integration algorithms. Whatever he says, you can take to the bank.

The general flavor of the Web site article is that while Recipes may be a good source for serviceable canned subroutines to solve certain problems, they are not the ones an expert in that particular area would use.

Understand, I don't intend this as any great indictment of the Recipes books. Press et al. have never presented the routines as the most powerful or effective of their kind. Their goal seems to be to cover as much ground, in as many areas, as possible. It would be asking a lot to expect them not only to give you algorithms that work in so many different areas, but also to have every one of them represent the current state of the art. My only problem is that the current state of the art, for this particular area, is exactly what I was looking for.

Please don't get the idea that, though I haven't been with you much lately, I haven't been working the problem. I've been traveling the Information Highway seeking more background on Brent's method. I'm afraid I didn't find much. Recipes gives as its reference Brent's book, Algorithms for Minimization Without Derivatives [2]. As we've discussed before, that book seems to be the result of Brent's dissertation work. But it's out of print, and my perpetual request to Amazon.com has not turned up a copy.

In search of more explanation of Brent's method and how it works, I turned to Alan Miller's Web site, www.ozemail.com.au/~milleraj/. Now, if you're looking for someone who really is an expert in every single area of numerical algorithms, Alan's your man. I've never seen so many implementations (mostly in Fortran and the new language F; more on this next month). Sadly, his Web site doesn't list an implementation of Brent's method. However, if you follow the trail through “Non-lin. programming,” you'll arrive at plato.la.asu.edu/guide.html, a site maintained by Hans D. Mittlemann and P. Spellucci of Arizona State University. Take another step through their “decision tree for optimization software,” through the paths for unconstrained linear and nonlinear optimization, and lo and behold, there's netlib.org/cgi-bin/netlibfiles.pl?filename=/go/fmin.f, which is their implementation of Brent's method. (You can also find fmin.f by following a trail through the van Snyder Web site.) At first glance, fmin.f appears to be the worst kind of spaghetti-code Fortran IV. However, take another look. It's really a straightforward translation of Brent's original code, which was written in Algol. (No wonder we don't find many copies of it!) The many GOTOs in the Fortran version merely represent the limitations of Fortran IV, not of Mittlemann and Spellucci.

Listing 1: Brent's method, Fortran version
      q=(x-v)*(fx-fw)
      p=(x-v)*q-(x-w)*r
      q=2.0d0*(q-r)
      if (q.le.0.0d0) then
        q=-q
      else
        p=-p
      endif
      r=e
      e=d
    endif
    if ((dabs(p).ge.dabs(0.5d0*q*r)).or.(p.le.q*(a-x))
   2       .or.(p.ge.q*(b-x))) then
c
c   a golden-section step
c
      if (x.ge.xm) then
        e=a-x
      else
        e=b-x
      endif
      d=c*e
    else
c
c   a parabolic-interpolation step
c
      d=p/q
      u=x+d
c
c   f must not be evaluated too close to ax or bx
c
      if (((u-a).lt.tol2).or.((b-u).lt.tol2)) then
        d=tol1
        if (x.ge.xm) then
          d=-d
        endif
      endif
    endif
c
c   f must not be evaluated too close to x
c
    if (dabs(d).lt.tol1) then
      if (d.le.0.0d0) then
        u=x-tol1
      else
        u=x+tol1
      endif
    else
      u=x+d
    endif
    fu=f(u)
    iter=iter+1
c
c   update a, b, v, w, and x
c
    if (fx.le.fu) then
      if (u.ge.x) then
        b=u
      else
        a=u
      endif
    endif
    if (fu.gt.fx) then
      if ((fu.gt.fw).and.(w.ne.x)) then
        if ((fu.le.fv).or.(v.eq.x).or.(v.eq.w)) then
          v=u
          fv=fu
        endif
      else
        v=w
        fv=fw
        w=u
        fw=fu
      endif
    else
      if (u.ge.x) then
        a=x
      else
        b=x
      endif
      v=w
      fv=fw
      w=x
      fw=fx
      x=u
      fx=fu
    endif
c
c   update tolerances for next loop
c
    xm=0.5d0*(a+b)
    tol1=eps*dabs(x)+tol3
    tol2=2.0d0*tol1
  enddo
c
c   end of main loop
c
  fmin=x
  return

Fortunately, we are not limited to Fortran IV. I took a look at fmin.f myself, to see what it might have looked like had its authors had access to the structured constructs of Fortran 77 or 90. Not surprisingly, since it was derived from the structured language Algol, the code unravels nicely. For those of you who like Fortran, a copy of the modified code is shown in Listing 1. If I've done nothing else in this series, at least I've given the world a version of Brent's algorithm that one can actually read.

Comparing the Fortran code from fmin.f with the C code from Recipes, I was struck by two things. First, they are almost identical; even the same variable names are used in most places, and the same operations are done in pretty much the same places. Somewhat surprisingly, though, they are not exactly identical. Nor do they perform identically. On the same problem, my structured version of fmin.f took 11 iterations to converge to double-precision accuracy. The Recipes version, brent.c, took 19 iterations to converge to only single-precision accuracy.

At this point in my trek, I am completely bummed out. I suppose, for the sake of completeness, I should have been spending my summer vacation comparing the two programs and trying to discover the reason why the Fortran version converges nearly twice as fast. I didn't. To be brutally honest, I don't really care anymore. I've seen all of Brent's method I want to see.

Can we improve on Brent?
When I began this odyssey, it was certainly not my intention to try to invent the world's greatest second-order minimization algorithm. And I must say, I feel pretty nervous about attempting it. This is not the kind of thing one does casually. A lot of users out there want good algorithms, and if my offering turns out to have one or more fatal flaws, you can be sure I'm going to be hearing from them in droves. I'm sure to make van Snyder's hit list. However, at this point, I don't see that I have much of an alternative. My goal has always been to give you a method that not only works, but that I can explain to you rationally. If I can't explain Brent, I'll explain Crenshaw.

There's also no way I can spend the kind of time Brent presumably spent on his dissertation project. To come up with the perfect algorithm, one should run it on as many different kinds of problems as possible, looking for the cases that give it trouble, and then patching it up to deal with them. I don't have time to do that here. So I'll make a deal with you. Let's do this thing together. I'll give you an algorithm that seems to work on a limited number of cases. Feel free to try it out and let me know how it works. If you find problems with it, get back to me, and we'll fix it together. Between us, maybe we'll arrive at that killer solution, after all. Deal?

Getting serious
Okay. We've pussyfooted around this issue for long enough. Let's put this sucker to bed. Before doing so, however, maybe I should review the bidding. We've already seen, months ago, an algorithm that can reliably find a minimum of a single variable by means of bisection. Bisection using the golden ratio turned out to work a little better than halving intervals.

The problem with any bisection method is that it takes a while to get the solution. We can expect each step to improve our solution by just half a bit. For improvement by a factor of, say, 10,000 (typical for minimum-finders; solution to machine accuracy is not feasible), that means around 24 to 26 iterations. Not outrageously inefficient (did you catch the fact that brent.c, from Recipes, took 19 on my sample problem?), but naturally we'd like to do better. A better approach is to use a quadratic method. Such methods involve fitting a quadratic through the three points with the smallest function values, and using the minimum of that parabola to estimate the minimum of the function. A quadratic method doubles the number of significant digits for each iteration. Once it gets near a solution, it homes in like a hound dog after a possum. To the same 10^-4 accuracy, we found a solution to our sample problem in seven iterations (nine function evaluations, counting setup of the initial conditions).
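
To make the parabolic-fit step concrete, here is a minimal sketch of a single quadratic-interpolation step in C++. It is my own illustration, not code from Brent or from Recipes; the function name and the degenerate-fit guard are my choices, and a real minimizer would fall back to a golden-section step when the guard trips.

    #include <cmath>

    // One parabolic-interpolation step: fit a quadratic through the three
    // points (x0,y0), (x1,y1), (x2,y2), where x1 is the best point so far,
    // and return the abscissa of the parabola's vertex as the new estimate
    // of the minimum.
    double parabolic_step(double x0, double y0, double x1, double y1,
                          double x2, double y2)
    {
        double d01 = x1 - x0;
        double d21 = x1 - x2;
        double num = d01*d01*(y1 - y2) - d21*d21*(y1 - y0);
        double den = d01*(y1 - y2)     - d21*(y1 - y0);

        // If the three points are (nearly) collinear the parabola degenerates;
        // bail out and let the caller take a bisection/golden step instead.
        if (std::fabs(den) < 1.0e-300)
            return x1;

        return x1 - 0.5*num/den;
    }

As a sanity check, feed it three points on the parabola y = (x-2)^2, say x = 0, 1, and 3, and it returns exactly 2 in a single step; that is why the method takes off like a rocket once the function looks smooth inside the bracket.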

Each approach has its advantages. The bisection method is the workhorse, the bubble sort of minimum finders. It may be slow, but it's sure. One nice feature of bisection methods is that they don't really care whether the function in question is well-behaved or not. For example, the golden section search will cheerfully find the minimum of a function with a sharp kink at the bottom, something like a simple absolute-value function.

Such a function will drive a quadratic method crazy, because its derivative is discontinuous. Like most higher-order methods, the quadratic method is more twitchy and more easily confused. Another thing I like about bisection methods is the criterion we use to halt. Beginning with two points straddling the solution, a bisection method pulls both ends inward. We don't really have to test the solution itself; we stop when the distance between the end points falls below some epsilon distance.

By contrast, with quadratic methods we get a predicted location for the minimum and compare that with previous values. This makes me a bit nervous, since we've already seen how certain combinations of inputs can yield the same predicted minima, even though neither is anywhere close to the true value.

I also noticed that quadratic methods tend to hang on to end points longer than bisection methods. Bisection methods have no choice; that range is going to be reduced by roughly 25% (a bit more with the golden section) no matter what. Quadratic methods don't really care which points they use, as long as they're roughly centered on the solution.

The thing that attracted me to Brent's method was that it seems to be a near-optimal mix of golden section and quadratic searches. If it can get what it regards as a good quadratic fit, that's what it will use. If not, it rejects the step and uses a golden section instead.
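
You can see that mix at work in Listing 1. The little function below is my C++ paraphrase of the accept/reject test that appears there (the names are mine, and this is a sketch of the idea, not Brent's code): the parabolic step is thrown out if the trial point would land outside the bracket, or if the step hasn't shrunk to less than half the step taken two iterations earlier, and in either case the routine falls back to a golden-section step.

    #include <cmath>

    // Paraphrase of the accept/reject test in Listing 1: 'a' and 'b' bracket
    // the minimum, 'x' is the best point so far, 'step' is the proposed
    // parabolic move (p/q in the Fortran), and 'e_old' is the move made two
    // iterations ago (held in r at the point of the test).
    bool accept_parabolic_step(double a, double b, double x,
                               double step, double e_old)
    {
        // The trial point must stay strictly inside the bracket ...
        if (x + step <= a || x + step >= b)
            return false;

        // ... and the step must be shrinking: smaller in magnitude than half
        // the step taken two iterations back. Otherwise take a golden step.
        return std::fabs(step) < 0.5*std::fabs(e_old);
    }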

But what makes me nervous about Brent's method (other than the fact that I can't get anyone to explain it to me) is that it will sometimes use quadratics exclusively, if it's satisfied with certain conditions. Because I've seen the quadratic methods hang on to end points too long, I suspect that the interval between the end points doesn't decrease as rapidly as I'd like.

How do we make the two points draw together? I haven't a clue. Short of forcing bisection to bring them together, I'm not sure we can guarantee that the bracketing interval ever gets small. And if it doesn't get small at a rate faster than bisection would produce, we have no hope of speeding the convergence. However, I'd feel more comfortable with an occasional forced bisection step, regardless of how well the quadratic fit is working.

Picking nits
To summarize, the things that bother me about Brent's method are:

  • It can and will finish exclusively with quadratic fits
  • We cannot guarantee that the bracketing interval is small

In addition to these concerns, I've found two other things to complain about. In both the C and Fortran versions of Brent's method, I see that a small epsilon distance is used (ZEPS in Recipes, eps in fmin.f). (Mittlemann and Spellucci use a machine-language function to find the machine epsilon; I just stuck a reasonable number in its place.) This parameter is later used to set a value for tol3. Brent's method needs this epsilon value because it's looking for a certain maximum relative (not absolute) error in x. See the line:

tol1=eps*dabs(x)+tol3

If the predicted minimum happened to fall exactly at x = 0, and tol3 were zero, Brent's method would fail because its tolerance would collapse to zero.

Now, I don't claim to be the world's leading expert in numerical methods, but I know that this step is just plain wrong. You don't try to use a relative error when the parameter you're searching for can be zero. The step is also wholly unnecessary. Going in, we are given a span of x defined by the two end points of the search range. Therefore we know exactly how wide this range is. We also know to expect to narrow it down only so far: roughly the input range times the square root of the machine precision. Our algorithm can set an absolute tolerance at the beginning, and send ZEPS to the showers.
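
Here is a minimal sketch of that alternative, under my own naming (abs_tol is not a function from fmin.f or Recipes): compute one absolute tolerance up front from the width of the original bracket and the machine precision, and never touch it again.

    #include <cfloat>   // DBL_EPSILON
    #include <cmath>

    // One-time absolute tolerance for a minimizer working on the bracket [a, b]:
    // the sharpest we can expect to localize a smooth minimum is roughly the
    // bracket width times the square root of the machine epsilon.
    double abs_tol(double a, double b)
    {
        return std::fabs(b - a) * std::sqrt(DBL_EPSILON);
    }

For a bracket one unit wide in double precision, that works out to about 1.5 x 10^-8, which squares with the rule of thumb that a minimum can only be located to about half the available digits.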

The bottom line is this: we want a method that combines the nicest features of both bisection and quadratic fit. Brent's method is such a method, but certain details, such as the halt criterion and the possibility of using quadratics exclusively, are cause for concern. Instead of trying to psychoanalyze Brent and his algorithm, let's simply write one that uses the same notions, but does things our way. If it turns out looking very much like Brent's method, we should not be too surprised. I suspect, however, that it will also display significant differences, one of which I just mentioned.

One last thought before we dive in: it occurs to me that one reason algorithms like brent.c, fmin.f, and in fact most of the algorithms in Recipes seem so inscrutable is that their authors wanted them to be time-efficient. At this point, I don't care about that. I'd like to have numerical efficiency, but I don't care about counting clock cycles. Therefore, we can afford to sacrifice clock cycles for modular and transparent code. We'll worry about speed another day. With that long-winded preamble, let's start writing some code.

Listing 2: Functions for golden section search
    // swap two double-precision floats
    void swap(double &x1, double &x2){
        double temp = x1;
        x1 = x2;
        x2 = temp;
    }

    // divide an interval according to the golden section
    double gold(double x0, double x2){
        static const double g = 2-golden;
        return x0 + g*(x2 - x0);
    }

    // shrink an interval by dividing one side of it
    void gold_shrink(double &x0, double &x1, double &x2,
                     double &y0, double &y1, double &y2){
        double x = gold(x0, x1);
        double y = f(x);
        if(y > y1){
            x0 = x;
            y0 = y;
        }
        else{
            x2 = x1;
            y2 = y1;
            x1 = x;
            y1 = y;
        }
        swap(x0, x2);
        swap(y0, y2);
    }

    // Find the minimum of a function using golden section search
    double gold_search(double &x0, double &x2, double Eps){
        double eps = Eps * (abs(x2-x0));
        double x1 = gold(x0, x2);
        swap(x0, x2);
        double y0 = f(x0);
        double y1 = f(x1);
        double y2 = f(x2);
        while(abs(x2-x0) > eps)
            gold_shrink(x0, x1, x2, y0, y1, y2);
        return (x0+x2)/2.0;
    }
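
Listing 2 assumes a few names defined elsewhere in this series' code: the objective function f(x), a constant golden holding the golden ratio, and an abs() that accepts doubles. Purely as an illustration (none of this is from the column; the test function, bracket, and tolerance are my own stand-ins), a harness built around those assumptions might look like this:

    #include <cmath>
    #include <cstdio>

    using std::abs;     // an abs() that works on doubles, for Listing 2's benefit

    const double golden = 1.618033988749895;   // golden ratio, used by gold()

    // hypothetical test function: a parabola with its minimum at x = 2
    double f(double x) { return (x - 2.0)*(x - 2.0) + 1.0; }

    // ... the functions of Listing 2 go here ...

    int main()
    {
        double x0 = 0.0, x2 = 5.0;                  // initial bracket
        double xmin = gold_search(x0, x2, 1.0e-4);  // shrink bracket to 1e-4 of its width
        std::printf("minimum found near x = %f\n", xmin);
        return 0;
    }

With those stand-ins in place, the whole thing compiles as one file, and gold_search comes back with a value very close to 2.0.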

Getting it done
I know I'm going to need some subfunctions that do the kinds o
