Despite making every effort to avoid it, Jack keeps running into new problems with minimization at the heart of the solution.
Regular readers know that I often ramble a bit, discussing things that are mostly off topic. In the past, I've touched on everything from the Great Pyramid to my experiences with Dell, to good guys and (other) bad guys, to toy robots, to what's wrong with the world today. In most of my recent columns, however, I've wanted so much to put this business of minimization to bed that I've restrained my usual tendency to ramble, and tried to stick to the discussion at hand.
However, this month, though I still very much want to finish off minimization, I feel the uncontrollable urge to ramble. So fasten your seat belts, get ready for a bumpy ride, and I'll try to make this as fast as possible.
No doubt, some of you newcomers who've only seen the last few issues must be thinking, “What's with this guy? Is minimization the only problem he ever thinks about?” I've gotten some e-mail asking me that very question. One writer asked why I'm thinking about minimization at all, since most embedded programs don't need or use it.
That's a fair question, and it deserves a valid answer. Regular readers already know, but it has been a long time, so perhaps a refresher course in motivation is in order.
The old saying goes, “To a person whose only tool is a hammer, every problem looks like a nail.” The first time I quoted that saying was when I started writing this column back in 1994. The whole purpose of the column is to provide you with a toolbox loaded with as many embedded programming tools as possible. That way, when faced with a new problem, you'll be able to whip out the tool that works best.
Lately though, I've been on the other side of that coin. It seems that all my problems really are nails-nails of a very specific sort, namely, nails that respond to minimization. The problem that really motivated this series was one I ran across about three years ago, and it was indeed a real-time problem. I'll change a few names to protect the innocent (mainly me). I'll also simplify a bit.
The challenge is to identify both the type and concentration of N compounds in a solution, based on a spectrographic analysis. The approach was to pass white light through the solution, and try to identify the spectrographic signatures of each compound. Conceptually, the problem is the same one chemists solve using a mass spectrometer.
Naturally, the project being an embedded system, we had the usual gang of mutually exclusive goals: the system had to be small, cheap, fast, and finished yesterday. The concentrations were not constant-they varied with time-and the response of the system had to be fast-much too fast for the usual approaches. We couldn't afford either the time or equipment to do a full spectrographic sweep. Instead, we had to use a small number, M, of individual sensors, each tuned to a specific wavelength.
As my favorite professor used to say, “conceptually, there's no problem.” If
and the signatures of the unknowns are unique, the problem is solvable. In a perfect world-one in which the sensors are perfectly linear and exact, the effect of increasing concentration is also linear, and the law of superposition holds-the problem becomes a simple case of linear algebra:
where x is an N-vector of concentrations and y the M-vector of sensor outputs. In practice, none of the above was true. The sensors, being cheap, were not only rather inaccurate, but also temperature dependent. Typically, concentrations affect transmission logarithmically, not linearly, and the signatures were not as unique as we might have wished.
Even though we tried to linearize what we could, our best model could still give us only a highly nonlinear relationship:
Our goal, then, was to find the value of x that somehow best fit the observations. The approach is straightforward enough: let be the best estimate of x, and:
We're looking for the value that minimizes the mean-squared error in y:
In short, our problem was to do a multivariate minimization of a function that was not only non-linear, but also changing with time. And we had to do it within the usual real-time restrictions: minimal CPU time and minimal RAM. To do so took all the ingenuity we could muster, and then some, but that's neither here nor there. The point is that the problem was, at its core, a problem in minimization of a nonlinear function of several variables.
A second problem wasn't real-time, but its intended end use was. In my book, Software Toolbox for Embedded Systems Programming, I give polynomial and/or rational polynomial expressions for the various trig and logarithmic functions: sine, cosine, arctangent, log, and exp.
Every handbook gives the power series formulas for these functions. However, evaluating them verbatim, without some careful tweaking, is not a good idea. These power series are infinite series. In the real world, we have to make do with truncated versions. The sine series, for example, is:
To evaluate the series, we must lop it off at some point and agree that we have enough terms to give us the accuracy we need. However, if we plot the error curve of the truncated series, we will find that the largest error occurs at the end, when x is largest. Elsewhere, it's essentially zero.
If we have time and computing power to burn, we can simply use enough terms to keep that end-point error within an acceptable limit and move on. If, however, we tweak the coefficients a bit, we can find an error curve that reduces the end-point error dramatically, at the expense of larger errors elsewhere in the range. A typical, improved error curve oscillates about zero-the more cycles, the better. In general, the peaks of these oscillations will have different amplitudes. However, we can always tweak the coefficients to reduce the highest peak, at the expense of increases in others. A little reflection will convince you that the optimal condition occurs when all the peaks have equal amplitude. With this condition, the height of the worst peak is minimized. This is called, for obvious reasons, the “minimax” solution. Using a minimaxed version of the sine series reduces the worst-case error dramatically, to the extent that we can usually drop at least one term of the series. That means faster computation.
For power series, a well-established method exists for generating the optimal coefficients. The method involves use of the Chebyshev polynomials, which I'll be writing up in an article Real Soon Now.
Unfortunately, things don't always go so well. The power series for the arctangent and log functions converge too slowly to be useful. To get a useful form, we must resort to a continued fraction. Truncating the continued fraction leads to a ratio of polynomials, such as:
where coefficients a through g must be determined. As in the case of the power series, we can come up with analytical values that the coefficients should have, if the polynomials are really infinite series. But when we truncate, these coefficients are no longer optimal. As before, we need to adjust them to minimize the absolute value of the error. The Chebyshev method, which only works for power series, will not help us here; we are back to minimizing a nonlinear function of several scalar coefficients. It's the same problem as before, in a different guise.
Finally, consider the case of tracking a satellite in orbit. At any instant of time, it has a state x that describes its position and velocity. We can make periodic measurements of that state (or, to be more precise, we can measure observables y which are related to x by some relationship like Equation 3). These measurements will, in general, be corrupted by noise, systematic and random errors, and so on. Our job is to make our best guess as to the true path of the satellite.
This is the orbit determination problem. We define a vector of errors just as in Equation 4, and seek to minimize their mean-squared error, as in Equation 5.
In the case of a satellite, we have a model: a set of equations of motion that define the expected path. The solutions to these equations end up depending on certain state parameters; the position and velocity at some epoch time are typical. We call this collection of parameters the state vector, x, which we again try to estimate.
Most of us are familiar with the least-squares fit of noisy data. That's the thing that Microsoft Excel does when you ask for a linear regression. The method is fairly straightforward when the model is predicted by a polynomial. It's anything but straightforward when it's a set of nonlinear differential equations. The problem buried beneath the surface of the orbit determination problem is-you guessed it-nonlinear, multivariate minimization.
But, you ask, is this really a valid problem for an embedded system? I mean, after all, isn't tracking a satellite the job of a ground tracking station? Yes, it is, but what if the spacecraft itself is doing the tracking? Then it must do so within the onboard computer, however large or small it is. The first time this was done in earnest was during Project Apollo, where the astronauts could have used it to navigate, had they lost radio contact with the ground.
The last time? Think of virtually every airplane, every guided missile, every tank, every ship, almost every military vehicle that moves. Think, even, of the GPS system in your family car or boat. In those applications, a recursive form is used that we call the Kalman filter. I'm working on an article for that, also.
The Kalman filter is the best we can do if we must process the data in real time, and use it to steer a missile or airplane. But it's only optimal if the problem is a linear one, which is usually not the case. If we have the luxury of being able to process recorded data after the fact, a batch least-squares fit is more accurate. As it happens, I'll be doing that very thing pretty soon. So, you see, this particular nail seems to be following me around. I can't escape it.
Tubes, tubes, tubes
From the kind of articles I write, you might think that my first love is mathematics. You'd be wrong. Aside from a young lady named Jane Burgess, my original first love was electronics. Specifically, audio electronics using vacuum tubes.
What's that you say? You've never heard of a vacuum tube? It's a gadget sort of like a transistor, only bigger. And sort of like the picture tube in your TV set, only smaller. Instead of electrons, holes, and energy bands, the tube depends on a phenomenon called thermionic emission, where electrons boil off of a heated cathode and flow to a collecting plate. For more data, see the movie Frequency.
I saw my first vacuum tubes when I looked into the back of the radio consoles my parents and grandparents listened to. I saw a much bigger one in the back of a diesel locomotive, the three-foot-diameter, spherical tube called a thyratron rectifier. Amazingly enough, it was an invisible stream of electrons flowing through a glass bottle that made that huge train go.
In the early '50s, a phenomenon came along called high fidelity-hi-fi for short. Some folks had realized that the sound coming out of their RCA Golden Throat consoles wasn't really all that golden. It sounded nothing like, for example, the sound radiating from behind the screen at our local movie theaters. We wanted more, and hi-fi gave it to us. Some folks were converting PA amplifiers and theater speakers to home use. Others were building their own gear. Yet more were buying expensive, consumer hi-fi gear that was just coming onto the market. Names like Bogen, Bell, and McIntosh come to mind. They all used vacuum tubes for the simple reason that solid-state versions didn't exist yet.
During those years, one name rose to prominence above all others: Heathkit. By selling the amplifiers in kit form, the Heath Company saved the cost of fabrication, which was, for those hand-wired components, considerable. As a result, they could offer amplifiers for $40 to $60 that would put to shame other models costing hundreds. For anyone who knew how to solder, building a Heathkit was the only way to fly. In the early '50s, I reckon there wasn't a college fraternity in the country whose dances weren't powered by Heathkits.
Heathkits didn't just enable sock-hops, either. They-and vacuum-tube gear in general-were also all over the research labs. Remember, this was the '50s. If you wanted a mass spectrometer (itself a vacuum tube of sorts), you didn't just open a catalog and order one. You built it. Then you built the power supply, the controller for the vacuum pump, the vacuum gauge, the current detector, the strip-chart recorder-the whole nine yards. Physics labs of the day were 20% pure research, but the other 80% was plain old custom electronics. The physics department at my university must have had, easily, 300 Heathkits, not to mention all the custom stuff. One of my classmates had so many tubes in his lab they had to pipe in extra air conditioning to keep them from blowing.
I had a lot of synergism going. At my classes and labs, I was taking courses in electronics, learning all about negative feedback and op amps (oh, and you thought Fairchild invented them?), and designing and building gadgets based on vacuum tubes. At home, I'd turn on my Heathkit amp, put on a record, and sit back to enjoy the cannons of Tchaikovky's “1812 Overture,” pumped out of a 15-inch woofer. It was only natural that my two lives would merge, and soon I was designing and building my own gear.
I had made the build/buy decision, and decided that building was a lot more fun. So instead of yet more exotic amplifiers, I bought test equipment to allow me to build my own. The test equipment was-you guessed it-all Heathkits.
Fast forward to the year 2000. I sit in my living room, soaking up the sound of five channels of solid-state, digitally equalized sound. The sound source is a CD disc, a laserdisc, or my also-digital satellite TV. But somehow my thoughts keep drifting back to “1812 Overture,” and how clean and pure it sounded. I pick up a magazine called Glass Audio, and I discover, much to my amazement, that many people have never left vacuum-tube systems. There are those who think that the tube amplifiers sound better, and always will.
I talked to some friends on the Internet. One of them mentioned used items on eBay. My curiosity piqued, I wandered over and did a search on “Heathkit.” What to my wondering eyes did appear, but no less than 500 Heathkits for auction. I was hooked, big time. I still am. I'm also thousands of dollars poorer, but happier nonetheless.
You might recall that the solid-state revolution in audio was fueled by systems from companies like Pioneer, Sony, Yamaha, Panasonic-all Japanese companies. Ironically, the Japanese themselves seemed to always prefer tube sound. When we were all trading our tube gear in on solid-state stuff, some savvy traders were quietly shipping all the surplus tube gear to Japan, where it brought big bucks. Now it's finding its way back to our shores. People are also building new, modernized tube stuff again. You won't believe the prices! $1,000 for the bottom end gear; $10,000 to $40,000 for the high end.
My first purchase was a superb Heathkit amplifier, the W-5M, which was like the one I used to have and thought I'd never see again. Then I got a few more for good measure (hey, for a conversation piece, you can't beat a home theater system with more than 30 tubes, all glowing merrily). Then I realized that I needed a way to maintain them. After all, you can't just go down to your local drugstore anymore to test the tubes. So I bought a tube tester. And a VTVM (that's vacuum-tube voltmeter to the uninitiated). And distortion meters. And audio voltmeters.
Then I revisited the build/buy question and reached the same decision as last time: building is even more fun than buying and this time I'm one of the few that remembers how. But I needed more gear, such as audio oscillators and regulated power supplies (try to find a new 500V supply these days). As I type these words, I sit surrounded by 100-plus vacuum-tube gadgets, most of them Heathkits. I'm in second-childhood heaven.
One thing has changed since my last tour through hi-fi paradise: Spice simulators. When I was studying electronics the first time, the only way to figure out the gain of a tube circuit was to analyze it mathematically. I thought it was great fun, and was soon seeking out new circuits to analyze. I'd invert 10-by-10 matrices analytically, by hand, and call it recreation.
It may still be fun, but it's no longer necessary. Today, you can buy computer programs that will simulate both the small-signal and large-signal behavior of a circuit. Just as we now simulate aircraft before we build them, we can also simulate electronic circuits.
Circuit simulation programs have been around for quite a while. Their purpose is, of course, to predict the behavior of a circuit without the need to actually build it. The early ones could only handle linear devices like ideal, passive components. The concept only became useful, however, when they could simulate active, nonlinear components. They became virtually indispensible when ICs contained several hundred or more transistors. Even if one could build a breadboard with so many components, it wouldn't behave anything like the real IC. Spice is an acronym for Simulation Program with IC Emphasis. In recent years, the folks at UC Berkeley developed BSpice and released it to the public domain. BSpice itself is purely a text-based simulator, intended for use on mainframes. PSpice was developed for microcomputers, but today's micros are as powerful as yesterday's mainframes, so most modern simulations are based on BSpice.
Recently, several vendors have come out with Windows-based versions of BSpice, with GUI front ends grafted onto the BSpice internals. Prices run from outrageous to free, with performances roughly commensurate with price. Thanks to such tools, it's now possible to build and test a serious circuit, in the comfort of one's own PC. I do enjoy soldering circuits together, but tweaking one for best performance is not an exercise for the meek, and it can take forever. A simulation program brings to the electronics arena the same advantages that flight simulators bring to airframe design. At least one of the simulators will actually optimize parameters like resistor values for you.
Now we get to the good guy part. I've recently bought two simulators I can heartily recommend. One is the tongue-twisting B3Spice A/D 2000, from Beige Bag Software. The other is Tina Pro, from Designsoft. Both come with vacuum-tube models, and both allow you to add your own. Price runs around $300 for each, which is very low for the genre.
GUI-based simulators have not been around long enough for the interfaces to become standardized. When using them, be prepared for things to work differently than you would have expected. Despite their common background, each of the two programs I mentioned have very different user interfaces, and they each have strengths and weaknesses. Tina is very handy for adjusting parameters-it's the one that will optimize component values for you-but it's not so strong in generating large-signal waveforms. B3 Spice A/D 2000 does that nicely, but is harder to use.
I should mention a third choice, Electronics Workbench. It used to be almost a mainstay for many electronics experimenters, but I haven't heard much from them lately. The rights to EW were bought recently by a British company. I used EW for some time, and it was by far the easiest of all to use. However, I found my version to be quite flaky.
When you're designing a tube amplifier stage, you start with the characteristic curves for that vacuum tube, and choose a voltage level, load resistor, and bias that keeps the tube operating in the most linear part of its range. These settings comprise the operating point for that tube. Once the load resistor and so on have been pinned down, you can draw a transfer curve, which is the graphic equivalent of a control systems transfer function. If you've done things properly, the curve should be nearly straight.
However, when the amplifier is supposed to have, say, 0.1% distortion, “nearly straight” is not nearly good enough. For the final tweaking, you must play around with the values to make the curve as close to straight as possible. Here is where the simulation comes in, because measuring distortion as one changes resistor values can take forever, if you have many things to adjust. Ideally, a simulation should be able to optimize values for, say, minimum distortion at a specified signal level. We're not quite there yet, but getting there.
I've been working with B3Spice to get a semi-automated optimization. What I do is take the math model on which the simulated device is based and generate a mathematical expression for the transfer curve. From this data, I generate a number of points on the curve and use them to fit a polynomial. From the polynomial, I can calculate harmonic distortion as a function of operating point settings and signal level. Then I can adjust the settings for minimum distortion.
The solution turns out to be (groan! There's that nail again!) yet another job for a multivariate, nonlinear optimizer. I can't escape it, I tell you. The problem is following me around.
Back to the minimizer
By now, I hope I've both piqued your interest in tube electronics, and given you, once again, the motivation behind my search for the perfect minimizer for nonlinear functions. Now that we've had our fun, let's get back to work. We've been seeking, as if you didn't know, a C function that finds the minimum of some arbitrary, nonlinear function:
The minimizer is intended to use a combination of linear and superlinear methods. That is, it should combine the reliable but slow bisection method with the fast but tricky method of parabolic fit. “Conceptually, there's no problem.” In practice, the trick is to know when to apply which method, and also when convergence has been achieved. Deciding which points to retain and which ones to throw out for the next iteration entails a lot of risk. My early attempts retained only the three points nearest the minimum, such that:
More recently, I've decided that we gain robustness by retaining more points. That robustness, however, comes at the price of a much more complicated process of maintaining the array of points.
With respect to the development of our minimizer, I think I owe you a brief explanation. In past issues, you've watched me try different approaches, different software architectures, and different conventions for parameter passing. Some of the approaches I've tried go against the grain. I'm the founder, president, and chief spokesman for the Society to Eliminate Global Variables (SEGV). Yet in my last column, I gave you a design that had nothing but. You may be wondering what's going on. I'll explain.
Sometimes I can build elegant software. I like to think that most of my software is clean and straightforward. Sometimes I can build software quickly. But I've never claimed that I can build clean, straightforward, and elegant software quickly. The reason is simple enough: I like to take a lot of time in the front end, thinking about the design. I weigh as many alternatives as I can, and tend not to move on until I think I've made the right choice.
In a way, I'm my own optimizer. I want to build not only a good piece of software, but software that represents the best choice of architectures and designs. Usually, this approach works just fine. It fails in the same place an optimizer does: when there are multiple choices with no clear-cut advantage for any one of them. In situations like that, I tend to behave in the same way a poorly designed optimizer might: I get hung up in a for-loop, endlessly bouncing from alternative to alternative.
My kids used to drive Quarter-Midget race cars, and I did the maintenance on them (and here you thought I was not a hardware person). I sometimes made rather sweeping changes in the design, such as replacing a nearly-rigid front end with one with complex geometry, copied from a Formula 1 car. Typically, one is faced with alternative methods, each with advantages and disadvantages, and each with various challenges of implementation. When faced with such a situation, I'd tend to ponder the options endlessly.
My ex-wife, bless her heart, knew this, and also had a very effective way of dealing with it. One weekend I was sitting on the stoop of our garage, pondering just such a problem. I was actually holding parts in each hand, and trying to decide which one to use, and how to complete the process. I must have sat there for two hours, nothing moving but my head, as I stared back and forth at those two parts, hoping to think of an easier way to use them.
Suddenly, the door behind me flew open. My wife shouted, “Do something, even if it's wrong!” and slammed the door again. That worked. It was enough to jog me off dead center, and force me to make the choice I needed to make.
That's what was needed: a swift kick. I need one here, also. I don't have that ex-wife to kick me around anymore, and Tim, my editor, is far too nice to do so. So with your permission, I'm going to give myself a self-induced mental kick (thanks, but no help needed), and we'll move on.
I'm now convinced that our state machine architecture is the right one for this job. I'm pretty sure we need to maintain five points, not three, if for no other reason than to handle the case of an absolute-value sort of function; the kind with a sharp minimum and discontinuous derivative. I'm thoroughly convinced that the end criteria should include variations in y as well as x.
I'm less enamored of the all-global-variable decision I used last time. As a spokesperson for SEGV, I wish I hadn't done that. But I do think the five data points definitely deserve being global. In fact, they really deserve to be part of a C++ object, but I'm reluctant to use that approach since so many of you are still using C. Also, the program necessarily needs intimate access to more internal parts of the object than conventional object-oriented design would approve.
So I'll begin this month by building a C equivalent of a C++ object, complete with “methods” to operate on it. The two key methods involve adding new points to the structure. Sometimes, the new point will be a new minimum. Sometimes, it is merely a new value near the minimum. Either way, it will displace points that are further from the minimum. Sometimes, the point displaced is the leftmost one. Sometimes, it's the right. So I'm building two new functions, called insert_left() and insert_right(). The structure and its related functions are shown in Listing 1. Please note the use of assertions to make sure that the functions are called properly. In production use, an invalid call should never happen. But then, Windows should never crash, either. An assertion is a cheap way of protecting oneself from bugs.
The two functions must adjust the value of the variable, npoints, when they are used. Normally, insert_left() shouldn't alter the number of points, except when the array was originally empty. insert_right() shifts points to the right, so increases the number unless the array is already full.
I like this new structure. I like it a lot. It gives us ready access to the items in the data array, while still allowing a modicum of protection for them. Notice, in particular, that I don't need to pass the values of the points through calling lists. At most, I only need to pass indices.
When we're stepping through points to find the initial minimum, we need a function that will append to the current list, shifting left if necessary. This function is just enough different from the other two to warrant the writing of a third function, which I've called append(). It's also shown in Listing 1.
As long as we're dealing with this array, we might as well also write the functions that perform computations using it. Listing 2 shows the functions bisect_left() and bisect_right(). As the names imply, the first method bisects the interval to the left of the specified point; the second bisects the one to the right. These functions are so trivial that one could argue they should be implemented inline, but that's an optimization I'm going to pass on for now. Similarly, we could clearly use a single function, rather than two, to bisect an interval. But keeping two functions makes things more clear and, after all, it's not as though a megabyte of code is involved.
Note that each function only returns the corresponding value of x. The code to compute the corresponding y-value, as well as the logic to use the new point, will be done elsewhere. In general, it's always a good idea to separate computation from logic.
For the parabolic fit, things work a little differently. First, it must do two things, not one. In addition to estimating the x-location of the minimum, I want it to predict the value of y for that x. That y-value depends on the quadratic coefficients for the parabola, so it's best computed inside function para_fit(). Therefore para_fit() returns two variables, not just one.
Second, while we normally will be fitting three adjacent points, I want to retain the option to fit points 0, 2, and 4, when conditions are right. This gives us yet another double-check on the validity of the minimum.
To permit this, I've defined two versions of para_fit(). The normal one does a parabolic fit of the three points centered at index k. The second, para_wide(), always uses the three points x0, x2, and x4. As with the bisections, the two functions share much common code, but they're not long enough to justify combining them.
These functions are shown in Listing 3. Note that I've found an even simpler form for the estimated minimum value, ymin . The new equations are: