Curmudgeon Repercussions
Faced with the unacceptable alternative of spending 42 hours calibrating each sensor, Crenshaw looks for a mathematical solution.
Boy, did I ever strike a responsive chord with the Im mad as hell line! My e-mail cup runneth over. I found that an awful lot of other folks share the sentiment. Not one opposed it. We may be poised on the verge
of the next revolution in personal computers, led by the People Who Arent Going to Take It Anymore. Some of the messages I received were extremely insightful and thought-provoking. One, in particular, inspired me to wonder just how many person-hours, and therefore dollars, are being wasted every day by people who set out to do a job, but ended up mired down in a battle with intransigent application software. As the old saying goes, When youre up to your hips in alligators, its hard to
remember that you came to drain the swamp. I find myself wondering just how many deserving swamps remain undrained.
Case in point: last night I had to generate a graph using Microsoft Excel. Now, as any spreadsheet user knows, you generally draw a simple graph by generating two columns of numbers, representing the
x
- and
y
-axes of the graph. You define the values in the
x
-axis column using a formula something like:
<A4> = A3 +
0.01
You define the functional relationship between
x
and
y
by a formula in cell B3, then duplicate it into cell B4. Then, to generate the column of numbers, you must copy cells A4:B4 downwards as far as you want the
x
-value to extend, say into cell A503, for the range:
x
= 0..5
The way you copy cells, of course, is by first pressing Ctrl-C to copy them to the paste buffer. Then you highlight the
region to copy to, by dragging the mouse across it, and finally, you press Ctrl-V to paste.
Theres only one small problem: unless your CRT has a lot more resolution than mine, you cant highlight 500 rows of a spreadsheet all at oncethey dont fit on the screen. So instead, I must drag the mouse over the 500 lines, letting Excel scroll the screen at whatever rate I can live with (careful not to overshoot), until the row count gets to 503.
Until recently, I used Borlands
Quattro Pro rather than Excel (I switched for compatibility reasons). Quattro Pro required a similar approach, but with one very important difference: it had a command called something like Select Block, in which you could drag the mouse over a region, and the range would appear in a box. The neat part was, having selected the range with the mouse, you could then go in and edit the last field to extend it.
For example, I could highlight the range A4:B10. Then I could manually change the 10 to a
503, press a key, and bingo! There were my columns, neatly fleshed out.
Last night, as I was more or less patiently scrolling through 500 lines in Excel, overshooting, scrolling back, and so on and so forth (why does it always seem to scroll faster backwards, after an overshoot?), I found myself thinking, I wonder how many people, all over the world, at that very instant, were also scrolling screens in Excel, to do the very same thing. How many years has Excel been working this way, and nobody ever
thought to add the Quattro Pro Select Block feature?
In future issues, Ill be sharing some of the more insightful e-mail on this topic with you. For now, I just want to wait, see what others have to say, and let my thoughts gel. I do perceive, though, that the feelings run a lot deeper than Id thought.
Last month I set out to tell you about the idea of using distribution functions to simulate input pulses. The central idea was one thats pretty obvious, in retrospect, and
one that Ive put to use in other areas many times. That idea is: just because a certain function first appeared in studies of probability distributions doesnt mean that this is the only use to which one can legally apply the function. The functions describing the conic sections (which include the circle, ellipse, parabola, and hyperbola) originally came from geometry. Theyre called the conic sections because they are the curves generated by the intersection of a plane and a cone.
The most intensive use of the properties of these curves, however, turned out to be in quite a different area than simple plane geometry: we used them to go to the Moon.
Purely by chance, the same equations that express the mathematical properties of the conic sections just happen to also represent the solution to the two-body problem of celestial mechanics, and therefore describe the orbits of
all
celestial bodies in which the influence of one attracting mass dominates all others. Although this
is never really the case in reality, its a good enough approximation for almost every real body in our solar system, and most others as well; planets, comets, and moons all follow orbits that are almost perfect conic sections.
The key message, then, is simply that one neednt feel constrained to use functions only for the problems they were originally intended to solve. We are free to use whatever function happens to work. This idea seems so obvious that youre probably wondering why I
even brought it up. The reason is simple enough: although I fully understood and have applied the idea many times before, the specific notion of using probability functions for something other than describing statistics is one that simply hadnt crossed my mind before.
Still, I dont feel moved to belabor the point. Having finally, after all these years, had the light bulb labelled idea go off above my head, I thought the notion worth sharing. Now Ive done that. Theres
no point going on as though its the last idea Ill ever have. File the idea away somewhere safe, and dredge it out the next time youre trying to fit data. Meanwhile, well move on to other things.
New beginnings
What shall we move on to? Well, I have some rather specific things in mind, for both the column and an upcoming set of articles. These things are fairly extensive, and will occupy our time for the next few
months.
Over the years, Ive given you what I think are some very nice, general-purpose algorithms and software modules for solving various practical problems, such as an algorithm for finding the root of a
function,
1
a general-purpose
Runge-Kutta integrator,
2,3
an algorithm for
least-square fits,
4
and so on. Its been awhile since I did one of these algorithm orgies, and
I find myself building up to another one. To spare you the suspense, for my next trick, I intend to develop a world-class, general-purpose software tool for finding the minimum of a function.
This specific goal, however, is not even half the story. To explain why Im interested in this particular problem-solver, I need to give you the background behind my interest, and the motivation for it. I cant think of any way to give you this in a few sentences, so please bear with me as I give you the
motivation in its long form. Recognize that, just as with the probability distribution functions, theres no law that says that we must always use the final product for the same reasons Im interested in it. You are free to use it wherever it fits the job. However, I think when you see the background details, however convoluted the reasoning, youll see why this particular tool becomes central to the solution to many, many problems.
The CO
sensor
Im sure youve noticed that lately I seem to be concentrating on the notion of fitting equations through tabulated data. That obsession is no accident; its what has been occupying a considerable percentage of my time for the last three years. I cant tell you the details of that workit was proprietarybut I can give you a practical example that illustrates both the problem and its solutions quite nicely.
Some years ago, I bought into a project that
looked like a sure winner. A friend had identified an important need: many people die every year because they use open kerosene heaters or gas heaters in their mobile homes and RVs. The heaters release carbon monoxide (CO) into the closed space, and the people die without ever being aware theyre in trouble. My friend saw a need for a warning sensor that would detect CO and sound an audible warning, in a fashion similar to a smoke detector. Whats more, he managed to locate a chemical cell that
would generate an electrical voltage when CO was present.
We bought one of the sensors and proceeded to test it to determine its response to various levels of CO. We learned, not surprisingly, that the response was anything but linear. Nevertheless, we ended up with a calibration curve for the sensor, and reasoned that as long as we knew what the response curve was, we could compensate for the nonlinearities.
Then we learned that the sensor wasnt sensitive to just CO. It also depended on the
temperature and humidity of the air, and the temperature of the sensor itself. No problem, we thought; we can add thermistors and a hygrometer to measure temperature and humidity (though the latter is not easy to find, and hygrometers themselves depend on temperature). Still, the sensor was so small and cheap, it seemed a good idea to make it work, despite its many dependencies.
Our next surprise was when we got a second sensor, and discovered that it didnt behave like the first one. Each sensor had
a somewhat different set of characteristic curves.
Calibration
Faced with such a situation, what do you do? One approach is simply to go to another source, for a sensor thats more linear and has better quality control. But if the sensor at hand is cheap and reliable enough, its still tempting to try to figure out a way to use it.
Recently, I worked with another sensor of a different sort, that had a similar story. It was
small, cheap, low-power, extremely reliable, and dead-nuts accurate once it was calibratedall the characteristics one could ask for in a sensor. It also had a highly nonlinear response, was sensitive to other factors like temperature, and the response varied wildly from unit to unit. We desperately wanted to use the sensor, but we had to make it work in volume production. The only solution was to calibrate each sensor, storing its particular response curve. The solution we chose was to have the vendor
store the calibration constants in an EPROM that came mounted on the same board as the sensor itself. The software could then read the EPROM, set up its compensation algorithms, and linearize the response.
The only problem with this approach is a practical one: it takes time and resources to calibrate a sensor. The more measurements you have to take, the longer the calibration is going to take. If youre not careful, the low cost of the sensor itself can easily be eaten up by the cost of the labor
required to calibrate it, not to mention the cost of the software to compensate it.
Given a certain calibration curve, we can always fit it with a polynomial of high enough order. Alternatively, we could use a lookup table with linear interpolation, or a spline curve fit through fewer points. The problem with such an approach is that it takes quite a number of data points to get a good approximation to the curve. Now, how do you go about getting the data points?
In the case of that old CO sensor, we
can conceive of some kind of test rig that flows air, along with measured concentrations of CO, through the sensor, and reads the output voltage. But if I need, say, a tenth-order polynomial to fit its response curve, Im going to have to compute 11 coefficients. Eleven coefficients means
at least
11 values of the CO concentration. Characterizing the sensor with a lookup table would require many more than 11 concentrations.
But thats not all. Remember that I said the sensor was also
sensitive to temperature and humidity. That means that every run I make with different concentrations must now be repeated, both for different temperatures and different humidities. Suddenly, Im beginning to envision a test rig that must be automated, capable of supplying many different concentrations of gas, at controlled values of temperature and humidity. Just providing the lattercontrolled humidityis a trick all by itself. The cost of the test rig to let me use my ultra-cheap sensor is
suddenly beginning to approach that of a launch facility for the Space Shuttle!
Even thats not the worst of it. The worst of it has to do with the dimensional explosion.
Suppose I can get by with, say, 10 values of CO for each calibration curve. But if the sensor is also dependent on temperature, I need to measure it at 10 sensor temperatures. Also 10 air temperatures, and 10 values of humidity. Now I need not 40 data points, but 10,000! Since this is a gas-driven, chemically-activated sensor,
were going to have to give it some time to get a steady flow, and stabilize its output. Figure in at least 15 seconds for that, and weve got 150,000 seconds, or about 42 hours of calibration,
per sensor
. Now we have at least two new problems. First, assuming that a human has to be around to monitor the calibration process, were talking on the order of $600 to $1,000 per sensor, just in labor costs. Second, unless were willing to build multiple calibration systems, we can only
produce, at most, 17 sensors per month, even if were running the calibrator over three shifts. Add to this the cost of keeping a plant open and running for those three shifts, and suddenly our cheap sensor doesnt seem cheap at all.
How can we get around these problems? The answer is simple enough to conceive, not so easy to realize. We have to find a way to fit the characteristic curves with fewer coefficients, which means fewer data points to measure. Imagine, for example, that we are so
astonishingly lucky that the calibration curve turns out to be a pure exponential:
Now, instead of 10 or 11 coefficients, we need only two. Whats more, we might even discover that one of the coefficients, say
k
, is the same for all sensors. Suddenly, our intractable problem becomes tractable again, and we can announce that the plant will no longer be running three shifts.
Fitting a curve like that in Equation 2 to data, however, is a lot
different than simply recording readings for a lookup table. In general, we can expect to have to use some trial and error, especially in the research phase, to discover that Equation 2 does indeed describe the sensor output. Though we might be able to, as we hope, use only two points to determine the two coefficients during calibration, we will almost surely need many more to verify that Equation 2 really does fit all the sensors to enough accuracy to be useful.
This is where the curve fitting comes in. We
can begin with a characteristic curve, taken by whatever painful and time-consuming process we may need, during the R&D effort. What we seek to do is to find a function that fits this data well, and also has as few coefficients as possible. The obvious criterion for good fit is the least-squares criterion. This criterion requires that we choose a function such that the sum of the squares of the error at each data point is minimized.
Now, if the curve to be fitted is a polynomial,
standard linear regression techniques will find the coefficients. But remember, polynomials take too many coefficients. Were not seeking a polynomial solution, because we cant afford the large number of coefficients its likely to require. Were seeking a solution using a nonlinear function like Equation 2, and that requires a nonlinear least squares fit. And that means minimizing a nonlinear function. In fact, it means minimizing a function of several variablesthe coefficients.
Regular readers will recall my mention of a nice software package called MicroMath Scientist. This is precisely the kind of problem Scientist was developed to solve. Given any nonlinear relationship, Scientist finds the values of the unknown coefficients that minimize the least-square error. How does it do it? By the same methods that were going to be studying here. I dont know what else MicroMath Scientist does, and I cant tell you which algorithms it uses, but I can be 100% certain that
at its heart beats a general-purpose function minimizer.
Incidentally, a reader told me that hes been unable to contact MicroMath. I sure hope the company hasnt closed up shop. Our industry has already lost too many Good Guys; we dont need to lose any more. If anyone out there knows if MicroMath is still in business and how to contact them, please let me know via e-mail, and Ill relay the info to the rest. Also, if anyone knows of other sources for nonlinear equation solvers,
Ill be happy to add them to our list of resources.
Minimizing in real time
Is there ever an occasion when we need to run a function minimizer in a real-time embedded system? You bet there is, and Ive also wrestled mightily with this problem over the last year or two. The problem rears its ugly head when we go beyond single-input, single-output systems.
To keep the discussion strictly generic, imagine that weve got a system
that can take five different inputs, and return five different outputs. There is cross-talk, however, between channels. Each output channel, in general, responds to all five inputs.
Let the inputs to the system be denoted by the vector
x
, and the outputs,
y
. If the system were linear, we could write the dependencies of
y
upon
x
in matrix notation:
y
=
Ax
Inverting the matrix, then, gives:
y
=
A
-1
y
If the response is truly linear, the matrix
A
is a constant (possibly different for each sensor). That means its inverse is also a constant, so we can compute the inverse off line, as part of the calibration, and store the inverse in EPROM. Then, given a set of output voltages,
y
, we can compute the combination of inputs,
x
, that produce them.
Unfortunately, weve already agreed that
the characteristic curves for our sensor arent likely to be linear, so we cant write down Equation 3, much less Equation 4. The best we can do is write:
y
=
f
(
x
)
But the relationship implied by
f
(
x
) is sure to involve some constants, such as
A
and
k
in Equation 2. To make that involvement explicit, well write:
y
=
f
(
x
,
a
)
where
a
is a vector of some undefined number of coefficients. To make use of our sensor, we must face two challenges. First, during calibration, we must determine both the nature of
f
(
x
,
a
), and the numeric values of
a
that best fit the data. Thats a job that calls for a nonlinear curve fit.
Second, to support the real-time function, we must find a way to revert Equation 6 to determine
x
, given
y
. If were
clever enough, and if the math is tractable, and if were very, very fortunate, well be able to revert the equation in closed form. In that case, we only need to program up the reverted equation, and compute it in real time.
Murphys law being what it is, however, we expect the likelihood of getting a closed-form reversion to be slim to none. We can be virtually certain that, whatever the form of Equation 6, were going to have to revert it in real time.
How do we do that? We can start
by assuming a solution, call it . We then calculate the corresponding value of

We now compute the error vector:

Our goal, then, is to drive the value of
e
to zero by somehow adjusting our guess, . A Newton-Raphson ap-proach sounds like a reasonable one.
In this example, Ive assumed that we have an equal number of inputs and outputs; that is, the vectors
x
and
y
have the same
dimensions. In that case, we can expect that, unless the function
f
(
x
,
a
) is really crazy, we can always find a value of which forces
e
to zero. However, in the real problem which I worked, we actually had more outputs than inputs; the sensor was a multi-channel sensor, sort of like the spectrum analyzer of an audio system. In this case, we wouldnt be able to invert the matrix
A
, even if the problem
were
linear. Which it is not.
For such a situation, we are
unlikely to be able to match all the output voltages,
y
. Our model, which is at best an approximation to the real world, is bound to be somewhat in error, which means that the value of generated by Equation 7 is never going to match, exactly and element-by-element, the measured value of
y
. The best we can hope for is to minimize the error. This means that our criterion for best fit becomes another least-square criterion:

This one we must solve
in real time, so clearly were going to want a very efficient minimizer.
Minimax
In an upcoming article, Im going to be talking about ways to obtain the best fit for functions of a general nature. As it turns out, the least-squares criterion is not always best. Those of you familiar with the approximation of trig functions know that we can improve the accuracy of power-series approximations using Chebyshev polynomials. Im
long overdue for an article describing how to use these functions. I wont attempt to cover them here, but suffice it to say that the Chebyshev polynomials derive from a different criterion than least squares. Instead of trying to minimize some bulk error like mean-square error, over the entire range of a function, we require that the error be bounded by the smallest possible bound.
You can see why this is a better criterion, for computer-generated functional approximations, than the least-square
criterion. With the latter criterion, we can conceive of some pathological case in which the error is essentially zero in all but a few isolated points, where it is intolerable. The Chebyshev functions are based on the notion that the value of the
largest
error is minimized. This is the so-called minimax criterion. In practice, since the error for a polynomial fit is zero at some points (namely, the points used to fit the data), we expect to see it oscillate, positive and negative, between these points.
If any peak, in either direction, is dominant, we can imagine reducing the maximum error by adjusting that peak downward, at the expense of raising other peaks. The curve fit is optimized, and the minimax condition is met, when all the error peaks have equal amplitudes.
If were dealing with polynomials, we have no need of iterative solutions; we can guarantee a minimax solution by proper application of the Chebyshev polynomials. However, polynomials are not the only functions useful in curve
fitting. One extremely useful function that is not a polynomial is the rational polynomial form:

where
P
(
x
) and
Q
(
x
) are themselves polynomials. For this important case, the Chebyshev polynomials cant help us; we must find the minimax solution the hard way, by trial and error.
Are we motivated yet?
Well, it took a long time to describe, but now you can see
the motivations behind my interest in general-purpose minimizers. To make our multivariate sensor work, we needed efficient minimization algorithms, for both the calibration and real-time operation of the system. To study various functions and their suitability to fit the observed data, I also needed a nonlinear least-square tool to support my analysis. To get the optimal minimax solution for functions other than power series, I also need a minimizer. It almost seems that every problem Ive worked on for
the last three years boils down to the same problem, at its deepest level.
An old truism, and one of my favorites, goes: To a man whose only tool is a hammer, every problem looks like a nail. In my case, I find myself in the peculiar situation in which all my problems seem to have transformed themselves into nails. And here I stand, without a decent hammer.
Im sure you can think of many other applications for a general-purpose function minimizer, but the ones Ive described
are plenty enough to whet my appetite. Thats why well be examining various methods to minimize functions over the next few months.
Unfortunately, Ive taken so long to lay out the motivation that I have precious little room to actually study minimization methods. So well content ourselves by setting up the problem and defining a default option minimizer thats guaranteed to work for almost any situation, albeit very, very slowly.
The test function
Obviously, if were going to develop a robust minimization algorithm, well be wanting to test it on quite a variety of functions. However, we need at least one good function to get started. We need one thats continuous and has a well-defined minimum; no fair testing on pathological functions with, say, an infinity of minima in a finite region. In fact, no fair testing with more than one minimum at all. Almost all minimizers, given a function
with two nearby minima, will converge to
one
of them, but which one is up for grabs. We dont want to deal with uncertainties like that, so our test function must have only one minimum in the region of interest.
A parabola is certainly a function with a well-defined minimum, but its too easy. At a high enough degree of magnification (that is, if we look closely enough only in the vicinity of the minimum), every continuous curve with a minimum looks like a parabola. Second-order methods
capitalize on this fact to fit a parabolic arc through neighboring data points. If we use a parabola as our test case, such methods will hit the exact solution on the very next trial. Thats hardly a challenge to the algorithm, so we cant use a parabola.
The cosine function has one minimum in the region 0..2p, but again, its too symmetric and not challenging enough. Heres a function that Ive settled on, that is tricky enough to provide some challenge:
A graph of this function is shown in
Figure 1
.
Because the function is analytic, we can calculate the location of its minimum. The cosine function has a minimum of 1 when its argument is p. We thus require that:
or
Numerically,
x
min
= 0.7937005259841
Whenever I start a new project like this, I like to ask myself: whats the dumbest, most simple-minded algorithm I can possibly think of, that can do what I want? In this case, the answer is: to find the minimum of the function, try a bunch of values of
x
, and retain the one that gives the smallest value of
y
=
f
(
x
).
The program of Listing 1 does just that. I begin by assuming that the solution is somewhere between
x
= 0 and
x
= 1. Im going to step
through that range, using
n
intervals. Whatever value of
x
gives the smallest
y
in that range, thats the one I save as the solution. The results, for some sample intervals, are shown in
Table 1
.
From
Listing 1
and
Table 1
, we can observe three facts:
- The algorithm is about as dumb as a post
- t gets the right answer
- Were going to need a
lot
of intervals to get much accuracy
From here on up, things can only get better. Next month, well try to discover just how much better we can do. See you then. esp
References
1. Roots of a Function,
Embedded Systems Programming
, August 1994, p. 13.
2. All About Runge-Kutta,
Embedded Systems Programming
, May 1993, p. 56.
3. Return to the
Valley of Runge-Kutta,
Embedded Systems Programming
, June 1993, p. 62.
4. The Least Squares Fit,
Embedded Sysytems Programming
, April 1993, p. 36.
Back
Jack W. Crenshaw is a senior principal design engineer at Alliant Tech Systems Inc. in Clearwater, FL. He did much early work in the space program and has developed numerous analysis and real-time programs. He holds a PhD in physics from Auburn University. Crenshaw enjoys contact and can be
reached via e-mail at
jcrens@earthlink.net
.
Return to
Table of Contents