CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS


Programmer's Toolbox

Curmudgeon Repercussions

Faced with the unacceptable alternative of spending 42 hours calibrating each sensor, Crenshaw looks for a mathematical solution.

Boy, did I ever strike a responsive chord with the “I’m mad as hell” line! My e-mail cup runneth over. I found that an awful lot of other folks share the sentiment. Not one opposed it. We may be poised on the verge of the next revolution in personal computers, led by the People Who Aren’t Going to Take It Anymore. Some of the messages I received were extremely insightful and thought-provoking. One, in particular, inspired me to wonder just how many person-hours, and therefore dollars, are being wasted every day by people who set out to do a job, but ended up mired down in a battle with intransigent application software. As the old saying goes, “When you’re up to your hips in alligators, it’s hard to remember that you came to drain the swamp.” I find myself wondering just how many deserving swamps remain undrained.

Case in point: last night I had to generate a graph using Microsoft Excel. Now, as any spreadsheet user knows, you generally draw a simple graph by generating two columns of numbers, representing the x - and y -axes of the graph. You define the values in the x -axis column using a formula something like:

<A4> = A3 +
0.01

You define the functional relationship between x and y by a formula in cell B3, then duplicate it into cell B4. Then, to generate the column of numbers, you must copy cells A4:B4 downwards as far as you want the x -value to extend, say into cell A503, for the range:

x
 = 0..5

The way you copy cells, of course, is by first pressing Ctrl-C to copy them to the paste buffer. Then you highlight the region to copy to, by dragging the mouse across it, and finally, you press Ctrl-V to paste.

There’s only one small problem: unless your CRT has a lot more resolution than mine, you can’t highlight 500 rows of a spreadsheet all at once—they don’t fit on the screen. So instead, I must drag the mouse over the 500 lines, letting Excel scroll the screen at whatever rate I can live with (careful not to overshoot), until the row count gets to 503.

Until recently, I used Borland’s Quattro Pro rather than Excel (I switched for compatibility reasons). Quattro Pro required a similar approach, but with one very important difference: it had a command called something like “Select Block,” in which you could drag the mouse over a region, and the range would appear in a box. The neat part was, having selected the range with the mouse, you could then go in and edit the last field to extend it.

For example, I could highlight the range A4:B10. Then I could manually change the 10 to a 503, press a key, and bingo! There were my columns, neatly fleshed out.

Last night, as I was more or less patiently scrolling through 500 lines in Excel, overshooting, scrolling back, and so on and so forth (why does it always seem to scroll faster backwards, after an overshoot?), I found myself thinking, I wonder how many people, all over the world, at that very instant, were also scrolling screens in Excel, to do the very same thing. How many years has Excel been working this way, and nobody ever thought to add the Quattro Pro “Select Block” feature?

In future issues, I’ll be sharing some of the more insightful e-mail on this topic with you. For now, I just want to wait, see what others have to say, and let my thoughts gel. I do perceive, though, that the feelings run a lot deeper than I’d thought.

Last month I set out to tell you about the idea of using distribution functions to simulate input pulses. The central idea was one that’s pretty obvious, in retrospect, and one that I’ve put to use in other areas many times. That idea is: just because a certain function first appeared in studies of probability distributions doesn’t mean that this is the only use to which one can “legally” apply the function. The functions describing the conic sections (which include the circle, ellipse, parabola, and hyperbola) originally came from geometry. They’re called the conic sections because they are the curves generated by the intersection of a plane and a cone. The most intensive use of the properties of these curves, however, turned out to be in quite a different area than simple plane geometry: we used them to go to the Moon.

Purely by chance, the same equations that express the mathematical properties of the conic sections just happen to also represent the solution to the two-body problem of celestial mechanics, and therefore describe the orbits of all celestial bodies in which the influence of one attracting mass dominates all others. Although this is never really the case in reality, it’s a good enough approximation for almost every real body in our solar system, and most others as well; planets, comets, and moons all follow orbits that are almost perfect conic sections.

The key message, then, is simply that one needn’t feel constrained to use functions only for the problems they were originally intended to solve. We are free to use whatever function happens to work. This idea seems so obvious that you’re probably wondering why I even brought it up. The reason is simple enough: although I fully understood and have applied the idea many times before, the specific notion of using probability functions for something other than describing statistics is one that simply hadn’t crossed my mind before.

Still, I don’t feel moved to belabor the point. Having finally, after all these years, had the light bulb labelled “idea” go off above my head, I thought the notion worth sharing. Now I’ve done that. There’s no point going on as though it’s the last idea I’ll ever have. File the idea away somewhere safe, and dredge it out the next time you’re trying to fit data. Meanwhile, we’ll move on to other things.

New beginnings

What shall we move on to? Well, I have some rather specific things in mind, for both the column and an upcoming set of articles. These things are fairly extensive, and will occupy our time for the next few months.

Over the years, I’ve given you what I think are some very nice, general-purpose algorithms and software modules for solving various practical problems, such as an algorithm for finding the root of a function, 1 a general-purpose Runge-Kutta integrator, 2,3 an algorithm for least-square fits, 4 and so on. It’s been awhile since I did one of these algorithm orgies, and I find myself building up to another one. To spare you the suspense, for my next trick, I intend to develop a world-class, general-purpose software tool for finding the minimum of a function.

This specific goal, however, is not even half the story. To explain why I’m interested in this particular problem-solver, I need to give you the background behind my interest, and the motivation for it. I can’t think of any way to give you this in a few sentences, so please bear with me as I give you the motivation in its long form. Recognize that, just as with the probability distribution functions, there’s no law that says that we must always use the final product for the same reasons I’m interested in it. You are free to use it wherever it fits the job. However, I think when you see the background details, however convoluted the reasoning, you’ll see why this particular tool becomes central to the solution to many, many problems.

The CO sensor

I’m sure you’ve noticed that lately I seem to be concentrating on the notion of fitting equations through tabulated data. That obsession is no accident; it’s what has been occupying a considerable percentage of my time for the last three years. I can’t tell you the details of that work—it was proprietary—but I can give you a practical example that illustrates both the problem and its solutions quite nicely.

Some years ago, I bought into a project that looked like a sure winner. A friend had identified an important need: many people die every year because they use open kerosene heaters or gas heaters in their mobile homes and RVs. The heaters release carbon monoxide (CO) into the closed space, and the people die without ever being aware they’re in trouble. My friend saw a need for a warning sensor that would detect CO and sound an audible warning, in a fashion similar to a smoke detector. What’s more, he managed to locate a chemical cell that would generate an electrical voltage when CO was present.

We bought one of the sensors and proceeded to test it to determine its response to various levels of CO. We learned, not surprisingly, that the response was anything but linear. Nevertheless, we ended up with a calibration curve for the sensor, and reasoned that as long as we knew what the response curve was, we could compensate for the nonlinearities.

Then we learned that the sensor wasn’t sensitive to just CO. It also depended on the temperature and humidity of the air, and the temperature of the sensor itself. No problem, we thought; we can add thermistors and a hygrometer to measure temperature and humidity (though the latter is not easy to find, and hygrometers themselves depend on temperature). Still, the sensor was so small and cheap, it seemed a good idea to make it work, despite its many dependencies.

Our next surprise was when we got a second sensor, and discovered that it didn’t behave like the first one. Each sensor had a somewhat different set of characteristic curves.

Calibration

Faced with such a situation, what do you do? One approach is simply to go to another source, for a sensor that’s more linear and has better quality control. But if the sensor at hand is cheap and reliable enough, it’s still tempting to try to figure out a way to use it.

Recently, I worked with another sensor of a different sort, that had a similar story. It was small, cheap, low-power, extremely reliable, and dead-nuts accurate once it was calibrated—all the characteristics one could ask for in a sensor. It also had a highly nonlinear response, was sensitive to other factors like temperature, and the response varied wildly from unit to unit. We desperately wanted to use the sensor, but we had to make it work in volume production. The only solution was to calibrate each sensor, storing its particular response curve. The solution we chose was to have the vendor store the calibration constants in an EPROM that came mounted on the same board as the sensor itself. The software could then read the EPROM, set up its compensation algorithms, and linearize the response.

The only problem with this approach is a practical one: it takes time and resources to calibrate a sensor. The more measurements you have to take, the longer the calibration is going to take. If you’re not careful, the low cost of the sensor itself can easily be eaten up by the cost of the labor required to calibrate it, not to mention the cost of the software to compensate it.

Given a certain calibration curve, we can always fit it with a polynomial of high enough order. Alternatively, we could use a lookup table with linear interpolation, or a spline curve fit through fewer points. The problem with such an approach is that it takes quite a number of data points to get a good approximation to the curve. Now, how do you go about getting the data points?

In the case of that old CO sensor, we can conceive of some kind of test rig that flows air, along with measured concentrations of CO, through the sensor, and reads the output voltage. But if I need, say, a tenth-order polynomial to fit its response curve, I’m going to have to compute 11 coefficients. Eleven coefficients means at least 11 values of the CO concentration. Characterizing the sensor with a lookup table would require many more than 11 concentrations.

But that’s not all. Remember that I said the sensor was also sensitive to temperature and humidity. That means that every run I make with different concentrations must now be repeated, both for different temperatures and different humidities. Suddenly, I’m beginning to envision a test rig that must be automated, capable of supplying many different concentrations of gas, at controlled values of temperature and humidity. Just providing the latter—controlled humidity—is a trick all by itself. The cost of the test rig to let me use my ultra-cheap sensor is suddenly beginning to approach that of a launch facility for the Space Shuttle!

Even that’s not the worst of it. The worst of it has to do with the dimensional explosion.

Suppose I can get by with, say, 10 values of CO for each calibration curve. But if the sensor is also dependent on temperature, I need to measure it at 10 sensor temperatures. Also 10 air temperatures, and 10 values of humidity. Now I need not 40 data points, but 10,000! Since this is a gas-driven, chemically-activated sensor, we’re going to have to give it some time to get a steady flow, and stabilize its output. Figure in at least 15 seconds for that, and we’ve got 150,000 seconds, or about 42 hours of calibration, per sensor . Now we have at least two new problems. First, assuming that a human has to be around to monitor the calibration process, we’re talking on the order of $600 to $1,000 per sensor, just in labor costs. Second, unless we’re willing to build multiple calibration systems, we can only produce, at most, 17 sensors per month, even if we’re running the calibrator over three shifts. Add to this the cost of keeping a plant open and running for those three shifts, and suddenly our cheap sensor doesn’t seem cheap at all.

How can we get around these problems? The answer is simple enough to conceive, not so easy to realize. We have to find a way to fit the characteristic curves with fewer coefficients, which means fewer data points to measure. Imagine, for example, that we are so astonishingly lucky that the calibration curve turns out to be a pure exponential:

Now, instead of 10 or 11 coefficients, we need only two. What’s more, we might even discover that one of the coefficients, say k , is the same for all sensors. Suddenly, our intractable problem becomes tractable again, and we can announce that the plant will no longer be running three shifts.

Fitting a curve like that in Equation 2 to data, however, is a lot different than simply recording readings for a lookup table. In general, we can expect to have to use some trial and error, especially in the research phase, to discover that Equation 2 does indeed describe the sensor output. Though we might be able to, as we hope, use only two points to determine the two coefficients during calibration, we will almost surely need many more to verify that Equation 2 really does fit all the sensors to enough accuracy to be useful.

This is where the curve fitting comes in. We can begin with a characteristic curve, taken by whatever painful and time-consuming process we may need, during the R&D effort. What we seek to do is to find a function that fits this data well, and also has as few coefficients as possible. The obvious criterion for “good fit” is the least-squares criterion. This criterion requires that we choose a function such that the sum of the squares of the error at each data point is minimized.

Now, if the curve to be fitted is a polynomial, standard linear regression techniques will find the coefficients. But remember, polynomials take too many coefficients. We’re not seeking a polynomial solution, because we can’t afford the large number of coefficients it’s likely to require. We’re seeking a solution using a nonlinear function like Equation 2, and that requires a nonlinear least squares fit. And that means minimizing a nonlinear function. In fact, it means minimizing a function of several variables—the coefficients.

Regular readers will recall my mention of a nice software package called MicroMath Scientist. This is precisely the kind of problem Scientist was developed to solve. Given any nonlinear relationship, Scientist finds the values of the unknown coefficients that minimize the least-square error. How does it do it? By the same methods that we’re going to be studying here. I don’t know what else MicroMath Scientist does, and I can’t tell you which algorithms it uses, but I can be 100% certain that at its heart beats a general-purpose function minimizer.

Incidentally, a reader told me that he’s been unable to contact MicroMath. I sure hope the company hasn’t closed up shop. Our industry has already lost too many Good Guys; we don’t need to lose any more. If anyone out there knows if MicroMath is still in business and how to contact them, please let me know via e-mail, and I’ll relay the info to the rest. Also, if anyone knows of other sources for nonlinear equation solvers, I’ll be happy to add them to our list of resources.

Minimizing in real time

Is there ever an occasion when we need to run a function minimizer in a real-time embedded system? You bet there is, and I’ve also wrestled mightily with this problem over the last year or two. The problem rears its ugly head when we go beyond single-input, single-output systems.

To keep the discussion strictly generic, imagine that we’ve got a system that can take five different inputs, and return five different outputs. There is cross-talk, however, between channels. Each output channel, in general, responds to all five inputs.

Let the inputs to the system be denoted by the vector x , and the outputs, y . If the system were linear, we could write the dependencies of y upon x in matrix notation:

y
 = 
Ax

Inverting the matrix, then, gives:

y
 = 
A
-1
y

If the response is truly linear, the matrix A is a constant (possibly different for each sensor). That means its inverse is also a constant, so we can compute the inverse off line, as part of the calibration, and store the inverse in EPROM. Then, given a set of output voltages, y , we can compute the combination of inputs, x , that produce them.

Unfortunately, we’ve already agreed that the characteristic curves for our sensor aren’t likely to be linear, so we can’t write down Equation 3, much less Equation 4. The best we can do is write:

y
 = 
f
(
x
)

But the relationship implied by f ( x ) is sure to involve some constants, such as A and k in Equation 2. To make that involvement explicit, we’ll write:

y
 = 
f
(
x
,
a
)

where a is a vector of some undefined number of coefficients. To make use of our sensor, we must face two challenges. First, during calibration, we must determine both the nature of f ( x , a ), and the numeric values of a that best fit the data. That’s a job that calls for a nonlinear curve fit.

Second, to support the real-time function, we must find a way to revert Equation 6 to determine x , given y . If we’re clever enough, and if the math is tractable, and if we’re very, very fortunate, we’ll be able to revert the equation in closed form. In that case, we only need to program up the reverted equation, and compute it in real time.

Murphy’s law being what it is, however, we expect the likelihood of getting a closed-form reversion to be slim to none. We can be virtually certain that, whatever the form of Equation 6, we’re going to have to revert it in real time.

How do we do that? We can start by assuming a solution, call it . We then calculate the corresponding value of

We now compute the error vector:

Our goal, then, is to drive the value of e to zero by somehow adjusting our guess, . A Newton-Raphson ap-proach sounds like a reasonable one.

In this example, I’ve assumed that we have an equal number of inputs and outputs; that is, the vectors x and y have the same dimensions. In that case, we can expect that, unless the function f ( x , a ) is really crazy, we can always find a value of which forces e to zero. However, in the real problem which I worked, we actually had more outputs than inputs; the sensor was a multi-channel sensor, sort of like the spectrum analyzer of an audio system. In this case, we wouldn’t be able to invert the matrix A , even if the problem were linear. Which it is not.

For such a situation, we are unlikely to be able to match all the output voltages, y . Our model, which is at best an approximation to the real world, is bound to be somewhat in error, which means that the value of generated by Equation 7 is never going to match, exactly and element-by-element, the measured value of y . The best we can hope for is to minimize the error. This means that our criterion for best fit becomes another least-square criterion:

This one we must solve in real time, so clearly we’re going to want a very efficient minimizer.

Minimax

In an upcoming article, I’m going to be talking about ways to obtain the best fit for functions of a general nature. As it turns out, the least-squares criterion is not always best. Those of you familiar with the approximation of trig functions know that we can improve the accuracy of power-series approximations using Chebyshev polynomials. I’m long overdue for an article describing how to use these functions. I won’t attempt to cover them here, but suffice it to say that the Chebyshev polynomials derive from a different criterion than least squares. Instead of trying to minimize some bulk error like mean-square error, over the entire range of a function, we require that the error be bounded by the smallest possible bound.

You can see why this is a better criterion, for computer-generated functional approximations, than the least-square criterion. With the latter criterion, we can conceive of some pathological case in which the error is essentially zero in all but a few isolated points, where it is intolerable. The Chebyshev functions are based on the notion that the value of the largest error is minimized. This is the so-called minimax criterion. In practice, since the error for a polynomial fit is zero at some points (namely, the points used to fit the data), we expect to see it oscillate, positive and negative, between these points. If any peak, in either direction, is dominant, we can imagine reducing the maximum error by adjusting that peak downward, at the expense of raising other peaks. The curve fit is optimized, and the minimax condition is met, when all the error peaks have equal amplitudes.

If we’re dealing with polynomials, we have no need of iterative solutions; we can guarantee a minimax solution by proper application of the Chebyshev polynomials. However, polynomials are not the only functions useful in curve fitting. One extremely useful function that is not a polynomial is the rational polynomial form:

where P ( x ) and Q ( x ) are themselves polynomials. For this important case, the Chebyshev polynomials can’t help us; we must find the minimax solution the hard way, by trial and error.

Are we motivated yet?

Well, it took a long time to describe, but now you can see the motivations behind my interest in general-purpose minimizers. To make our multivariate sensor work, we needed efficient minimization algorithms, for both the calibration and real-time operation of the system. To study various functions and their suitability to fit the observed data, I also needed a nonlinear least-square tool to support my analysis. To get the optimal minimax solution for functions other than power series, I also need a minimizer. It almost seems that every problem I’ve worked on for the last three years boils down to the same problem, at its deepest level.

An old truism, and one of my favorites, goes: “To a man whose only tool is a hammer, every problem looks like a nail.” In my case, I find myself in the peculiar situation in which all my problems seem to have transformed themselves into nails. And here I stand, without a decent hammer.

I’m sure you can think of many other applications for a general-purpose function minimizer, but the ones I’ve described are plenty enough to whet my appetite. That’s why we’ll be examining various methods to minimize functions over the next few months.

Unfortunately, I’ve taken so long to lay out the motivation that I have precious little room to actually study minimization methods. So we’ll content ourselves by setting up the problem and defining a “default option” minimizer that’s guaranteed to work for almost any situation, albeit very, very slowly.

The test function

Obviously, if we’re going to develop a robust minimization algorithm, we’ll be wanting to test it on quite a variety of functions. However, we need at least one good function to get started. We need one that’s continuous and has a well-defined minimum; no fair testing on pathological functions with, say, an infinity of minima in a finite region. In fact, no fair testing with more than one minimum at all. Almost all minimizers, given a function with two nearby minima, will converge to one of them, but which one is up for grabs. We don’t want to deal with uncertainties like that, so our test function must have only one minimum in the region of interest.

A parabola is certainly a function with a well-defined minimum, but it’s too easy. At a high enough degree of magnification (that is, if we look closely enough only in the vicinity of the minimum), every continuous curve with a minimum looks like a parabola. Second-order methods capitalize on this fact to fit a parabolic arc through neighboring data points. If we use a parabola as our test case, such methods will hit the exact solution on the very next trial. That’s hardly a challenge to the algorithm, so we can’t use a parabola.

The cosine function has one minimum in the region 0..2p, but again, it’s too symmetric and not challenging enough. Here’s a function that I’ve settled on, that is tricky enough to provide some challenge:

A graph of this function is shown in Figure 1 .

Because the function is analytic, we can calculate the location of its minimum. The cosine function has a minimum of –1 when its argument is p. We thus require that:

or

Numerically,

x
min 
= 0.7937005259841

Whenever I start a new project like this, I like to ask myself: what’s the dumbest, most simple-minded algorithm I can possibly think of, that can do what I want? In this case, the answer is: to find the minimum of the function, try a bunch of values of x , and retain the one that gives the smallest value of y = f ( x ).

The program of Listing 1 does just that. I begin by assuming that the solution is somewhere between x = 0 and x = 1. I’m going to step through that range, using n intervals. Whatever value of x gives the smallest y in that range, that’s the one I save as the solution. The results, for some sample intervals, are shown in Table 1 .

From Listing 1 and Table 1 , we can observe three facts:

  • The algorithm is about as dumb as a post

  • t gets the right answer

  • We’re going to need a lot of intervals to get much accuracy

From here on up, things can only get better. Next month, we’ll try to discover just how much better we can do. See you then. esp

References

1. “Roots of a Function,” Embedded Systems Programming , August 1994, p. 13.

2. “All About Runge-Kutta,” Embedded Systems Programming , May 1993, p. 56.

3. “Return to the Valley of Runge-Kutta,” Embedded Systems Programming , June 1993, p. 62.

4. “The Least Squares Fit,” Embedded Sysytems Programming , April 1993, p. 36.

Back

Jack W. Crenshaw is a senior principal design engineer at Alliant Tech Systems Inc. in Clearwater, FL. He did much early work in the space program and has developed numerous analysis and real-time programs. He holds a PhD in physics from Auburn University. Crenshaw enjoys contact and can be reached via e-mail at jcrens@earthlink.net .

Return to Table of Contents

Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :