Jack Crenshaw - January 30, 2012

The universe is a noisy place. More so in Manhattan, Detroit, or the galactic core; less so in the Swiss Alps or interstellar space, but noisy everywhere. No matter how hard you try, you can't escape the noise.

Few people understand this better than scientists and engineers, especially those of us who work with embedded real-time systems. Chances are, the first time you hooked up some measuring sensor to an analog-to-digital converter, you learned this lesson well.

Most likely, you decided to insert a low-pass filter into the signal path. That will remove a lot of the noise, all right, but also the fine structure of the system behavior. And it necessarily adds a time delay to the system--a delay that will affect the stability of a control system.

Filters have their place, for sure. But in the end, the use of a low-pass filter is a tacit admission that we don't have a clue what the system is really doing. We filter it because it's the only recourse we have. We treat the input signal as pseudo-static, and hope that our sample rate is fast enough to make the signal look like a constant.

On the other hand, there can be times when we do know--or think we know--how the system should behave. A real physical system obeys the laws of physics. A reactor in a chemical plant is going to follow the laws governing its particular chemical reaction. An airplane in flight is going to obey Newton's laws of motion.

In such cases, we can do a little better than simply relying on brute-force low-pass filters. If we know something about the dynamics of the system, we can make a better estimate of what it's doing now and what it's going to do next. That concept has been the focus of my last few columns and will remain our focus for the near future. This is the third installment of the series.

Wouldn't you like to fly?

Now, there's one place in the universe that's blissfully noise-free: the pristine world of mathematics. Add 2 and 3, and you always get 5. Exactly. For a given value of

The real world is not so accommodating. A real airplane obeys the same laws of motion, but it's also subject to disturbances that aren't modeled in my simulation: wind gusts and updrafts; headwinds; changes in air temperature and density; pilot inputs. Even people moving around in the cabin.

Perhaps more importantly, every sensor measuring a flight-related parameter has errors of its own: electronic noise, scale factor errors, quantization errors, drift, temperature sensitivities, etc. All of these things introduce uncertainties into my perfect world of simulation.

But wait; there's more. Even if I could simulate all those effects, the simulated airplane is still not going to behave like the real thing, because there are system parameters whose values I don't know exactly. I may know the mass and inertia properties of the dry airplane, but not those of an airplane carrying an unknown number of passengers of unknown weights, with an unknown mass and distribution of fuel in the tanks. I may have precise data sheets that give me the theoretical thrust of the engines, but not the actual thrust they're producing at any given time.

So there are three ways my simulation can never be perfect: I don't know the precise values of the system parameters, I don't have precise measurements of the system state, and my math model doesn't include all the possible effects. In the real world, our challenge is to make our best guess as to all these things, based on the noisy measurement data we have available.

Because the system is subject to unmodeled effects, it's not enough to just measure the state once or twice, and assume the system will obey its laws of motion from there on. I have to continue to take measurements, and constantly improve my best guess as to both the system state and the parameters that I thought I knew, but got wrong.

The

The

The variance, then, is the mean of the squares of the residuals:
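
In one common notation, with symbols assumed here purely for illustration (N measurements y_i, their mean y-bar, residuals r_i, and variance V), the residuals are squared, summed, and divided by N:

```latex
r_i = y_i - \bar{y}, \qquad V = \frac{1}{N}\sum_{i=1}^{N} r_i^2, \qquad \sigma = \sqrt{V}
```

Some texts divide by N - 1 rather than N; for data sets of any size the difference is small.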

The

To illustrate the method, I generated an example that produced a graph like the one in

To get the result of Figure 1, we processed all the elements of

To support sequential processing, we defined two running sums. For a given value of

Each time a new measurement,

Then the updated mean and variance are given by:

And:

Using this algorithm, we got a figure like

A picture is worth…

As utterly simple as this first example is, Figures 1 and 2 tell us some important things. The first is that I chose to draw

The values of

Together, the two vectors define a set of ordered pairs:

Now look at Figure 2. In this case, the order of the elements of

I'm sure you can see that this kind of processing is precisely what we need in a real-time system. At any point in time, the resulting values of and

This is exactly the kind of behavior we'd like from a Kalman filter. In fact, the algorithm that generated Figure 2 is the Kalman filter for this simple problem.

By the way, look closely at the final values of the statistical parameters in the two figures. You'll see that they're the same, as they should be.
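
The agreement between batch and sequential processing is easy to check for yourself. Here is a minimal Python sketch, assuming the two running sums are the sum of the measurements and the sum of their squares; the names `batch_stats` and `RunningStats`, and the synthetic data, are mine and purely illustrative.

```python
import random

def batch_stats(samples):
    """Mean and variance computed from the complete data set at once."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((y - mean) ** 2 for y in samples) / n
    return mean, variance

class RunningStats:
    """Sequential version: keeps only a count and two running sums."""

    def __init__(self):
        self.n = 0
        self.sum_y = 0.0    # running sum of the measurements (assumed form)
        self.sum_y2 = 0.0   # running sum of the squared measurements (assumed form)

    def update(self, y):
        """Fold in one new measurement; return the current mean and variance."""
        self.n += 1
        self.sum_y += y
        self.sum_y2 += y * y
        mean = self.sum_y / self.n
        variance = self.sum_y2 / self.n - mean * mean
        return mean, max(variance, 0.0)  # clamp tiny negative round-off

# Synthetic noisy "constant" signal; the values are hypothetical.
random.seed(1)
data = [10.0 + random.gauss(0.0, 1.0) for _ in range(50)]

stats = RunningStats()
for y in data:
    seq_mean, seq_var = stats.update(y)   # an estimate is available at every step

batch_mean, batch_var = batch_stats(data)
print("batch:     ", batch_mean, batch_var)
print("sequential:", seq_mean, seq_var)
```

The sequential version never stores the measurements themselves, which is what makes it attractive in a real-time system. The subtraction in the variance can lose precision when the mean is large compared with the scatter, so treat this as a sketch rather than production code.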

When we plot data that has scatter in it, such as the red curves in the figures, we don't imagine for a moment that the scatter is truly part of

We place no restrictions on the nature of

In which case, we'd be performing

Did the method work? Absolutely. Did I get the mean and standard deviation? Yep. Are the results meaningful? Hardly. That's because, when we average a set of numbers, we're making the implicit assumption that

The problem, of course, is that my assumption concerning

Now, throughout this series of columns, I've been saying that we seek to discover the nature of

In other words, we need a

There's another hugely important message implied by Figure 3: You need to draw that figure. We could have blithely applied the method of averaging, and we'd have gotten a solution. Without that figure, we might very well have accepted the results, and moved along, blissfully unaware that our assumption was an awful one, and the solution was dross. The old adage, "A picture is worth a thousand words," was never more true than in the case of curve fitting. You have to see a graph of the raw data before you can make an educated guess as to the nature of

But I just said that the method can't specify the model; we have to give it one. So if I have to specify the model, what's left for the method to optimize? The answer, of course, is the unknown coefficients in Equation 13. To clarify this point, we should probably make the dependence explicit by rewriting the equation in the form:

In my last column, I derived the equations for this case. We found that we could put the solution into a nice matrix form:

Where:

And:

Applying this method to the data of Figure 3 gives the linear curve fit in

Is that beautiful, or what? Notice how much smaller the error band is, compared with Figure 3. I said earlier that the method of least squares can't choose the model for you, but it can certainly help. By trying both the constant model and the linear model, we can clearly see that the linear one has a better fit, since the error band is smaller.
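
The comparison between the two models can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the figures: it assumes a straight-line model y = a + b*x, solves the 2x2 normal equations directly, and uses synthetic data whose slope, intercept, and noise level are hypothetical.

```python
import math
import random

def fit_constant(ys):
    """Constant model: the least squares estimate is simply the mean."""
    return sum(ys) / len(ys)

def fit_line(xs, ys):
    """Least squares fit of y = a + b*x via the 2x2 normal equations."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # zero only if all x values are equal
    a = (sxx * sy - sx * sxy) / det  # intercept
    b = (n * sxy - sx * sy) / det    # slope
    return a, b

def residual_std(residuals):
    """Standard deviation of the residuals about the fitted model."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Hypothetical data: a linear trend plus noise.
random.seed(2)
xs = [float(i) for i in range(20)]
ys = [2.0 + 0.5 * x + random.gauss(0.0, 0.3) for x in xs]

c = fit_constant(ys)
a, b = fit_line(xs, ys)

sigma_const = residual_std([y - c for y in ys])
sigma_line = residual_std([y - (a + b * x) for x, y in zip(xs, ys)])
print("constant model: sigma =", sigma_const)
print("linear model:   sigma =", sigma_line)
```

For data with a real trend in it, the linear model's sigma comes out far smaller, which is the numerical counterpart of the narrower error band.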

You have to be careful when you do this, however. I can make the standard deviation

And:

Note carefully that these patterns

Is still linear in the coefficients

Unfortunately, we still have to assemble

As the cherry on top of the sundae, I've generated

As a matter of interest, for this example I used the function:

The least squares algorithm guessed 4.667, 0.359, and -0.009 for the coefficients. Perhaps they're not as exact as we might like, but still not bad, considering the amplitude of the noise and the small data set.
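
A three-coefficient fit of this kind can be sketched as follows, assuming a polynomial model. The code assembles the normal-equation matrix from sums of powers of x and solves it by Gaussian elimination. The function names and the test coefficients (1.0, 2.0, 0.5) are hypothetical; they are not the function used for the example above.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Bring the largest remaining entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def polyfit(xs, ys, order):
    """Least squares coefficients [c0, c1, ...] of c0 + c1*x + c2*x^2 + ..."""
    m = order + 1
    # Entry (i, j) of the normal-equation matrix is the sum of x^(i+j).
    power_sums = [sum(x ** k for x in xs) for k in range(2 * order + 1)]
    A = [[power_sums[i + j] for j in range(m)] for i in range(m)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    return solve_linear(A, rhs)

# Noise-free check with hypothetical coefficients: the fit should recover them.
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]
coeffs = polyfit(xs, ys, 2)
print(coeffs)
```

With noise added to the data, the recovered coefficients drift away from the true ones, just as they do in the example above. The normal-equation matrix also becomes poorly conditioned as the order grows, so this direct approach is best kept to low-order fits.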

Why are these estimated values not equal to the original ones? Simply because they're

At this point I think we've pretty much exhausted the topic of least squares fits. Now it's time to move on to more challenging topics in estimation theory. When you think about it, it's really quite remarkable that we've been able to get as far as we have, without discussing the topics of probability theory and probability distributions at all. We even managed to define the standard deviation of noise, without mentioning its relationship to the normal distribution.

That's going to change. While it's possible to delve even further into the theory without mentioning probability distributions, to do so gets more and more awkward as we go. In my next installment, we're going to set aside curve-fit algorithms for a time, and focus on probability theory.

You might want to tighten your seat belts; it could be a bumpy ride.