Can you give me an estimate?
About 18 months ago, I wrote a column on the least squares fit (“Why all the math,” www.embedded.com/4027019), where I presented it as a technique for fitting a curve to noisy data. It's not the only way to deal with noise, of course. If you've ever worked with any embedded system that had a sensor measuring a real-world parameter, you know all about dealing with noise. The usual way is to pass the signal through a low-pass filter, which can be anything from a simple first-order lag to a finite impulse response (FIR) or infinite impulse response (IIR) filter with scores, if not hundreds, of terms.
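To make the simplest end of that range concrete, here is a minimal sketch of a first-order lag, written in Python for readability rather than for any particular target. The function name `first_order_lag`, the smoothing constant `alpha`, and the synthetic sensor data are my own illustrative choices, not anything taken from the earlier column.

```python
import random

def first_order_lag(samples, alpha):
    """First-order lag: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    alpha near 0 smooths heavily but responds slowly;
    alpha = 1 passes the input through unchanged.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    filtered = []
    y = None
    for x in samples:
        # Seed the filter with the first sample, then chase the input.
        y = x if y is None else y + alpha * (x - y)
        filtered.append(y)
    return filtered

# Hypothetical sensor: a constant 5.0 corrupted by Gaussian noise.
rng = random.Random(1)
noisy = [5.0 + rng.gauss(0.0, 0.5) for _ in range(200)]
smooth = first_order_lag(noisy, alpha=0.1)
```

Note that the filter sees one sample at a time and carries only a single number, `y`, from one step to the next. It knows nothing about where the signal came from.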
So what's the difference between a least squares fit and a low-pass filter? They both extract a signal from the noise, right?
Jack Crenshaw's Estimation Series
Part 1: Why all the math?
Part 2: Can you give me an estimate?
Part 3: Estimation interruptus
Part 4: The normal distribution
By contrast, the term least squares invokes an image of batch processing, where you operate on a whole set of data items after they've been collected. The notion of time is not explicitly involved. Indeed, when you're applying the least squares fit, you can shuffle the data indiscriminately, and the technique will still give you the same solution.
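A quick way to convince yourself of the shuffling claim is to try it. The sketch below assumes the simplest case, a straight-line fit y = slope * x + intercept, solved in closed form from the usual sums; `fit_line` and the synthetic data are illustrative names and values of my own. Because floating-point addition is sensitive to ordering, expect the two answers to agree to within rounding error rather than bit for bit.

```python
import random

def fit_line(points):
    """Batch least squares fit of y = slope * x + intercept."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / det
    intercept = (sy * sxx - sx * sxy) / det
    return slope, intercept

# Hypothetical data: a line with slope 2 and intercept 1, plus noise.
rng = random.Random(42)
points = [(float(x), 2.0 * x + 1.0 + rng.gauss(0.0, 0.3)) for x in range(50)]

shuffled = points[:]
rng.shuffle(shuffled)

original_fit = fit_line(points)
shuffled_fit = fit_line(shuffled)
```

Every quantity the fit depends on is a sum over the whole data set, and a sum doesn't care what order its terms arrive in. That is exactly why time plays no explicit role.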
As far as we know, the least squares fit was invented by Carl Friedrich Gauss in 1795, at the ripe old age of 18. He used it in 1801 to predict the motion of the minor planet Ceres. I think it's safe to say that Gauss's analysis was anything but real time. Unless, of course, you accept a clock rate measured in days or weeks.
The distinction between a filter and a least squares fit gets a lot more fuzzy if we set up the least squares fit so that we can process the data sequentially, as it comes in. That concept, sequential processing, is going to be our main focus in this and future columns.
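As a preview of what sequential processing can look like, here is one simple possibility, again assuming a straight-line model: keep running sums, fold in each new data pair as it arrives, and ask for the current estimate whenever you like. This is only a sketch of the idea, not necessarily the formulation the later columns will develop, and the class name `RunningLineFit` and its methods are hypothetical.

```python
class RunningLineFit:
    """Straight-line least squares fit, updated one data pair at a time."""

    def __init__(self):
        self.n = 0
        self.sx = 0.0
        self.sy = 0.0
        self.sxx = 0.0
        self.sxy = 0.0

    def update(self, x, y):
        """Fold one new (x, y) measurement into the running sums."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def estimate(self):
        """Return (slope, intercept), or None until the line is determined."""
        det = self.n * self.sxx - self.sx * self.sx
        if self.n < 2 or det == 0:
            return None
        slope = (self.n * self.sxy - self.sx * self.sy) / det
        intercept = (self.sy * self.sxx - self.sx * self.sxy) / det
        return slope, intercept

fit = RunningLineFit()
history = []
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:
    fit.update(x, y)
    history.append(fit.estimate())  # the estimate is available after every sample
```

Structurally, this looks a lot like a filter: a small amount of stored state, updated on every sample, with an output available at any time. The storage doesn't grow with the number of samples, which matters a great deal in an embedded system.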
But there's another distinction between filtering and fitting that's much more fundamental and profound than the way you process the data. That distinction lies in what you know—or think you know—about the process that generates the data.
When I'm developing a system to filter noise out of a signal, I don't have any idea what's happening in the real world to create that signal. At the level of the analog-to-digital (A/D) conversion, it's just a voltage that comes from somewhere, corrupted by noise. My job is only to extract the signal from the noise.
The least squares fit is different. We apply it when we think we know something about the process that generated the data. If I'm applying a linear regression to a data set, it's because I think that one element of each data pair depends on the other. And not just depends, but has a linear relationship, which I can graph as a straight line. The purpose of the least squares fit is not just to filter the noise, but to discover the coefficients of that straight-line relationship.
Of course, the regression doesn't have to be linear. Using least squares, I can fit a quadratic, or a polynomial of any order. I can fit a logarithmic relationship, or a sum of logarithmic terms. I can even fit a sum of sine waves (in which case I've done a Fourier analysis). It really doesn't matter what we think the relationship is; it only matters that we think that there is one.
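What these cases have in common is that the model is a sum of known functions, each multiplied by an unknown coefficient. Under that assumption, one generic sketch covers all of them: build the normal equations and solve for the coefficients. The function `fit_basis` and the example basis lists below are my own illustrative names, and solving the normal equations directly is just the most transparent approach; it can become numerically touchy for high-order polynomials.

```python
import math

def fit_basis(xs, ys, basis):
    """Least squares coefficients c for y ~ sum(c[j] * basis[j](x))."""
    rows = [[f(x) for f in basis] for x in xs]
    m = len(basis)
    # Normal equations (A^T A) c = A^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * y for r, y in zip(rows, ys))]
           for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("basis functions are not independent on this data")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - tail) / aug[i][i]
    return coeffs

# Two different mental models, one fitting routine.
quadratic = [lambda x: 1.0, lambda x: x, lambda x: x * x]
log_model = [lambda x: 1.0, math.log]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
quad_coeffs = fit_basis(xs, [1.0 + 2.0 * x + 3.0 * x * x for x in xs], quadratic)
log_coeffs = fit_basis(xs, [0.5 + 2.0 * math.log(x) for x in xs], log_model)
```

The fitting code never changes; only the list of basis functions does. The choice of that list is where my belief about the underlying process enters the picture.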
The point is, when I'm applying a least squares fit I'm doing more than just filtering noise. I have a mental model of what's going on in the system that's generating the data. Presumably, that model includes coefficients whose values are unknown. The job of the least squares fit is to give me a best estimate of those coefficients. For obvious reasons, this process is often called state estimation, and it's this discipline that will occupy our interest in the rest of this column, and several more. If things work out right, the series will culminate in that dream of all estimators, the justly famous Kalman Filter. I'll leave it to you to ponder why the pinnacle technique of state estimation is called, not an estimator, but a filter.