Can you give me an estimate?
Some do it sequentially
The problem with taking an average using Equation 6 (or 5) is that the computation is basically a batch process. To add up all those numbers, we have to have them all available. In an embedded computer, this means that we must keep the potentially huge set of terms stored in memory.
Is there a better way? Of course. To see how, let's look at how the average value changes as new data comes in. Let $m_n$ be the mean of the first $n$ values of the set. As the data comes in, we'll have:

$$m_1 = y_1\,,\qquad m_2 = \frac{y_1 + y_2}{2}\,,\qquad m_3 = \frac{y_1 + y_2 + y_3}{3}\,,\;\ldots$$
But each of the sums in these equations is merely the sum from the previous mean, plus one new term. We can write:

$$2m_2 = y_1 + y_2 = m_1 + y_2\,,\qquad 3m_3 = y_1 + y_2 + y_3 = 2m_2 + y_3$$
And, in general:

$$n\,m_n = (n-1)\,m_{n-1} + y_n\,,\qquad\text{or}\qquad m_n = \frac{(n-1)\,m_{n-1} + y_n}{n}$$
As you can see, we don't need to keep any of the old elements of y around. In fact, we don't need to keep any of the old elements of m around, either. We only need the latest value of m, plus the newest element of y. The new value of m can overwrite the old value. In a software implementation, we only need two persistent, scalar variables: the past value of m, plus the integer counter, n.
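In code, one step of that update might look like this (a minimal sketch; the function and variable names are mine, chosen for illustration):

```c
/* One step of the recurrence m_n = ((n - 1) * m_(n-1) + y_n) / n.
 * Here n is the sample count *including* the new value y. */
double update_mean(double mean, int n, double y)
{
    return ((n - 1) * mean + y) / n;
}
```

Conveniently, calling it with n equal to 1 returns y itself, whatever the old mean was, so no special startup logic is needed.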
Better yet, let's not keep the old value of m, but the running sum. This lets us avoid the multiplication by $n-1$. Let:

$$S_n = \sum_{i=1}^{n} y_i = n\,m_n$$
Then:

$$S_n = S_{n-1} + y_n\,,\qquad m_n = \frac{S_n}{n}$$
Writing the software is almost easier than describing what it must do. Listing 1 shows a snippet of code that works in either C or C++. It's not perfect, and it's not production quality. Because it has static variables, it won't work if you're asking it to find more than one average per program. If there were ever a case for a C++ class, this is it. That task, I'm leaving “as an exercise for the student.” But the code I've shown should give you the idea.

Listing 1: A running-average routine in C/C++.
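Since the listing itself doesn't reproduce well here, here's a sketch in the same spirit as what's described above (static variables and all); the names are my own, and it's no more production quality than the original:

```c
#include <stdio.h>

/* A sketch of the running-average idea: keep the running sum S_n and
 * the counter n in static variables, and return S_n / n.  Because of
 * the statics, it can track only one average per program. */
double running_average(double y)
{
    static double sum = 0.0;   /* running sum, S_n         */
    static int    n   = 0;     /* number of samples so far */

    sum += y;                  /* S_n = S_(n-1) + y_n */
    ++n;
    return sum / n;            /* m_n = S_n / n       */
}

int main(void)
{
    double y[] = { 2.0, 4.0, 6.0, 8.0 };
    for (int i = 0; i < 4; ++i)
        printf("mean after %d samples: %g\n", i + 1, running_average(y[i]));
    return 0;
}
```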
We need a variance
That task—of computing an average—wasn't too hard, was it? I hope it didn't tax your brains too much. Here's your next challenge: For the same data set, compute the variance and standard deviation.
Whoa! You want me to . . . what? Did we cover that in class?
Not yet, but we'll do it now. In general, the data in our sample data set y tells us more than just its mean, or average, value. It also tells us something about the reliability of each data element. In other words, it gives us insight into how much noise hides inside the data. It gives us the statistics.
In Figure 1, I've plotted the data set and also the mean value. As you can see, the data items themselves bounce around quite a bit. As often happens, none of them are actually equal to the mean value. For each value of n, the value in the set differs from the mean by an amount called the residual:

$$r_n = y_n - m$$
It would be nice if we had some single scalar measure—we might even call it a variance—of the quality of the data. Clearly, this measure would have to involve all the members of the data set. We could try adding all the residuals together, or computing their average value, but that wouldn't work. Because the residuals can be positive or negative, they could cancel each other, leaving us with a false impression. If, for example, the data values alternated around the mean, then the sum of any pair—or all the pairs—would be zero, and so could be the average residual. That would tell us nothing about the real quality of the data set.
A measure that does work, though, is to take the sum of the squares of the residuals. This, of course, is the same measure that's used in linear regression and similar curve fits. Can we say “least squares”? Duh. More precisely, let's define variance as the mean of the squared residuals over all $N$ points:

$$V = \frac{1}{N}\sum_{n=1}^{N} r_n^2 = \frac{1}{N}\sum_{n=1}^{N}\left(y_n - m\right)^2$$
Once we have the variance, the standard deviation is easy. It's just the square root of the variance:

$$\sigma = \sqrt{V}$$
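Here's how that computation might look in C (again a sketch, with illustrative names; note that, matching the definition above, it divides by N rather than the N - 1 you'll see in some texts):

```c
#include <math.h>

/* Sketch: variance as the mean of the squared residuals,
 * V = (1/N) * sum over n of (y_n - m)^2, and sigma = sqrt(V). */
double variance(const double y[], int N)
{
    double sum = 0.0;
    for (int n = 0; n < N; ++n)        /* first pass: the mean, m */
        sum += y[n];
    double m = sum / N;

    double v = 0.0;
    for (int n = 0; n < N; ++n) {      /* second pass: residuals  */
        double r = y[n] - m;           /* r_n = y_n - m           */
        v += r * r;
    }
    return v / N;
}

double std_deviation(const double y[], int N)
{
    return sqrt(variance(y, N));       /* sigma = sqrt(V) */
}
```

Note that, unlike the running average above, this is a two-pass batch computation: it needs the whole data set stored in memory.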
This quantity, standard deviation, is supposed to be a measure of the amount of noise in our signal. Take a look at Figure 2. This is the same graph as Figure 1, but I've added two horizontal dashed lines a distance σ above and below the average value. You get the impression that new measurements of y are usually going to lie in the band between these two limits. Are the limits absolute? Certainly not. As you can see, five of the 10 points lie outside the band. Even so, these lines do seem to say something about the “scatter” in the data, don't they?

From the defining equations, it's clear that the size of σ is going to depend on this scatter. If all the values $y_n$ are nearly equal, then σ will be small, and the band will be narrower. In the limit, when there is no noise, or scatter, at all, all the measurements will be equal, the value of σ will go to zero, and so the band will have zero width.

You'll note that I haven't said anything, so far, about probabilities, or distribution functions, or any of those terms that relate to statistics. And I won't, in this column. We'll have plenty of time for that, later. For now, you only need to get the concept that the standard deviation is a measure of the amount of scatter, or noise, in the data.

