Observing the Unknown - Embedded.com

Observing the Unknown

Adaptive filters can be used to determine the system response of an unknown system. Black magic indeed.

What do you do when you have an unknown system to analyze, but you don't have access to its inner workings? Almost everyone, sooner or later, has to deduce or derive a system function by observation or guesswork. Working out a fixed system function is one thing, but recreating a system function that changes with time is a much more interesting and challenging proposition.

Closed-loop systems, including servo systems of all types from motion control to the noise reduction/compressor algorithms in audio, are often difficult to analyze and can be impossible to handle with a closed-form solution. That makes it especially hard to predict what such a system will do, and it makes algorithms that expose certain hidden variables in the “black box” especially valuable.

In the textbooks, we read that the impulse response of a system may be determined by driving that system with an infinitely tall and infinitely narrow pulse. This is the Dirac delta function. The output of the system will be its impulse response. Such pulses are hard to come by, however. In many industrial and mechanical applications, engineers have resorted to driving a system with broadband white noise to discover which frequencies resonate and which die away in an effort to learn a given object's impulse response. This, too, has its drawbacks, especially in real-time applications.

Fortunately, other means are available. Currently, algorithms that allow us to determine, control, and change the system function (transfer function) of a process are used in motion control, adaptive prediction for speech reconstruction and synthesis, channel equalization for overcoming the deleterious effects of long lines in signals, noise cancellation for removing environmental noise, and even in satellite signal transmission. We're going to use such a filter as a model to estimate the characteristics of an unknown system. All in all, it is very interesting and useful stuff.

Last month, we took a brief look at two systems of this kind, the Luenberger observer and the Kalman filter. Most of us have heard of the Kalman filter but some may have shied away from using it because of its association with that black magic called statistics. The interesting thing is that, in the end, it is all the same stuff. Statistics-based processing uses the same laws and is capable of the same transformations as signal processing based upon more deterministic forms. This month, I will present another step in the development of observers and system modeling: the adaptive filter.


One of the primary uses for systems such as these is to remove noise. As you know, noise can be almost anything and what qualifies as noise is a matter of interpretation. Some noise, such as a 60 Hz hum or a baby crying, has most of its energy concentrated in a narrow part of the spectrum. White noise, or uncorrelated noise, however, is spread across the entire spectrum. It is everywhere. The filter devices we will describe in this column can work in either circumstance.

Before we begin, I would like to draw a distinction between deterministic and random processes. Physical phenomena whose future measurements can be predicted with reasonable accuracy, based on physics and observation, are referred to as deterministic. Examples include the force generated by an unbalanced rotating wheel, the position of a satellite in orbit about the earth, or the response of a structure to a step load. In such cases, it is not difficult to come up with closed forms with which to calculate or predict behavior.

A large number of physical phenomena, however, are not deterministic. For these phenomena, each experiment produces a unique time-history record which is not likely to be repeated and cannot be accurately predicted in detail. Such data and the physical phenomena they represent are called random.

There are two sorts of random processes: stationary and non-stationary. Think of a random process as an infinite collection of sample functions all occurring simultaneously. In these cases, experimental data is collected into time-history records (data sequences). The entire collection of time-history records is called an ensemble; this ensemble defines a random process describing a phenomenon. A process is stationary if its sample averages are independent of absolute time and non-stationary otherwise.

How do I know if something is stationary? If a process has a beginning or ending, it probably is not stationary. It may, however, be considered stationary (at least over most of its lifetime) if it lasts for a long time compared to the period of its lowest frequency spectral components. A stationary random process is much easier to work with and analyze because it is independent of actual time. However, it can be argued that no true stationary processes really exist, only those that for all practical matters are stationary.

As far as most engineering work is concerned, we will be dealing with processes assumed to be stationary.

Stationary data

With this ensemble of time-history records, the average properties of the data can be readily computed at any specific time by averaging.

White noise is a special kind of noise. It is a zero-mean random signal, meaning that its average, or mean, value is zero. Using continuous mathematics, we express this with an integral:

$$E[x] = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} x(t)\,dt = 0$$

The value of this integral is the limit approached by the running average of the signal as the averaging interval grows. In this formula, the symbol E[x] means “the statistical expectation of x,” or the mean. A special and useful feature of this type of signal is that it will not correlate: it is uncorrelated with delayed copies of itself and with any independent signal.

Another important feature of a process is its variance, which is the square of the standard deviation. In determining how close an approximation of any sort might be to an actual signal, we can use the standard deviation as a figure of merit. The variance is given by:

$$\sigma^2 = E\left[(x - E[x])^2\right]$$

This equation reads as “the mean of the square of the deviation of the value from its expected value,” which is the average of how far the value strays from the ideal. It is a measure of how close to the actual your estimate is.

These definitions and formulae may be applied successfully to any signal, not just so-called random signals.
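As a rough illustration of the two definitions above, the following sketch estimates the mean and variance of a synthetic zero-mean noise record. The sample count, the seed, and the standard deviation of 2 are arbitrary assumptions chosen for the example, not values from this column.

```python
import random

def mean(x):
    """Sample estimate of E[x]."""
    return sum(x) / len(x)

def variance(x):
    """Sample estimate of E[(x - E[x])^2]."""
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)

random.seed(1)  # fixed seed so the run is repeatable
# Synthetic zero-mean Gaussian noise with an assumed standard deviation of 2.
noise = [random.gauss(0.0, 2.0) for _ in range(50_000)]

m = mean(noise)
var = variance(noise)
print(f"mean ~ {m:.3f}, variance ~ {var:.3f}, std dev ~ {var ** 0.5:.3f}")
```

With enough samples the estimated mean should sit near zero and the variance near the square of the chosen standard deviation.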


Correlation is a powerful tool in signal processing. It is especially useful for processing signals that are subject to complex, broadband noise that cannot be easily described with simple mathematical expressions. In this case, we view the signal as a stochastic sequence that we describe in terms of averages and use autocorrelation and cross-correlation to summarize.

The power of zero-mean white noise lacks focus. Whatever its total power, it is spread evenly across the spectrum, whereas any signal component with a definite frequency concentrates its energy in a peak at that frequency. This often allows even weak signals to be detected in noisy environments, such as mechanical systems.

The cross-correlation function is used extensively in pattern recognition and signal detection. We know that projecting one signal onto another is a means of measuring how much of the second signal is present in the first. This can be used to “detect” the presence of known signals as components of more complicated signals. If we take a record of some signal we are looking for, add some zero-mean noise, and then project the original signal onto the synthesized signal, we will once again have the original signal:

equation 4
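The detection idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the known signal is a hypothetical random ±1 burst of 256 samples, the noise is Gaussian with unit variance, and the names `template`, `received`, and `true_offset` are invented for the example.

```python
import random

def cross_correlation(template, signal, lag):
    """Project `template` onto `signal`, starting at sample `lag`."""
    return sum(t * signal[lag + i] for i, t in enumerate(template))

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical known signal: a burst of random +1/-1 values.
template = [random.choice((-1.0, 1.0)) for _ in range(256)]
true_offset = 700

# Zero-mean noise record with the burst buried in it at true_offset.
received = [random.gauss(0.0, 1.0) for _ in range(2000)]
for i, t in enumerate(template):
    received[true_offset + i] += t

# Slide the template along the record and keep the lag with the largest projection.
lags = range(len(received) - len(template) + 1)
scores = [cross_correlation(template, received, lag) for lag in lags]
detected_offset = max(lags, key=lambda lag: scores[lag])
print("burst detected at sample", detected_offset)
```

Because the noise is zero mean, its projection onto the template averages out, while the projection of the template onto itself adds up coherently, so the score peaks where the burst actually sits.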

Autocorrelation is a special case of the generalized correlation function. Instead of correlation between two different variables, as in cross-correlation, the correlation is between two values of the same variable, x[m] and x[m+n], taken at times m and m+n. The value n is known as the lag. The Fourier transform of the autocorrelation is the power spectrum of the data sequence. With this, you can determine the frequency (or frequencies) that contain the power in your signal.

When the autocorrelation is used to detect non-randomness, it is usually only the first (lag 1) autocorrelation that is of interest.
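A minimal sketch of the lag-1 check follows, assuming a normalized sample autocorrelation (divided by the lag-0 value so that the result lies between -1 and 1). The two test records, white noise and a noisy sine with a 50-sample period, are made up for illustration.

```python
import math
import random

def autocorrelation(x, lag):
    """Normalized sample autocorrelation of x at the given lag."""
    m = sum(x) / len(x)
    c0 = sum((v - m) ** 2 for v in x)
    c = sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag))
    return c / c0

random.seed(3)  # fixed seed so the run is repeatable
white = [random.gauss(0.0, 1.0) for _ in range(5000)]
tone = [math.sin(2 * math.pi * i / 50) + 0.5 * random.gauss(0.0, 1.0)
        for i in range(5000)]

r1_white = autocorrelation(white, 1)  # close to 0 for uncorrelated data
r1_tone = autocorrelation(tone, 1)    # well above 0: the record is not random
print(f"lag-1 white: {r1_white:.3f}, lag-1 noisy tone: {r1_tone:.3f}")
```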

How do you make an observer?

The observer we will start with is very similar to the Luenberger observer we mentioned last month. It is a simple adaptive filter. An adaptive filter is basically a standard FIR or IIR filter, but we change the coefficients so that the output can be made to match a reference output. If we are successful at tracking the reference output, we will have access to the variables in the transfer function of the object we are tracking by inference. The coefficients of the filter form the impulse response of the system and their Fourier transform represents the system function.
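To make the last point concrete, here is a small sketch that evaluates the Fourier transform of a set of FIR coefficients at a few frequencies. The four equal weights are a hypothetical example (a moving average), not coefficients from any system discussed here, and `frequency_response` is a name invented for this illustration.

```python
import cmath
import math

def frequency_response(coefs, num_points=8):
    """Evaluate H(w) = sum over k of h[k] * exp(-j*w*k) at num_points
    frequencies spaced evenly from 0 to pi radians per sample."""
    response = []
    for p in range(num_points):
        w = math.pi * p / (num_points - 1)
        response.append(sum(h * cmath.exp(-1j * w * k)
                            for k, h in enumerate(coefs)))
    return response

coefs = [0.25, 0.25, 0.25, 0.25]  # hypothetical converged weights
H = frequency_response(coefs)
for p, value in enumerate(H):
    print(f"point {p}: magnitude {abs(value):.3f}")
```

The coefficients themselves are the impulse response, so once the adaptive filter has settled, reading them out and transforming them gives an estimate of the system function.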

An adaptive filter might be constructed from either an FIR or IIR filter. However, it is dangerous to update the poles of an IIR filter in real time, because they could move outside the unit circle and make the filter unstable. Therefore, we choose as our filter an asymmetrical FIR, though anti-symmetric and lattice structures are also options. The FIR is defined as:

$$y[n] = \sum_{k=0}^{N-1} h[k]\,x[n-k]$$

Here, h[k] represents the filter coefficients, x[n] are the input samples, y[n] is the filter output, k indexes the filter taps, and N is the filter length. Since we mean for this filter to adapt to a changing environment, we will be changing the coefficients h[k] to meet these conditions. Now, we need a reference, so we suppose an unknown system or black box with an output d[n], which we will call the desired signal. Thus, we can now generate an error signal by differencing the output of our filter and the desired signal:

$$e[n] = d[n] - y[n]$$
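The filter output and error signal just described can be computed directly. In this sketch the coefficients `h`, the input `x`, and the black-box output `d` are small made-up arrays, and input samples before the start of the record are assumed to be zero.

```python
def fir_output(h, x, n):
    """y[n] = sum over k of h[k] * x[n - k]; samples before x[0] count as zero."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)

h = [0.5, 0.3, 0.2]        # hypothetical filter coefficients
x = [1.0, 2.0, 3.0, 4.0]   # input samples
d = [0.6, 1.1, 2.4, 3.0]   # hypothetical black-box output (desired signal)

y = [fir_output(h, x, n) for n in range(len(x))]
e = [d[n] - y[n] for n in range(len(x))]  # error: desired minus actual
print("y =", y)
print("e =", e)
```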

Using mean square error (MSE) as the criterion to be minimized in updating the filter coefficients, we say:

$$E\left[e^2[n]\right] = E\left[(d[n] - y[n])^2\right]$$

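By substituting the FIR sum for y[n] and expanding the square:

$$E\left[e^2[n]\right] = E\left[d^2[n]\right] - 2\sum_{k=0}^{N-1} h[k]\,r_{dx}[k] + \sum_{k=0}^{N-1}\sum_{l=0}^{N-1} h[k]\,h[l]\,r_{xx}[k-l]$$

$$r_{dx}[k] = E\left[d[n]\,x[n-k]\right], \qquad r_{xx}[k-l] = E\left[x[n-k]\,x[n-l]\right]$$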

you can see two correlations. First, the middle term contains a cross-correlation between the input sequence and the desired output. Following this, the autocorrelation of the input sequence represents the sample-by-sample correlation of the input signal.

We derive the coefficients for this filter by differentiating:

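$$\frac{\partial E\left[e^2[n]\right]}{\partial h[k]} = -2\,r_{dx}[k] + 2\sum_{l=0}^{N-1} h[l]\,r_{xx}[k-l] = 0 \quad\Longrightarrow\quad \sum_{l=0}^{N-1} h_{opt}[l]\,r_{xx}[k-l] = r_{dx}[k], \qquad k = 0, \dots, N-1$$

The result of this operation is a function that equates the convolution of a vector of optimum filter weights, h_opt, and the autocorrelation of the input signal, r_xx, with the cross-correlation of the desired signal with the input, r_dx.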

This, of course, requires the lengthy process of auto-correlating the input signal and cross-correlating the desired signal with the input signal to solve for the optimum filter coefficients, all of which leads us to find a simpler and faster method.
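To show what that lengthy process involves, here is a sketch that estimates both correlations from data and solves the resulting set of linear equations for the weights. Everything specific in it is an assumption made for illustration: the three-tap `unknown` system standing in for the black box, the white-noise input, and the plain Gaussian elimination used as the solver.

```python
import random

def correlate(a, b, lag):
    """Sample estimate of E[a[n] * b[n - lag]] for lag >= 0."""
    terms = len(a) - lag
    return sum(a[n] * b[n - lag] for n in range(lag, len(a))) / terms

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w

random.seed(4)  # fixed seed so the run is repeatable
unknown = [0.8, -0.4, 0.2]  # hypothetical black box, used only to make d[n]
N = len(unknown)
x = [random.gauss(0.0, 1.0) for _ in range(20000)]
d = [sum(unknown[k] * x[n - k] for k in range(N) if n - k >= 0)
     for n in range(len(x))]

r_xx = [correlate(x, x, m) for m in range(N)]  # autocorrelation of the input
r_dx = [correlate(d, x, k) for k in range(N)]  # cross-correlation, desired vs. input
R = [[r_xx[abs(k - l)] for l in range(N)] for k in range(N)]
h_opt = solve(R, r_dx)
print("estimated weights:", [round(v, 3) for v in h_opt])
```

Both correlation estimates need the whole record before any weights can be computed, which is the cost that the approach in the next section avoids.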

Hip to be least mean square

Another way to obtain the desired result without having to perform these calculations is known as the least mean square (LMS) algorithm. This is the most popular manner of producing an adaptive filter, and is the basis of the pseudo-code presented in Listing 1. This algorithm involves what is known as the steepest descent method, in which each succeeding approximation to the optimum filter weight vector is produced as a sum of the current weights and a proportion of the derivative of the mean square error, with respect to the current filter coefficients:

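$$h_{n+1}[k] = h_n[k] - \beta'\,\frac{\partial E\left[e^2[n]\right]}{\partial h_n[k]}$$

$$h_{n+1}[k] = h_n[k] - \beta'\,\frac{\partial e^2[n]}{\partial h_n[k]} = h_n[k] + 2\beta'\,e[n]\,x[n-k] = h_n[k] + \beta\,e[n]\,x[n-k]$$

(The 2 in the middle of the equation is absorbed into the proportionality constant in the final equality.)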

As you might guess, β controls the rate of convergence of the algorithm. The larger the value of β, the more rapid the convergence (but the greater the possibility of instability). And there is a way to determine a value for β that will be stable and yield the fastest possible convergence. β is given by:

equation 12

where N is the length of the filter and Px is the average power of the input signal. The filter we describe is illustrated in Figure 1.

In the pseudo-code in Listing 1, samples is the number of data points to be processed, Xn[] is an array containing the data points, and Yn[] is the output array. The filter coefficients are stored in coefs[]. Dn[] is an array containing the desired output, error is the difference between the desired and actual output, and beta is a variable controlling the rate of convergence of the filter. N is the length of the filter. As you can see, the coefficients for the filter in this example are available and can be used to determine the impulse response and system function at any time.
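For readers who want something to run, the following is a minimal Python sketch of an LMS loop that reuses the variable names described above. It is not a reproduction of Listing 1. The three-tap `unknown` system, the white-noise input, the `history` delay line, and the step size (one tenth of 1/(N·Px), chosen here simply to be small) are all assumptions made for the illustration.

```python
import random

def lms(Xn, Dn, N, beta):
    """Adapt an N-tap FIR so that its output Yn tracks the desired signal Dn."""
    samples = len(Xn)
    coefs = [0.0] * N
    Yn = [0.0] * samples
    history = [0.0] * N  # history[k] holds x[n - k]
    for n in range(samples):
        history = [Xn[n]] + history[:-1]               # shift in the new sample
        Yn[n] = sum(coefs[k] * history[k] for k in range(N))
        error = Dn[n] - Yn[n]                          # desired minus actual
        for k in range(N):
            coefs[k] += beta * error * history[k]      # LMS weight update
    return coefs, Yn

random.seed(5)  # fixed seed so the run is repeatable
unknown = [0.8, -0.4, 0.2]  # hypothetical black box, used only to make Dn
N = len(unknown)
Xn = [random.gauss(0.0, 1.0) for _ in range(5000)]
Dn = [sum(unknown[k] * Xn[n - k] for k in range(N) if n - k >= 0)
      for n in range(len(Xn))]

Px = sum(v * v for v in Xn) / len(Xn)  # average input power
beta = 0.1 / (N * Px)                  # assumed conservative step size
coefs, Yn = lms(Xn, Dn, N, beta)
print("adapted coefficients:", [round(c, 4) for c in coefs])
```

In this noise-free setup the adapted coefficients settle onto the taps of the hypothetical black box, which is the sense in which the filter exposes the impulse response of the system it is tracking.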

Next month, I'll show you how this technique works on some data.

Don Morgan is a senior engineer at Ultra Stereo Labs and a consultant with 25 years of experience in signal processing, embedded systems, hardware, and software. He wrote a book about numerical methods, featuring multi-rate signal processing and wavelets, called Numerical Methods for DSP Systems in C. He is also the author of Practical DSP Modeling, Techniques, and Programming in C and Numerical Methods for Embedded Systems.

Return to May 2001 ESP
