From Here to Infinity -

From Here to Infinity

H filters can be used to estimate system states that cannot be observed directly. In this, they are like Kalman filters. However, H filters are more robust in the face of unpredictable noise sources.

Originally designed in the 1960s for aerospace applications, the Kalman filter (“Kalman Filtering,” June 2001) is an effective tool for estimating the states of a system. The widespread success of the Kalman filter in aerospace applications led to attempts to apply it to more common industrial applications. However, these attempts quickly made it clear that a serious mismatch existed between the underlying assumptions of Kalman filters and industrial state estimation problems.

Accurate system models are not as readily available for industrial problems. In addition, engineers rarely understand the statistical nature of the noise processes that impinge on such systems. After a decade or so of reappraising the nature and role of Kalman filters, engineers realized they needed a new filter that could handle system modeling errors and noise uncertainty. State estimators that can tolerate such uncertainty are called robust. The H filter was designed for robustness.

What's wrong with Kalman filters?

Recall that the Kalman filter estimates the states x of a linear dynamic system defined by the equations:[1],[2]

where A, B, and C are known matrices; k is the time index; x is the state of the system (unavailable for measurement); u is the known input to the system; y is the measured output; and w and z are noise. The two equations represent what is called a discrete time system, because the time variable k is defined only at discrete values (0, 1, 2, …). Suppose we want to estimate the state x of the above system. We cannot measure the state directly; we can only measure y directly. In this case we can use a Kalman filter to estimate the state. If we refer to our estimate as , the Kalman filter equations are given as follows:

where Sw and Sz are the covariance matrices of w and z, K is the Kalman gain, and P is the variance of the estimation error. The Kalman filter works well, but only under certain conditions.

First, the noise processes need to be zero mean. The average value of the process noise, Wk , must be zero, and the average value of the measurement noise, zk , must also be zero. This zero mean property must hold not only across the entire time history of the process, but at each time instant, as well. That is, the expected value of w and z at each time instant must be equal to zero.

Second, we need to know the standard deviation of the noise processes. The Kalman filter uses the Sw and Sz matrices as design parameters (these are the covariance matrices of the noise processes). That means that if we do not know Sw and Sz , we cannot design an appropriate Kalman filter.

The attractiveness of the Kalman filter is that it results in the smallest possible standard deviation of the estimation error. In other words, the Kalman filter is the minimum variance estimator.

So what do we do if the Kalman filter assumptions are not satisfied? What should we do if the noise processes are not zero mean, or we don't have any information about the noise statistics? And what if we want to minimize the worst case estimation error rather than the variance of the estimation error?

Perhaps we could use the Kalman filter anyway, even though its assumptions are not satisfied, and hope for the best. Maybe if we ignore the problems we encounter they will go away. This is a common solution to our Kalman filter quandary and it works reasonably well in many cases. However, another option is available: the H filter, also called the minimax filter. The H filter does not make any assumptions about the noise, and it minimizes the worst case estimation error. Before we can attack the H filter problem, we need to cover a couple of simple mathematical preliminaries.


If we have a square matrix D, then the eigenvalues of D are defined as all values of l that satisfy the equation Dg=lg for some vector g. It turns out that if D is an n3n matrix, then there are always n values of l that satisfy this equation; that is, D has n eigenvalues. For instance, consider the following D matrix:

The eigenvalues of this matrix are 1 and 3. In other words, the two solutions to the Dg=lg equation are as follows:[3]

Eigenvalues are difficult to compute in general. But if you have a 23 2 matrix of the following form:

then it becomes much easier. The two eigenvalues of this matrix are:

In the example that we will consider in this article, we will need to compute the eigenvalues of a matrix that has the form of the D matrix above.

That's all there is to eigenvalues. It's a high-tech word that is conceptually simple.

Vector norms

Before we can look at the H filter problem we need to define the norm of a vector. A vector norm is defined as follows:[4]

In order to avoid the square root sign in the rest of this article, we will use the square of the norm:

So, for example, if the vector x is given as:

then the norm squared of x is derived as follows:

A weighted vector norm is slightly more complicated, and is defined as:

where Q is any matrix with compatible dimensions. So, for example, suppose we have the vector x given previously, and the matrix Q is given as:

This Q matrix is called a diagonal matrix. That means that only the entries along the main diagonal are nonzero. In this case, the Q-weighted norm of x is derived as:

We see that the elements of Q serve to weight the elements of x when the norm is computed; hence the term weighted norm. In general, Q could be a nondiagonal matrix, but in this article we can assume that Q is diagonal.

H filtering

Now that we know what a weighted vector norm is, we can state the H filter problem and solution. First of all, suppose we have a linear dynamic system as defined at the beginning of this article. Then suppose we want to solve the following problem:

where J is some measure of how good our estimator is. We can view the noise terms w and v as adversaries that try to worsen our estimate. Think of w and v as manifestations of Murphy's Law: they will be the worst possible values. So, given the worst possible values of w and v, we want to find a state estimate that will minimize the worst possible effect that w and v have on our estimation error. This is the problem that the H filter tries to solve. For this reason, the H filter is sometimes called the minimax filter; that is, it tries to minimize the maximum estimation error. We will define the function J as follows:

where the averages are taken over all time samples k. In other words, we want to find an that minimizes J, so we want to find an that is as close to x as possible. Murphy wants to maximize J, so Murphy wants to find noise sequences w and v that cause our estimate to be far from the true state x. However, Murphy could make far from x simply by making the noise very large. We prevent this by putting the average of the weighted norms of w and v in the denominator of J. That way, Murphy has to find noise terms that will make J large without simply using brute force.

The previous equation is the statement of the H filtering problem. Our task is to find a state estimate that makes J small even while Murphy is finding noise terms that make J large. The Q, W, and V matrices that are used in the weighted norms in J are chosen by us, the designers, to obtain desired trade-offs. For example, if we know that the w noise will be smaller than the v noise, we should make the W matrix smaller than the V matrix. This will de-emphasize the importance of the w noise relative to the v noise. Similarly, if we are more concerned about estimation accuracy in specific elements of the state vector x, or if the elements of the state vector x are scaled so that they differ by an order of magnitude or more, then we should define the Q matrix accordingly.

It turns out that the estimation problem is too difficult to solve mathematically. However, we can solve a related problem:

where g is some constant number chosen by us. That is, we can find a state estimate so that the maximum value of J is always less than 1/g, regardless of the values of the noise terms w and v. The state estimate that forces J < 1/g="" is="" given="" as="" follows:="">

These are the H filter equations. They have the same form as the Kalman filter equations, but the details are different.

I is the identity matrix, which consists entirely of zeros, except for the main diagonal, which is made up of ones. For instance, a 333 identity matrix would look like this:

Kk is the H gain matrix. The initial state estimate should be initialized to our best guess of x0 , and the initial value P0 should be set to give acceptable filter performance. In general, P0 should be small if we are highly confident of our initial state estimate . If we use these H filter equations, we guarantee that J < 1/g.="" that="" means="" that="" no="" matter="" what="" murphy="" does="" with="" the="" noise="" terms="" w="" and="" v,="" the="" ratio="" of="" the="" estimation="" error="" to="" the="" noise="" will="" always="" be="" less="" than="" 1/g.="">

Well then, let's just set g to a very large number, say a zillion or so! That way we can guarantee that our estimation error is almost zero, since 1/g will be almost zero. Well, we can't quite do that. The mathematical derivation of the H equations is valid only if g is chosen such that all of the eigenvalues of the P matrix have magnitudes less than one. If we choose too large of a g, a solution to the H filtering problem does not exist. That is, we cannot find an estimator that will make the estimation error arbitrarily small.

Matrix inversion!?

The observant reader will notice that a matrix inversion is required at every time step, in order to compute the Lk terms. As discussed in the Kalman filter article, the inversion of small matrices is fairly easy, but the inversion of a matrix larger than 333 could require too much computational time for a practical implementation. Fortunately, this problem is not insurmountable. Notice from the H filter equations that the calculation of Lk , Pk , and Kk can be performed offline. That is, we do not need the measurements to compute the H gain matrix Kk . We can compute Kk (on our development system) and then hard-code Kk into our embedded system.

This solution, however, just shifts the problem to another arena. Now, instead of being strapped for throughput, we are out of memory. If we try to store a Kk matrix in our embedded system for each time step of our program execution, we will quickly run out of memory. But again, a solution exists. When we compute the Kk matrices, we notice that they very quickly approach what is called a steady state solution. Although Kk is written as a function of the time step k, the matrices Kk will be constant after just a few time steps. So we can run the H gain equations off line until we see Kk converge to a constant matrix, and then simply hard-code that constant matrix into our embedded system. The result is called the steady-state H filter.

As an aside, notice that we can do the same thing with the Kalman filter equations shown at the beginning of this article. We can compute the Kalman gain matrix Kk offline until we see it converge to a constant matrix. Then we can hard-code that constant matrix into our embedded Kalman filter. The result is called the steady-state Kalman filter.

But does the steady-state H filter perform as well as the time-varying H filter? And does the steady-state Kalman filter perform as well as the time-varying Kalman filter? We can save computational resources by using the steady-state versions of the filter, but how much performance will we sacrifice? The answer depends on the application, but often we lose very little performance. In many cases, the drop in performance will be negligible. We will see this in the simulation results that follow.

Vehicle navigation

To demonstrate the H filter, we will consider a vehicle navigation problem, similar to the one that we used for our Kalman filter example. Say we have a vehicle traveling along a road, and we are measuring its velocity (instead of position). The velocity measurement error is 2 feet/sec (one sigma), the acceleration noise (due to potholes, gusts of wind, motor irregularities, and so on) is 0.2 feet/sec2 (one sigma), and the position is measured 10 times per second. The linear model that represents this system is similar to the model discussed in the Kalman filter article. The xk+1 equation is the same, but the yk equation is different. In the Kalman filter article, we measured position, and in this article we are measuring velocity.

The state vector x is composed of vehicle position and velocity, u is the known commanded vehicle acceleration, and w is acceleration noise. The measured output y is the measured velocity corrupted by measurement noise z. As explained in the Kalman filter article, we derive the Sz and Sw matrices used in the Kalman filter equations as follows:

It looks, though, as if a fly has landed in the ointment. We thought that the velocity measurement noise was Gaussian with a standard deviation of 2 feet/sec. In reality, the measurement noise is uniformly distributed between 1 foot/sec. That is, the true measurement noise is a random number with all values between 1 foot/sec equally likely. In addition, we thought that the acceleration noise had a standard deviation of 0.2 feet/sec2 . In reality, the acceleration noise is twice as bad as we thought. It actually has a standard deviation of 0.4 feet/sec2 . This is because we took the road less traveled, and it has more potholes than expected.

When we simulate the Kalman filter and the H filter (g=0.01) for this problem, we obtain the results shown in Figures 1 and 2 for the velocity estimation error and the position estimation error. Figure 1 shows that the velocity error is noticeably less for the H filter than for the Kalman filter. Figure 2 shows that the position error is also much better with the H filter. The Kalman filter does not perform as well as we expect because it is using Sz and Sw matrices that do not accurately reflect reality. The H filter, on the other hand, makes no assumptions at all about the noise w and z, except that nature is throwing the worst possible noise at our estimator.

Next, we consider the steady-state H filter. Figure 3 shows the H filter gain matrix Kk as a function of time for the first five seconds of the simulation. Since we are estimating two states, and we have one measurement, Kk has two elements. Figure 3 shows that the elements of Kk quickly approach the approximate steady state values Kk (1) = 0.11 and Kk (2) = 0.09.

Let's pretend that we are poor struggling embedded systems engineers who can't afford a fancy enough microcontroller to perform real-time matrix inversions. To compensate, we simulate the steady-state H filter using the hard-coded gain matrix:

The results are practically indistinguishable from the time-varying H filter. We can avoid all the work associated with real-time matrix inversion and pay virtually no penalty in terms of performance. The Matlab code that I used to generate these results is available at If you use Matlab to run the code you will get different results every time because of the random noise that is simulated. Sometimes the Kalman filter will perform even better than the H filter. But the H filter generally outperforms the Kalman filter for this example.

Listing 1: H filter equations

float CT[n][r]; // This is the matrix CT
float CTVinv[n][r]; // This is the matrix CT*V-1
float CTVinvC[n][n]; // This is the matrix CT*V-1*C
float CTVinvCP[n][n]; // This is the matrix CT*V-1*C*P
float QP[n][n]; // This is the matrix Q*P;
float QPgamma[n][n]; // This is the matrix Q*P*gamma
float EYEQPgamma[n][n]; // This is the matrix EYE-Q*P*gamma
float Linv[n][n]; // This is the inverse of the L matrix
float L[n][n]; // This is the L matrix
float AP[n][n]; // This is the matrix A*P
float APL[n][n]; // This is the matrix A*P*L
float APLCT[n][r]; // This is the matrix A*P*L*CT
float K[n][r]; // This is the H- gain matrix
float Cxhat[r][1]; // This is the vector C*xhat
float yCxhat[r][1]; // This is the vector y-C*xhat
float KyCxhat[n][1]; // This is the vector K*(y-C*xhat)
float Axhat[n][1]; // This is the vector A*xhat
float Bu[n][1]; // This is the vector B*u
float AxhatBu[n][1]; // This is the vector A*xhat+B*u
float AT[n][n]; // This is the matrix AT
float APLAT[n][n]; // This is the matrix A*P*L*AT

// The following sequence of function calls computes the L matrix.
MatrixTranspose((float*)C, r, n, (float*)CT);
MatrixMultiply((float*)CT, (float*)Vinv, n, r, r, (float*)CTVinv);
MatrixMultiply((float*)CTVinv, (float*)C, n, r, n, (float*)CTVinvC);
MatrixMultiply((float*)CTVinvC, (float*)P, n, n, n, (float*)CTVinvCP);
MatrixMultiply((float*)Q, (float*)P, n, n, n, (float*)QP);
MatrixScalarMultiply((float*)QP, gamma, n, n, (float*)QPgamma);
MatrixSubtraction((float*)EYE, (float*)QPgamma, n, n, (float*)EYEQPgamma);
MatrixAddition((float*)EYEQPgamma, (float*)CTVinvCP, n, n, (float*)Linv);
MatrixInversion((float*)Linv, n, (float*)L);

// The following sequence of function calls computes the K matrix.
MatrixMultiply((float*)A, (float*)P, n, n, n, (float*)AP);
MatrixMultiply((float*)AP, (float*)L, n, n, n, (float*)APL);
MatrixMultiply((float*)APL, (float*)CT, n, n, r, (float*)APLCT);
MatrixMultiply((float*)APLCT, (float*)Vinv, n, r, r, (float*)K);

// The following sequence of function calls updates the xhat vector.
MatrixMultiply((float*)C, (float*)xhat, r, n, 1, (float*)Cxhat);
MatrixSubtraction((float*)y, (float*)Cxhat, r, 1, (float*)yCxhat);
MatrixMultiply((float*)K, (float*)yCxhat, n, r, 1, (float*)KyCxhat);
MatrixMultiply((float*)A, (float*)xhat, n, n, 1, (float*)Axhat);
MatrixMultiply((float*)B, (float*)u, n, r, 1, (float*)Bu);
MatrixAddition((float*)Axhat, (float*)Bu, n, 1, (float*)AxhatBu);
MatrixAddition((float*)AxhatBu, (float*)KyCxhat, n, 1, (float*)xhat);

// The following sequence of function calls updates the P matrix.
MatrixTranspose((float*)A, n, n, (float*)AT);
MatrixMultiply((float*)APL, (float*)AT, n, n, n, (float*)APLAT);
MatrixAddition((float*)APLAT, (float*)W, n, n, (float*)P);

Pseudocode for implementing an H filter in a high-level language is shown in Listing 1. This code assumes that the linear system has n states, m inputs, and r outputs. The following variables are assumed:

A: an n3n matrix
B: an n3m matrix
C: an r3n matrix
xhat: an n31 vector
y: an r31 vector
u: an m31 vector
P: an n3n matrix
Vinv: an r3r matrix
W: an n3n matrix
EYE: the n3n identity matrix
Q: an n3n matrix
gamma: a scalar
Infinity and beyond

The simple example shown in this article demonstrates the potential benefits of H filtering. However, H theory is actually much more general than we have discussed here. It can also be used to design control systems (not just estimators) that are robust to unknown noise sources. Our example showed the robustness of H filtering to unknown noise sources. However, the H filtering problem can also be formulated in such a way that the filter is robust to system modeling errors. In our example, we assumed that the system model was given by:

where A and B were known matrices. However, suppose we have some uncertainty in our model. Suppose the system model is given by:

where, as before, A and B are known, but DA and DB are unknown matrices. We may have some bounds on DA and DB, but we do not know their exact values. Then the H filtering problem can be reformulated to minimize the effect of these modeling errors on our estimation error. The H filter can, therefore, be designed to be robust to u

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.