It appears that no particular approximate [nonlinear] filter is consistently better than any other, though . . . any nonlinear filter is better than a strictly linear one.1
The Kalman filter is a tool that estimates the variables of a wide range of processes. In mathematical terms we'd say that a Kalman filter estimates the states of a linear system. There are two reasons you might want to know the states of a system, whether linear or nonlinear:
- First, you might need to estimate states in order to control the system. For example, the electrical engineer needs to estimate the winding currents of a motor in order to control its position. The aerospace engineer needs to estimate the velocity of a satellite in order to control its orbit. The biomedical engineer needs to estimate blood-sugar levels in order to regulate insulin injection rates.
- Second, you might need to estimate system states because they're interesting in their own right. For example, the electrical engineer needs to estimate power-system parameters in order to predict failure probabilities. The aerospace engineer needs to estimate satellite position in order to intelligently schedule future satellite activities. The biomedical engineer needs to estimate blood-protein levels to evaluate the health of a patient.
The standard Kalman filter is an effective tool for estimation, but it's limited to linear systems. Most real-world systems are nonlinear, in which case Kalman filters don't directly apply. In the real world, nonlinear filters are used more often than linear filters because real systems are nonlinear. In fact, the very first use of Kalman filters involved nonlinear Kalman filters in NASA's space program in the 1960s. This article will give you the basic concepts you need to know to design and implement a nonlinear Kalman filter. I'll also illustrate the use of nonlinear Kalman filters by looking at a motor example.
Review of Kalman filters
I wrote another article in this magazine about Kalman filters a few years ago, but I'll review the idea here for those who don't have their back issues handy.2
If we want to use a standard Kalman filter to estimate a signal, the process we're measuring has to be describable by linear system equations. A linear system is a process that can be described by the following two equations:
These equations define a linear system because they don't contain any exponential functions, trigonometric functions, or any other functions that wouldn't make a straight line when plotted on a graph. Equations 1 and 2 have several variables:
- A , B , and C are matrices
- k is the time index
- x is called the state of the system
- u is a known input to the system (called the control signal)
- y is the measured output
- w and v are the noise–w is called the process noise, and v is called the measurement noise
Other than the time index, each of these variables are (in general) vectors and therefore contain more than one element.
In state-estimation problems, we want to estimate x because it contains all of the information about the system. The problem is, we can't measure x directly. Instead we measure y , which is a function of x that's corrupted by the noise of v . We can use y to help us obtain an estimate of x , but we can't necessarily take the information from y at its face value because it's corrupted by noise.
As an example, suppose that our system is a tank, a mobile robot, a pizza-delivery car, or some other vehicle moving in a straight line. We can say that the state consists of the vehicle position and velocity. The input u is the acceleration and the output y is the measured position. Let's further suppose that we're able to measure the position every T seconds. This system can be described thus:
xk is a vector that contains vehicle position and velocity at time k , uk is a scalar that is equal to the acceleration, and yk is a scalar that is equal to the measured position.3 wk is a vector that has process noise due to potholes, uncertainties in our knowledge of uk , and other unmodeled effects. Finally, vk is a scalar that's equal to the measurement noise (that is, instrumentation error).
Now suppose that we want to control the vehicle to follow a specific path, or we want to estimate the vehicle position for some other reason. We could just use yk as our position estimate, but yk is noisy. We could do better by using a Kalman filter. This is because a Kalman filter uses not only the position measurement yk , but also the information contained in the state equation. The Kalman filter equations can be written like this:4
where the time step k = 0, 1, 2, Once again, the Kalman filter is called a linear filter because the equation doesn't contain any exponential functions, trigonometric functions, or any other functions that would not appear as a straight line on a graph. Here are the meanings of the variables in the Kalman filter equations:
- k is the estimate of xk
- Kk is called the Kalman gain (it is a matrix)
- Pk is called the estimation-error covariance (also a matrix)
- Q is the covariance of the process noise wk , and R is the covariance of the measurement noise vk (two more matrices)
- The -1 superscript indicates matrix inversion
- The T superscript indicates matrix transposition
- I is the identity matrix
In order to initialize the Kalman filter, we need to start with an estimate 0 of the state at the initial time. We also need to start with an initial estimation error covariance P 0 , which represents our uncertainty in our initial state estimate. If we're very confident in our initial estimate 0 , then P 0 should be small. If we're very uncertain of our initial estimate 0 , then P 0 should be large. In the long run, these initialization values won't make much difference to the filter's performance.
Okay, so the Kalman filter is a linear filter that can be applied to a linear system. Unfortunately, linear systems don't really exist–all systems are ultimately nonlinear. Even the simple I = V/R relationship of Ohm's law is only an approximation over a limited range. If the voltage across a resistor exceeds a certain value, Ohm's law breaks down. Figure 1 shows a typical relationship between the current through a resistor and the voltage across the resistor. At small input voltages the relationship is a straight line, but if the power dissipated by the resistor exceeds some value, the relationship becomes very nonlinear. Even a device as simple as a resistor is only approximately linear, and even then only in a limited range of operation. This illustrates the fact that linear systems do not exist in the real world.
So we see that linear systems don't really exist. However, many systems are close enough that linear estimations (for example, standard Kalman filters) give good results. But “close enough” can only be carried so far. Eventually we'll run across a system that doesn't behave linearly even over a small range of operation, and the standard Kalman filter no longer gives good results. In these cases, we'll need to explore nonlinear filters.
Nonlinear filtering can be difficult and complex; it's certainly not as well understood as linear filtering. However, some nonlinear estimation methods have become (or are becoming) widespread. These include nonlinear extensions of the Kalman filter, “unscented” filtering, and particle filtering, which are explained in my book (see Additional reading at the end of this article).
In this article I'll talk about the two most basic nonlinear extensions of the Kalman filter. The standard Kalman filter I've just summarized doesn't directly apply to nonlinear systems. However, if we linearize a nonlinear system we can use linear-estimation methods to estimate the states. In order to linearize a nonlinear system, we'll use a mathematical tool called Taylor series expansion, coming up next.
Taylor series expansion
The key to nonlinear Kalman filtering is to expand the nonlinear terms of the system equation in a Taylor series expansion5 around a nominal point. A Taylor series expansion of a nonlinear function can be written as:
In Equation 5:
- is the n th derivative of f (x ), evaluated at
Equation 5 looks complicated, but it's really pretty simple. Let's look at an example. Suppose we want to expand f (x ) = cos(x ) in a Taylor series around the point = 0. Remember that the derivative of cos(x ) is “sin(x ), and the derivative of sin(x ) is cos(x ). That means we can write the Taylor series expansion of cos(x ) as:
Since we're expanding cos(x ) around the nominal point x = 0, we see that = 0, and Δx = x – = x . The Taylor series expansion of cos(x ) becomes equal to:
If we use a “second-order” Taylor series expansion of cos(x ), we can say that cos(x ) is approximately equal to 1 – x 2 / 2. (It's called “second-order” because it includes terms up to and including the second power of x .) In other words, we can ignore the rest of the terms in the Taylor series. This is because additional terms in the Taylor series involve higher powers of x that are divided by ever-increasing factorials. If x is small, as we raise x to higher and higher powers and divide by larger and larger factorials, the additional terms in the Taylor series become insignificant compared with the lower order terms.
Try out the second-order Taylor series expansion of cos(x ) for yourself. Table 1 shows cos(x ) and its second-order Taylor series expansion for various values of x . We see that as x gets smaller (that is, as it gets closer to the nominal point = 0), the Taylor series expansion gives a better approximation to the true value of cos(x ).
Linearizing a function means expanding it in a “first-order” Taylor series around some expansion point. In other words, the first-order Taylor series expansion of a function f (x ) is equal to:
Figure 2 shows the function sin(x ) along with its first-order Taylor series expansion around the nominal point = 0.6 Note that for small values of x , the two lines in the figure are quite close, which shows that the Taylor series expansion is a good approximation to sin(x ). But as x gets larger the two lines diverge, so for large values of x , the Taylor series expansion is a poor approximation.
The linearized Kalman filter
Now that we understand Taylor series, we can derive the linearized Kalman filter. The basic idea of the linearized Kalman filter is to start with a nonlinear system and then find a linear system whose states represent the deviations from a nominal trajectory of the nonlinear system. We can then use the Kalman filter to estimate the deviations from the nominal trajectory. This calculation indirectly gives us an estimate of the states of the nonlinear system.
Here's a general nonlinear system model:
The state equation f ( • ) in Equation 9 and the measurement equation h ( • ) in Equation 10 are nonlinear functions. As an example, suppose we have a system that looks like this:
The state equation shown in Equation 11 has two nonlinear terms: a square term and a cosine term. The output equation (Equation 12) is also nonlinear because it contains 1 / xk . If either the state equation or the output equation has nonlinear terms, then the system is called a nonlinear system.
In the linearized Kalman filter, we use a first-order Taylor series to expand the state equation and output equation around a nominal state. The nominal state is a function of time, so it's sometimes called a trajectory . The nominal trajectory is based on a guess of what the system behavior might look like. For example, if the system equations represent the dynamics of an airplane, the nominal state might be the planned flight trajectory. The actual flight trajectory will differ from this nominal trajectory due to mismodeling, disturbances, and other unforeseen effects. But we hope the actual trajectory will be close to the nominal trajectory, in which case the Taylor series linearization should be reasonably accurate. The Taylor series' linearizations of the nonlinear system equation and output equation give us:
Now notice that the deviations from the nominal trajectory can be written as:
Combining Equations 13 and 14 gives:
Now look at what we've accomplished–we have state and output equations that are linear functions of x . That means that we can use a standard Kalman filter to estimate Δx .
There are two important points to remember when using the linearized Kalman filter:
- After the Kalman filter is used to estimate Δx , we need to add the estimate of Δx to the nominal state in order to get an estimate of the state x . This is because Δx = x –.
- If the true state x gets too far away from the nominal state , then the linearized Kalman filter will not give good results. This is because the Taylor series approximation breaks down if x gets too far away from (Figure 2 illustrates this point).
Here's a summary of the linearized Kalman filter algorithm:
- The system equations are given as:
- The nominal trajectory is known ahead of time:
- At each time step, compute the following partial derivative matrices, evaluated at the nominal state:
The derivatives in the above equation are taken with respect to xk .An example of partial derivative matrices is given later in this article.
- Define Δyk as the difference between the actual measurement yk and the nominal measurement :
- Execute the following Kalman filter equations:
The extended Kalman filter
The linearized Kalman filter we derived is fine as far as it goes but has a problem: we need to know the nominal trajectory ahead of time. For some systems we might know a nominal trajectory ahead of time (for example, an aircraft flight with a predetermined flight plan or a robot arm moving in a manufacturing environment), but for other systems we may have no way of knowing a nominal trajectory.
The idea of the EKF (extended Kalman filter) is to use our estimate of x as the nominal trajectory in the linearized Kalman filter. In other words, we set equal to in the linearized Kalman filter. This is a clever bootstrapping approach to state estimation; we use a nominal trajectory to estimate x and use the estimated value of x as the nominal trajectory. After making these substitutions into the linearized Kalman filter equations and going through some mathematical manipulations, we get the following EKF algorithm:
- The system equations are given as:
- At each time step, compute the following derivative matrices, evaluated at the current state estimate:
Note that the derivatives are taken with respect to xk and evaluated at xk =k .
- Execute the following Kalman filter equations:
Motor state estimation
To illustrate the use of the EKF, let's use it to estimate the states of a two-phase permanent magnet synchronous motor. We might want to estimate the states so that we can regulate them with a control algorithm, or we might want to estimate the states because we want to know the position or velocity of the motor for some other reason. Let's suppose that we can measure the motor winding currents, and we want to use the EKF to estimate the rotor position and velocity. The system equations are:
The variables in these equations are defined as follows:
- Ia and Ib are the currents in the two motor windings
- θ and ω are the angular position and velocity of the rotor
- R and L are the motor winding's resistance and inductance
- λ is the flux constant of the motor
- F is the coefficient of viscous friction that acts on the motor shaft and its load
- J is the moment of inertia of the motor shaft and its load
- ua and ub are the voltages that are applied across the two motor windings
- Δua and Δub are noise terms due to errors in ua and ub
- Δα is a noise term due to uncertainty in the load torque
- y is the measurement. We're assuming here that we have measurements of the two winding currents, maybe using sense resistors. The measurements are distorted by measurement noises va and vb , which are caused by events such as sense-resistance uncertainty, electrical noise, and quantization errors in our microcontroller.
If we want to apply an EKF to the motor, we need to define the states of the system. You can see the states by looking at the system equations and noting wherever a derivative appears. If a variable is differentiated in the system equations, that quantity is a state. So we see from the motor equations shown in Equation 26 that our system has four states, and the state vector x can be defined as:
The system equation is obtained by “discretizing” the differential equations to obtain the equation in Equation 28 where Δt is the step size that we're using for estimation in our microcontroller or DSP.
To use an EKF, we need to find the derivatives of f (xk , uk ) and h (xk ) with respect to xk . This is a new twist, because f (xk , uk ) and h (xk ) are both vector functions of the vector xk . How can we find the derivative of a vector with respect to another vector? If you already know how to take derivatives, it's really not too difficult to find the derivative of a vector. For example, if we write the vectors f and x as:
then the derivative of f with respect to x is equal to the 4 × 4 matrix, shown in Equation 30:
This matrix can be generalized for any size vectors f and x . With this background, we can now find the derivative matrices as shown in Equation 31:
Let's simulate the EKF to see how well we can estimate rotor position and velocity. We'll assume that the measurement noise terms, vak and vbk , are zero-mean random variables with standard deviations equal to 0.1 amps. The control inputs (winding voltages) are equal to:
Equation 32 means that in discrete time, the control inputs are equal to:
The voltages that are applied to the winding currents are equal to these values plus uak and ubk , which we'll suppose are zero-mean random variables with standard deviations equal to 0.001 amps. We'll also assume that the noise due to load torque disturbances αk has a standard deviation of 0.05rad/sec2 . Even though our measurements consist only of the winding currents, we can use an EKF to estimate the rotor position and velocity. I used Matlab to simulate the motor system and the EKF. Listing 1 shows the code and Figure 3 shows the simulation results. We see that the rotor position and velocity are estimated quite well.
Listing 1: Matlab-based Kalman filter simulation code
function Motor % Discrete-time extended Kalman filter simulation for two-phase % step motor. Estimate the stator currents, and the rotor position % and velocity, on the basis of noisy measurements of the stator % currents. Ra = 2; % Winding resistanceL = 0.003; % Winding inductancelambda = 0.1; % Motor constantJ = 0.002; % Moment of inertiaB = 0.001; % Coefficient of viscous friction ControlNoise = 0.001; % std dev of uncertainty in control inputs (amps)AccelNoise = 0.05; % std dev of shaft acceleration noise (rad/sec^2) MeasNoise = 0.1; % standard deviation of measurement noise (amps)R = [MeasNoise^2 0; 0 MeasNoise^2]; % Measurement noise covariancexdotNoise = [ControlNoise/L; ControlNoise/L; 0.5; 0];% Define the continuous-time process noise covariance QQ = [xdotNoise(1)^2 0 0 0; 0 xdotNoise(2)^2 0 0; 0 0 xdotNoise(3)^2 0; 0 0 0 xdotNoise(4)^2]; P = 1*eye(4); % Initial state estimation covariancedt = 0.0002; % Simulation step size (seconds)T = 0.001; % how often measurements are obtainedtf = 2; % Simulation length x = [0; 0; 0; 0]; % Initial statexhat = x; % Initial state estimatew = 2 * pi; % Control input frequency (rad/sec)Q = Q * T; % discrete-time process noise covariance dtPlot = 0.01; % How often to plot resultstPlot = -inf; % Most recent time at which data was saved for plotting % Initialize arrays for plotting at the end of the programxArray = ; % true statexhatArray = ; % estimated statetrPArray = ; % trace of estimation error covariancetArray = ; % time array % Begin simulation loopfor t = 0 : T : tf-T+eps y = [x(1); x(2)] + MeasNoise * randn(2,1); % noisy measurement if t >= tPlot + dtPlot % Save data for plotting tPlot = t + dtPlot - eps; xArray = [xArray x]; xhatArray = [xhatArray xhat]; trPArray = [trPArray trace(P)]; tArray = [tArray t]; end % System simulation for tau = 0 : dt : T-dt+eps time = t + tau; ua = sin(w*time); ub = cos(w*time); xdot = [-Ra/L*x(1) + x(3)*lambda/L*sin(x(4)) + ua/L; -Ra/L*x(2) - x(3)*lambda/L*cos(x(4)) + ub/L; -3/2*lambda/J*x(1)*sin(x(4)) + ... 3/2*lambda/J*x(2)*cos(x(4)) - B/J*x(3); x(3)]; xdot = xdot + xdotNoise .* randn(4,1); x = x + xdot * dt; % rectangular integration x(4) = mod(x(4), 2*pi); % keep the angle between 0 and 2*pi end ua = sin(w*t); ub = cos(w*t); % Compute the partial derivative matrices A = [-Ra/L 0 lambda/L*sin(xhat(4)) xhat(3)*lambda/L*cos(xhat(4)); 0 -Ra/L -lambda/L*cos(xhat(4)) xhat(3)*lambda/L*sin(xhat(4)); -3/2*lambda/J*sin(xhat(4)) 3/2*lambda/J*cos(xhat(4)) -B/J ... -3/2*lambda/J*(xhat(1)*cos(xhat(4))+xhat(2)*sin(xhat(4))); 0 0 1 0]; C = [1 0 0 0; 0 1 0 0]; % Compute the Kalman gain K = P * C' * inv(C * P * C' + R); % Update the state estimate deltax = [-Ra/L*xhat(1) + xhat(3)*lambda/L*sin(xhat(4)) + ua/L; -Ra/L*xhat(2) - xhat(3)*lambda/L*cos(xhat(4)) + ub/L; -3/2*lambda/J*xhat(1)*sin(xhat(4)) + ... 3/2*lambda/J*xhat(2)*cos(xhat(4)) - B/J*xhat(3); xhat(3)] * T; xhat = xhat + deltax + K * (y - [xhat(1); xhat(2)]); % keep the angle estimate between 0 and 2*pi xhat(4) = mod(xhat(4), 2*pi); % Update the estimation error covariance. % Don't delete the extra parentheses (numerical issues). P = A * ((eye(4) - K * C) * P) * A' + Q; end % Plot the resultsclose all;figure; set(gcf,'Color','White'); subplot(2,2,1); hold on; box on;plot(tArray, xArray(1,:), 'b-', 'LineWidth', 2);plot(tArray, xhatArray(1,:), 'r:', 'LineWidth', 2)set(gca,'FontSize',12); ylabel('Current A (Amps)'); subplot(2,2,2); hold on; box on;plot(tArray, xArray(2,:), 'b-', 'LineWidth', 2);plot(tArray, xhatArray(2,:), 'r:', 'LineWidth', 2)set(gca,'FontSize',12); ylabel('Current B (Amps)'); subplot(2,2,3); hold on; box on;plot(tArray, xArray(3,:), 'b-', 'LineWidth', 2);plot(tArray, xhatArray(3,:), 'r:', 'LineWidth', 2)set(gca,'FontSize',12); xlabel('Time (Seconds)'); ylabel('Speed (Rad/Sec)'); subplot(2,2,4); hold on; box on;plot(tArray, xArray(4,:), 'b-', 'LineWidth', 2);plot(tArray,xhatArray(4,:), 'r:', 'LineWidth', 2)set(gca,'FontSize',12); xlabel('Time (Seconds)'); ylabel('Position (Rad)');legend('True', 'Estimated'); figure;plot(tArray, trPArray); title('Trace(P)', 'FontSize', 12);set(gca,'FontSize',12); set(gcf,'Color','White');xlabel('Seconds'); % Compute the standard deviation of the estimation errorsN = size(xArray, 2);N2 = round(N / 2);xArray = xArray(:,N2:N);xhatArray = xhatArray(:,N2:N);iaEstErr = sqrt(norm(xArray(1,:)-xhatArray(1,:))^2 / size(xArray,2));ibEstErr = sqrt(norm(xArray(2,:)-xhatArray(2,:))^2 / size(xArray,2));wEstErr = sqrt(norm(xArray(3,:)-xhatArray(3,:))^2 / size(xArray,2));thetaEstErr = sqrt(norm(xArray(4,:)-xhatArray(4,:))^2 / size(xArray,2));disp(['Std Dev of Estimation Errors = ',num2str(iaEst