CMP EMBEDDED.COM

Login | Register     Welcome Guest ESC Boston  esc india  Call for Abstracts
 




Finite Word Length Effects on Digital Filter Implementations

by BRAD HUNTING

S ome forms of digital filters are more appropriate than others when real-world effects are considered. This article looks at the effects of finite word length and suggests that some implementation forms are less susceptible to the errors that finite word length effects introduce.

In articles about digital signal processing (DSP) and digital filter design, one thing I've noticed is that after an in-depth development of the filter design, the implementation is often just given a passing nod. References abound concerning digital filter design, but surprisingly few deal with implementation. The implementation of a digital filter can take many forms. Some forms are more appropriate than others when various real-world effects are considered. This article examines the effects of finite word length. It suggests that certain implementation forms are less susceptible than others to the errors introduced by finite word length effects.

Finite word length

Most digital filter design techniques are really discrete time filter design techniques. What's the difference? Discrete time signal processing theory assumes discretization of the time axis only. Digital signal processing is discretization on the time and amplitude axis. The theory for discrete time signal processing is well developed and can be handled with deterministic linear models. Digital signal processing, on the other hand, requires the use of stochastic and nonlinear models. In discrete time signal processing, the amplitude of the signal is assumed to be a continuous value-that is, the amplitude can be any number accurate to infinite precision. When a digital filter design is moved from theory to implementation, it is typically implemented on a digital computer. Implementation on a computer means quantization in time and amplitude-which is true digital signal processing. Computers implement real values in a finite number of bits. Even floating-point numbers in a computer are implemented with finite precision-a finite number of bits and a finite word length. Floating-point numbers have finite precision, but dynamic scaling afforded by the floating point reduces the effects of finite precision. Digital filters often need to have real-time performance-that usually requires fixed-point integer arithmetic. With fixed-point implementations there is one word size, typically dictated by the machine architecture. Most modern computers store numbers in two's complement form. Any real number can be represented in two's complement form to infinite precision, as in Equation 1:

where bi is zero or one and Xm is scale factor. If the series is truncated to B+1 bits, where b0 is a sign bit, there is an error between the desired number and the truncated number. The series is truncated by replacing the infinity sign in the summation with B, the number of bits in the fixed-point word. The truncated series is no longer able to represent an arbitrary number-the series will have an error equal to the part of the series discarded. The statistics of the error depend on how the last bit value is determined, either by truncation or rounding. Coefficient Quantization The design of a digital filter by whatever method will eventually lead to an equation that can be expressed in the form of Equation 2:
with a set of numerator polynomial coefficients bi, and denominator polynomial coefficients ai. When the coefficients are stored in the computer, they must be truncated to some finite precision. The coefficients must be quantized to the bit length of the word size used in the digital implementation. This truncation or quantization can lead to problems in the filter implementation. The roots of the numerator polynomial are the zeroes of the system and the roots of the denominator polynomial are the poles of the system. When the coefficients are quantized, the effect is to constrain the allowable pole zero locations in the complex plane. If the coefficients are quantized, they will be forced to lie on a grid of points similar to those in Figure 1. If the grid points do not lie exactly on the desired infinite precision pole and zero locations, then there is an error in the implementation. The greater the number of bits used in the implementation, the finer the grid and the smaller the error. So what are the implications of forcing the pole zero locations to quantized positions? If the quantization is coarse enough, the poles can be moved such that the performance of the filter is seriously degraded, possibly even to the point of causing the filter to become unstable. This condition will be demonstrated later.

Rounding Noise

When a signal is sampled or a calculation in the computer is performed, the results must be placed in a register or memory location of fixed bit length. Rounding the value to the required size introduces an error in the sampling or calculation equal to the value of the lost bits, creating a nonlinear effect. Typically, rounding error is modeled as a normally distributed noise injected at the point of rounding. This model is linear and allows the noise effects to be analyzed with linear theory, something we can handle. The noise due to rounding is assumed to have a mean value equal to zero and a variance given in Equation 3:

For a derivation of this result, see Discrete Time Signal Processing.1 Truncating the value (rounding down) produces slightly different statistics. Multiplying two B-bit variables results in a 2B-bit result. This 2B-bit result must be rounded and stored into a B-bit length storage location. This rounding occurs at every multiplication point.

Scaling We don't often think about scaling when using floating-point calculations because the computer scales the values dynamically. Scaling becomes an issue when using fixed-point arithmetic where calculations would cause over- or under flow. In a filter with multiple stages, or more than a few coefficients, calculations can easily overflow the word length. Scaling is required to prevent over- and under flow and, if placed strategically, can also help offset some of the effects of quantization.

Signal Flow Graphs Signal flow graphs, a variation on block diagrams, give a slightly more compact notation. A signal flow graph has nodes and branches. The examples shown here will use a node as a summing junction and a branch as a gain. All inputs into a node are summed, while any signal through a branch is scaled by the gain along the branch. If a branch contains a delay element, it's noted by a z ý 1 branch gain. Figure 2 is an example of the basic elements of a signal flow graph. Equation 4 results from the signal flow graph in Figure 2.

Direct form I

A filter design will generally yield a set of filter coefficients which describe a filter as a polynomial numerator over a polynomial denominator, similar to Equation 2. The direct implementation of Equation 2 as a signal flow graph is shown in Figures 3 through 6. The first half of the signal flow graph implements the numerator and the second half of the graph implements the denominator. The transfer function from x to v is the numerator of Equation 2, and the transfer function from v to y, the denominator. This implementation is called the Direct Form I (DFI).

Direct form II

The Direct Form I implementation requires M + N delay elements where M and N are the orders of the numerator and denominator polynomials. A delay element is implemented as a storage register or variable in the computer. Every delay or storage element requires resources to store the element and computation resources to perform the calculation using the elements. The number of delay elements should be minimized where possible. Consider an alternate view of the Direct Form I. The denominator is calculated first, followed by the numerator. From the mathematical point of view, nothing has changed-the results of the calculation are the same. Notice, though, that the values entering each of the delay elements are identical. In this case, the two halves of the graph can be slid together The number of delay elements is then reduced to max(M, N), which could reduce the number of delay elements by 50%. This implementation is known as Direct Form II (DFII), a canonic form commonly used to describe filters. But as pretty as DFII is, it may not be the best form from which to implement a higher-order filter.

Cascade Any polynomial can be represented as a product of second order terms (and at most, one additional first order term). The signal flow graph for a higher order filter can be decomposed into a series of second order DFII sections. This is the form of the equation recommended by Jack Crenshaw in his Programmer's Toolbox series.2 When we're through here, you'll have a better understanding of why this form is best. Parallel The parallel form is the final one. This form can be arrived at by decomposing a higher order polynomial with partial fraction expansion. Implementations Given the different forms available to describe a digital filter, why would one be selected over another? The forms are equivalent under continuous math. Things change, though, when finite word length arithmetic is considered-under finite word length arithmetic, the performance of the various forms differs significantly. noise models Consider a first order single pole filter, as in Equation 5 and Figure 8:

As I mentioned previously, a rounding error occurs after every multiply. This rounding is modeled as a random error injected at the output of the multiplication with a mean of zero and a variance that is given in Equation 3. For the first order system described in Equation 5, the linear noise model of the system is shown in Figure 9. The output equation for this system becomes as follows (Equation 6):
The noise injected at E1 passes through the system and is filtered as if it were noise injected at the input. The noise injected at E2 is coupled to the output and passes directly to the output. Given our new knowledge of noise models, we can now look at the higher order implementation forms. We'll consider only two forms, for lack of space. First consider the Direct Form I. Figure 10. shows the model for a second order DFI filter with linear noise sources modeled at the output of each of the five multiplies. The noise sources E0 through E4 can be considered separately or combined at EB. The direct path from each noise source to the output yields the noise contribution to the output signal with a mean of zero and a variance given in Equation 7:
where M is the order of the numerator polynomial and N is the order of the denominator polynomial. Consider a Direct Form II implementation. Figure 11 shows a DFII implementation with noise sources modeled. The noise contribution to the output is given in Equation 8:
where h[n] is the filter impulse response. The noise generated by the zero calculations is still directly coupled to the output, and the noise generated by the pole calculations is filtered by the entire system. A DFII implementation will not necessarily have a lower noise output than a DFI, but it's true that part of the noise is filtered by the system in a DFII implementation. The actual response will depend on the properties of the noise and the filter characteristics and is most easily determined by simulation.

Digital Filters

Digital filtering is a typical application of DSP. Consider a tenth order Chebyshev low pass filter. A variety of sources discuss the development of Chebyshev filters, but I won't go into that here. I used Matlab3 to develop the following examples. Matlab has a variety of libraries for signal processing and filter design. I used the command [b,a] = cheby1(10, 0.1, 0.25) to generate the filter coefficients in Table 1. The ideal frequency response, as calculated by Matlab with double precision floating-point accuracy, is shown in Figure 12. Matlab can easily calculate the pole zero locations and the magnitude of the pole positions. For a digital filter to be stable, all of the poles of the characteristic equation must be within the unit circle. The magnitude of the roots of the denominator equation must be less than one. The roots and root magnitudes of the unquantized characteristic equation are shown in the first two columns of Table 2. A plot of the pole zero locations is shown in Figures 13 and 14.From the position of the poles, we can see that this is a stable filter when using double precision math. Consider rounding the pole coefficients to three decimal places, as in Table 3. This still leaves five significant decimal places, about 13 or 14 bits. This is a plot of the quantized pole zero locations.

The system is now clearly unstable. Simply rounding the filter coefficients has significantly degraded the filter performance. This situation can be observed mathematically by calculating the roots of the quantized characteristic equation. The roots of the unquantized and quantized systems are shown in Table 2. To maintain stability, the magnitude of the complex roots must be less than one. Table 2 illustrates that the magnitude of the roots of the unquantized system remain less than one, but the quantized root magnitudes exceed one. This condition verifies the conclusion: the system is unstable under quantization. Can anything be done about this instability without resorting to long word lengths or floating-point arithmetic? Different implementation forms exhibit different sensitivities to quantization. Rewrite the system in Equation 2 as a product of second order systems, as in Equation 10:

In Equation 10, k is equal to the number of second order sections required to implement the filter. There would be an additional first order section for filters of odd powers. Now consider quantizing the factored coefficients to three decimal places-about 10 bits. The unquantized and quantized roots for the example filter are given in Table 4 along with their magnitudes. Figure 15.is a plot of the pole zero locations resulting from the quantized roots. This system is now obviously stable, even when using quantized coefficients. What does this imply for the implementation? The product of second order terms can be implemented as a cascade of second order sections. The astute reader of Crenshaw's columns will recall the advice to break a higher order filter into second and first order sections. A practical reason for this approach is to minimize the effects of coefficient quantization. Note that the performance of the filter typically degrades as the coefficients are quantized to fewer and fewer bits, even to the point of filter instability. This degradation is true for any of the implementation forms if the word length is too short. determining performance

Filters which look identical on paper and are in fact equivalent under continuous math can offer drastically differing performances when implemented under finite precision math. The different forms exhibit different noise susceptibilities and different performance degradation with quantized coefficients. As a rule of thumb, a cascade of second and first order sections is going to give the best immunity to the effects of coefficient quantization. In the end, simulation of the filter is likely to give the most insight into the filter's performance. As I've suggested previously, the use of a numeric manipulation tool such as Matlab is helpful when designing and testing filters. These tools will solve for the filter coefficients for a variety of filters, simulate performance using double precision math, and simulate the performance of quantized filters. The packages typically have a variety of plotting capabilities that enable visualization of filter performance and pole zero locations. Brad Hunting's industrial experience is in the area of real-time embedded controls and embedded small area networks. He is currently finishing his doctoral degree in mechatronic engineering at Rensselaer Polytechnic Institute. He can be reached at huntib@rpi.edu, or via the Web at cat.rpi.edu/~huntib.

References

1. Oppenheim, Alan and Ronald Schafer. Discrete Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.
2. Crenshaw, Jack, Programmer's Toolbox,- Embedded Systems Programming, May 1996 to November 1996.
3. Matlab, The MathWorks, Inc., Natick, Mass, 01760.

Embedded.com Career Center
Ready to take that job and shove it?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS


 :