Multiresolution Signal Analysis and Wavelet Decomposition
by Don Morgan
Wavelets provide new capabilities for analyzing real-time signals. This introductory article provides an overview and presents the basic mechanisms involved in wavelets.
In many signal processing applications, it is only necessary to know the form and content of a signal. An example of this minimal knowledge might be the DTMF decoding in touch tone phones. You only need to know the two frequencies responsible for a particular signal to determine which button was pushed. This processing is generally done with a Goertzel or Fourier algorithm and essentially generates a snapshot of the spectrum (in the case of the Fourier transform), or selected frequencies (in the case of the Goertzel convolution). This form of analysis is used everywhere, including machine vision for pattern recognition, radio waves, and chemical analysis.
Often, a single snapshot of an event or phenomenon is not enough to analyze it completely. No single frame from a motion picture tells the whole story; neither would a group picture of the actors, locations, and props used. Any number of storylines could be inferred from such pictures.
Think of your favorite tune. Would you recognize it among others if all you were offered were lists of the component frequencies contained in the current top 10 songs? In fact, you∫d probably find that quite a number of melodies use the very same notes.
Some forms of analysis require a knowledge of the evolution of a signal to make sense. This situation is increasingly the case in many fields of communication involving speech, music, and video. As the Internet matures, people are sending video and sound directly over the phone lines, which requires the ability to analyze and compress signals along with the ability to reconstruct that signal perfectly at the other end. The speed with which we can transmit the signal is important.
Similarly, sound in the cinema must be digitized and compressed so that it can be put on the media (laser disk or actual film) along with the pictures. Again, the compression techniques must be good, so that the audio can be exactly recreated from the bit stream off the film or disk. When dealing with transient effects to recreate or analyze these signals, we need to know more about the history of the signals–that is, their evolution.
Both stationary and nonstationary signals are involved. A nonstationary process is one that evolves with time; it might be speech, music, or the vibration schema of an aircraft wing as the plane warms. A nonstationary process can be anything that changes. This kind of process requires a history–a “time when” to be analyzed. A simple Fourier transform cannot give you that history. If you took your favorite song as a sampled data sequence and performed a Fourier transform on it, you would indeed have a spectrum displaying the frequency content of the piece, but that spectrum would be of little help in analyzing the piece.
For the purposes of audio and video analysis, the signal must be decomposed as it evolves. If the signal is to be compressed, the redundancies must be removed from the stream so that it can be transmitted compactly while still providing everything necessary to reconstruct the original signal. Most people are aware of this process in terms of data compression, as is done with data on a hard disk. This process is no different in audio and video environments–the ability to reconstruct the data must be perfect.
Because analysis can involve compression, there will be some discussion of this topic. However, the emphasis of this article will be on those methods used for analysis and ultimately for reconstruction of signals such as those in the audio and video environments. Presently, Fourier and Fourier type transforms are used to determine the content of information and help reduce it for communications and compression. Wavelets offer a new approach that remedies many of the problems inherent in the Fourier transform.
The emphasis here will be on building a foundation for the understanding of wavelet and multi-resolution analysis. There is no way to present a thorough and technically complete exposition. Consequently, you∫ll find some shortcuts and some simplification. Even so, this article provides enough information and direction to make the concept of wavelets understandable.
Audio is composed of such frequencies. But if we took your favorite song, reduced it to a single data sequence, and then used the Fourier transform to produce a spectrum representing its frequency content, you would only have a list of the magnitudes of the various frequencies present in the data sequence. This information would probably be of little use in identifying the song, its melody, or much else. The transform would treat the data sequence as a snapshot. You would still not know when a certain frequency appeared, or how long it was evident. In this situation, you have no way to reconstruct the song from this spectrum. This technique would certainly fail as a compression technique for music, speech, or video in most applications. What we require is a knowledge of time.
Even though the song is composed of stationary signals, it is not completely stationary. Actually, the song could not be justifiably called nonstationary, because although for short periods it can be viewed as stationary, it would be identified as speech and video is–that is, as wide sense stationary. Signals that are truly random are called nonstationary. These are stochastic or random processes whose character is determined by an ensemble of time signals. For example, when technicians are installing and calibrating sound equipment in theaters, they set the desired acoustic response by flooding the room with 'pink' noise and reading the sound pressure magnitude. Sound technicians then increase or decrease the gain on any particular frequency band depending on what kind of sound they want for the installation. Pink noise is usually produced by a random number or bit generator and then filtered for a flat spectrum on the instrument being used. Clearly, because pink noise starts out as a random process, any instantaneous reading would not provide a proper reading. Instead, the spectrum is sampled 48 thousand times a second and a moving average is kept for each band–the band centered at 25Hz is averaged with all the other samples in the 25Hz band, and so on. These averages are called ensemble averages.
There is far more to this process, but basically nonstationary signals cannot be quantified in the same manner as stationary. Wide sense stationary signals have much in common with stationary material, but much of the information they convey changes with time, and so they cannot be handled quite so simply as stationary signals.
Because these signals occur everywhere (music, speech, vibrations, stress and so on), engineering has developed techniques to provide some information and ability to work with these signals.
Ideally, a basis would be a short list of descriptors with which we could describe every possible data sequence we could come up with. Again, ideally, this would be a very short list of descriptors, containing all the necessary vectors to span the plane and none that could be created by a simple combination of the others. In this way, any redundancy would be automatically removed by a transformation into the descriptor domain. A set such as this is said to be linearly independent .
In linear algebra, these descriptors are known as vectors, and a set of vectors through combination and scaling can describe any possible sequence we come up with in a particular plane is said to span that plane. The space defined, or spanned , by these vectors has a dimension Rn , where n is equal to the number of vectors spanning that space, or subspace .
Another very useful (but not necessary) quality of this set of vectors is that they are orthogonal to one another–that is, the dot product of a vector and any of the others is 0:
This process is useful because it is self-reciprocal; the same function that maps a vector into a space will map that vector to its original space. A final step that is often taken with a basis is to normalize the vectors to unit norm; this is known as an orthonormal system. An orthonormal system is always linearly independent.
The classic example of this procedure is the R 3 space with a three coordinate basis:
As you can see:
(x ,y ) = 0 (x ,z ) = 0 (y ,z ) = 0
In this article, we will deal exclusively with orthogonal systems.
The Discrete Fourier Transform is written:
The samples are taken to be equally spaced points on the unit circle in the z-plane equal to:
Using the DFT, we can sample a continuous time signal and develop an array of coefficients representing the relative magnitudes of individual frequencies, whose sum is the original signal.
It is not difficult to see how mapping a time domain signal to the frequency domain could result in compression, in that it is often easier to store and transmit a few discrete frequencies than the entire data sequence. Images for machine vision are sometimes stored this way with the transform also used for recognition, which may be quicker than pattern matching. Uses range from mathematical reduction to signal processing. This transform provides an extraordinary tool for analysis, as well as synthesis of stationary or wide-sense stationary signals.
As extraordinary as this transform is, however, it has some limitations. We can use the Fourier transform to find the magnitudes of the various frequency components of a data sequence, but we are limited in resolution by the number of samples. This situation means if a major component of frequency does not fall directly on a sample it will leak on both sides, becoming a hump rather than a peak, thereby skewing the spectrum. The fix is often to use more samples–the more samples the greater the frequency resolution. But each time we increase the number of samples, we reduce our resolution in time. This relationship states that defining a window of time means that we are also defining its bandwidth as 1/t .
A short data window means that the bandwidth of the spectral components is wide. Consequently, we have a problem getting both frequency resolution and time resolution at the same time–a manifestation of the Heisenberg uncertainty principle.
The center of gravity of a basis function such as the Fourier transform, whether it is time related or frequency related , is a point of concentration of the function. This procedure defines what is known as a tile in the time-frequency domain. It defines a spread that contains 90% of the function's energy. Time localization, or for that matter, frequency localization of a basis function is directly related to its spread in time or frequency.
As far as the Fourier transform is concerned any change in τ will only move the tile in the time domain, and, in the same manner, a change in ω , modulation, moves the tile on the frequency axis.
Regardless, the area of the tile has remained unchanged. The same is true when dealing with changes in scale and the Fourier transform, though the shape and localization do change. This process follows directly from the scaling property of the Fourier transform, where scaling by a , as in f (t ) = f (at ) forces:
This equation affects the shape and time localization of the tile. The area of the tile remains unchanged in any case; to increase the time focus of a function broadens the frequency resolution and vice versa. This situation is not true of the wavelet functions we will describe in the sections to follow.
Short Time Fourier Transform
The time-frequency window of the STFT is ridged. Because frequency is directly proportional to the number of cycles per unit time, it takes a narrow time-window to analyze low frequency behavior thoroughly. Consequently, the short time Fourier transform is unsuitable for low and high frequency analysis.
Nevertheless, the STFT is widely used and is a useful tool. It is suitable for midrange signals, such as speech, which are locally stationary, but globally nonstationary.
Necessary to attaining a flexibility in time and frequency is a scaling function that is proportional to the basis function used. The resolution of a signal, at least a finite length signal, is directly related to the number of samples necessary to represent it; the fewer samples, the tighter the resolution. This process leads us to search for a basis with very fast decay, compact support. The support is where the energy is.
In the next sections, I develop the material forming the basis of wavelet theory and application. To begin with, I would like to present the formal definition of multiresolution analysis, with the ordering of axioms and space by I. Daubechies.
Containment is a sequence of closed subspaces:
that possess downward completeness :
and upward completeness :
Scale invariance is the specific difference between any ladder of subspaces and multiresolution; all the spaces are scaled versions of the central space V 0 :
Shift invariance looks like:
With existence of a basis ; there exists:
is an orthonormal basis for V 0 . More precisely:
Let us also allow Wm (the wavelet) to be the orthogonal complement of V (the scaling function) in Vm -1 . An orthogonal complement has the quality that every vector in Wm is orthogonal to every vector in V . In addition, the space Vm -1 is the direct sum of Wm and V . Restating with notation:
Finally, we introduce the projection operator P . With P , we can project a function f (x ) onto a subspace, such as V 0 . Basically, the projection operator allows us to cast the function in the resolution of the subspace. We accept V 0 as the original data sequence or vector that we wish to analyze. We understand this function according to the definitions of multiresolution analysis; it is the result of a direct sum of the spaces “below” it, which means by using the scaling function in Equation 1 , we can step through the subspaces and derive the components of that same signal at those frequency and time resolutions!
The process is not so complex, and there are many ways to develop and perform this operation. Here we will pursue the problem as a signal processing problem using orthogonal FIR filters in a subband coding configuration. First, however, let us take a simple and intuitive look at a wavelet decomposition involving a square wave.
The square wave in Figure 1 was decomposed into components related to frequency. The frequencies and their magnitudes were visible on a time scale dictated by the sample clock. The length of the scale (the time) is the length of the sampled sequence. The length of the signal being analyzed determines how many separate frequencies can be represented, just as it does with the Fourier transform. These frequencies are called levels in the wavelet transform, when the length is a power of two, that is, N = 2n there are n +1 levels. The shapes of the wavelet components depend upon the analyzing wavelet.
Click on image to enlarge.
Each wavelet level defines a particular frequency or octave, which are equally spaced intervals along the horizontal axis, each representing part of the signal occurring at the particular index of the input sequence at the particular frequency. As previously stated, the original signal, V 0 , can be reconstructed as an algebraic sum of Vj . In other words, the original square wave is returned by simple addition of each of the decomposition levels.
Each level will have 2level wavelets associated with it equally separated along the horizontal axis. At each point, only the height of the wavelet may be altered, as the shape is preordained by the wavelet chosen (I demonstrate how to create your own wavelet at the end of the article). The finest detail is given by the most negative number on the scale of containment.
Another way of viewing the wavelet is an ideal bandpass filter resulting from the differencing of an ideal lowpass filter with an input signal band limited at Nyquist frequency. This method, in fact, is the operation of subband coding, used for years in electrical and audio engineering for analysis and equalization. The wavelet coefficient, in this case, is the product of the bandpass output of the highpass filter in the analysis section of the filter. The lowpass section forms the scaling function for the network.
A dyad is two individual units that may be regarded as a pair. In terms of filter structures, dyadic cascading refers to the placement of decimation and interpolation filters behind one another to create a pair. Figure 2 illustrates the construction of a single section of an M-band subband coder. The output of the lowpass filter of each section is fed to the input of the next lower octave, creating a dyadic M-band coder, as illustrated in Figure 3 . The output of the highpass section is the wavelet coefficient. The coefficients for each filter remain the same regardless which bank we are in. Each filter bank is identical to the previous one, only downsampled by two, halving the necessary processing speed and halving the cutoffs for each filter at the same time. As we proceed down the tree, the processing rate is continually halved. The purpose of this filter bank is to separate the signal into frequency bands, then assign each of the subbands a weight depending upon the energy concentrated there. This process is known as compaction of energy.
Click on image to enlarge.
Click on image to enlarge.
A two band subband coder is constructed of an analysis filter and a synthesis filter in a critically sampled system . A critically sampled system is one in which the net output data rates are equal to that of the input signal. (Omega represents continuous time or analog frequencies.) The analysis filter band comprises a lowpass filter and a highpass filter in parallel, each with a cutoff at π/2. Figure 3 is a spectrum representing the outputs of the two filters.
Halfband filters are excellent candidates for this application because they are symmetrical around Ω with equal ripple in both passband and stopband. The frequency response of the halfband filter is real and even leading to:
which makes the even indexed samples of h (n ), except:
These zeros mean that almost one half fewer multiplications are required, while the symmetry about π /2 implies an equal ripple in passband and stopband. The passband and stopband cutoff frequencies are:
This filter uses half as many calculations as conventional FIR filters and can be operated at half the original sampling rate, which can be quite important when bandwidth and throughput are concerned. Bandwidth and throughput are often used for decimation and interpolation. These filters also form a foundation for the construction of perfect reconstruction filters and are the basis for Mth-band filters.
In operation, the input to the analysis filter is properly bandlimited to one-half the sampling frequency. The sampling rate can be halved because the sampling theorem states that the sampling rate must be greater than twice the bandwidth of the signal for perfect reconstruction. There is a M=2 decimator following both the highpass filters and the lowpass filters. The overlap in the two signals produces some small amount of aliasing, which is canceled by the synthesis filter bank.
An important aspect of these filter banks is that they must be perfectly reconstructing–the output exactly matching the input with no alias components. If properly designed, all internal alias components are canceled within the filter itself. Each cancelation results in a perfect replication of a narrow band of frequency, that when summed with all the other outputs becomes the original signal.
The z transforms of the analysis and synthesis sections of the filter bank:
H 0 (z ) = H (z ) analysis lowpass
The factor 2 in the synthesis filter cancels the downsampling in the analysis filters.
In summary, for orthogonal filter banks in general, we have the impulse responses of the synthesis filters as the basis functions themselves. The analysis filters have impulse responses that are time reversed versions of the basis functions.
Fast Wavelet Transform
Once this process has run and iterated on each of the requested lowpass outputs, we have a sequence of coefficients representing the frequency decomposition of the original input sequence, V 0 . If our purpose is analysis, the coefficients produced by each of the highpass filters is what we are after. To take it a step further, however, we could implement a simple compression scheme dependent only on thresholding. We begin with a redundant input sequence.
That is, it contains more information than necessary to recreate the desired data. Speech, for instance, has most of its energy concentrated below approximately 6kHz, but a recording may contain detail at higher frequencies that is unnecessary for the reconstruction of the speech information. If all the values that are equal or less than certain threshold values were replaced with zeros, this sequence could be reduced in length.
If the signal must be reconstructed, we can recall that the coefficients for the synthesis filters from the original lowpass prototype are as follows:
G 0 (z ) = 2H (z )
where H (-z ) is the original lowpass prototype. Each subband can then be processed and the portion of the signal produced by those coefficients will again be available. These signals are then summed to create the original, if no thresholding is done, or its facsimile if the coefficients have been otherwise processed. Some of the advantages of this scheme are:
Because we are designing filters for perfect reconstructing filter banks, we must derive coefficients for the initial halfband filters that produce a non-negative frequency response. The number of coefficients for a halfband filter are equal to N = 2i -1, and are, therefore, always odd. Because we are going to develop the coefficients for the Daubechies 4 wavelet, we will need a 7 coefficient filter. N = 2i -1 is a useful formula to remember, as its factors come into play often during the construction the wavelet that follows.
Using a Fourier method for least squared error modified for halfband coefficients: