### Multiresolution Signal Analysis and Wavelet Decomposition

*by Don Morgan*

**Wavelets provide new capabilities for analyzing real-time signals. This introductory article provides an overview and presents the basic mechanisms involved in wavelets.**

In many signal processing applications, it is only necessary to know the form and content of a signal. An example of this minimal knowledge might be the DTMF decoding in touch tone phones. You only need to know the two frequencies responsible for a particular signal to determine which button was pushed. This processing is generally done with a Goertzel or Fourier algorithm and essentially generates a snapshot of the spectrum (in the case of the Fourier transform), or selected frequencies (in the case of the Goertzel convolution). This form of analysis is used everywhere, including machine vision for pattern recognition, radio waves, and chemical analysis.

Often, a single snapshot of an event or phenomenon is not enough to analyze it completely. No single frame from a motion picture tells the whole story; neither would a group picture of the actors, locations, and props used. Any number of storylines could be inferred from such pictures.

Think of your favorite tune. Would you recognize it among others if all you were offered were lists of the component frequencies contained in the current top 10 songs? In fact, you∫d probably find that quite a number of melodies use the very same notes.

Some forms of analysis require a knowledge of the evolution of a signal to make sense. This situation is increasingly the case in many fields of communication involving speech, music, and video. As the Internet matures, people are sending video and sound directly over the phone lines, which requires the ability to analyze and compress signals along with the ability to reconstruct that signal perfectly at the other end. The speed with which we can transmit the signal is important.

Similarly, sound in the cinema must be digitized and compressed so that it can be put on the media (laser disk or actual film) along with the pictures. Again, the compression techniques must be good, so that the audio can be exactly recreated from the bit stream off the film or disk. When dealing with transient effects to recreate or analyze these signals, we need to know more about the history of the signals–that is, their evolution.

Both stationary and nonstationary signals are involved. A nonstationary process is one that evolves with time; it might be speech, music, or the vibration schema of an aircraft wing as the plane warms. A nonstationary process can be anything that changes. This kind of process requires a history–a “time when” to be analyzed. A simple Fourier transform cannot give you that history. If you took your favorite song as a sampled data sequence and performed a Fourier transform on it, you would indeed have a spectrum displaying the frequency content of the piece, but that spectrum would be of little help in analyzing the piece.

For the purposes of audio and video analysis, the signal must be decomposed as it evolves. If the signal is to be compressed, the redundancies must be removed from the stream so that it can be transmitted compactly while still providing everything necessary to reconstruct the original signal. Most people are aware of this process in terms of data compression, as is done with data on a hard disk. This process is no different in audio and video environments–the ability to reconstruct the data must be perfect.

Because analysis can involve compression, there will be some discussion of this topic. However, the emphasis of this article will be on those methods used for analysis and ultimately for reconstruction of signals such as those in the audio and video environments. Presently, Fourier and Fourier type transforms are used to determine the content of information and help reduce it for communications and compression. Wavelets offer a new approach that remedies many of the problems inherent in the Fourier transform.

The emphasis here will be on building a foundation for the understanding of wavelet and multi-resolution analysis. There is no way to present a thorough and technically complete exposition. Consequently, you∫ll find some shortcuts and some simplification. Even so, this article provides enough information and direction to make the concept of wavelets understandable.

Nonstationary systems

Signals can be broken roughly into three categories: stationary, wide sense stationary, and nonstationary. A stationary signal or function is deterministic, allowing us to predict its condition at any point in time. An example of such a signal would be a sine wave that obeys the formula:

Audio is composed of such frequencies. But if we took your favorite song, reduced it to a single data sequence, and then used the Fourier transform to produce a spectrum representing its frequency content, you would only have a list of the magnitudes of the various frequencies present in the data sequence. This information would probably be of little use in identifying the song, its melody, or much else. The transform would treat the data sequence as a snapshot. You would still not know when a certain frequency appeared, or how long it was evident. In this situation, you have no way to reconstruct the song from this spectrum. This technique would certainly fail as a compression technique for music, speech, or video in most applications. What we require is a knowledge of time.

Even though the song is composed of stationary signals, it is not completely stationary. Actually, the song could not be justifiably called nonstationary, because although for short periods it can be viewed as stationary, it would be identified as speech and video is–that is, as wide sense stationary. Signals that are truly random are called nonstationary. These are stochastic or random processes whose character is determined by an ensemble of time signals. For example, when technicians are installing and calibrating sound equipment in theaters, they set the desired acoustic response by flooding the room with 'pink' noise and reading the sound pressure magnitude. Sound technicians then increase or decrease the gain on any particular frequency band depending on what kind of sound they want for the installation. Pink noise is usually produced by a random number or bit generator and then filtered for a flat spectrum on the instrument being used. Clearly, because pink noise starts out as a random process, any instantaneous reading would not provide a proper reading. Instead, the spectrum is sampled 48 thousand times a second and a moving average is kept for each band–the band centered at 25Hz is averaged with all the other samples in the 25Hz band, and so on. These averages are called ensemble averages.

There is far more to this process, but basically nonstationary signals cannot be quantified in the same manner as stationary. Wide sense stationary signals have much in common with stationary material, but much of the information they convey changes with time, and so they cannot be handled quite so simply as stationary signals.

Because these signals occur everywhere (music, speech, vibrations, stress and so on), engineering has developed techniques to provide some information and ability to work with these signals.

Bases

As an important part of many analytic procedures and to remove redundancy from a data stream, you need a scheme that can describe the data in as few terms as possible. In mathematics, we often use a basis to map data to a another plane for this purpose.

Ideally, a basis would be a short list of descriptors with which we could describe every possible data sequence we could come up with. Again, ideally, this would be a very short list of descriptors, containing all the necessary vectors to span the plane and none that could be created by a simple combination of the others. In this way, any redundancy would be automatically removed by a transformation into the descriptor domain. A set such as this is said to be *linearly independent* .

In linear algebra, these descriptors are known as vectors, and a set of vectors through combination and scaling can describe any possible sequence we come up with in a particular plane is said to span that plane. The space defined, or *spanned* , by these vectors has a dimension *R*^{n} , where *n* is equal to the number of vectors spanning that space, or *subspace* .

Another very useful (but not necessary) quality of this set of vectors is that they are orthogonal to one another–that is, the dot product of a vector and any of the others is 0:

This process is useful because it is self-reciprocal; the same function that maps a vector into a space will map that vector to its original space. A final step that is often taken with a basis is to normalize the vectors to unit norm; this is known as an orthonormal system. An orthonormal system is always linearly independent.

The classic example of this procedure is the *R* ^{3} space with a three coordinate basis:

As you can see:

(*x* ,*y* ) = 0 (*x* ,*z* ) = 0 (*y* ,*z* ) = 0

In this article, we will deal exclusively with orthogonal systems.

Fourier Transform

An example of a mapping function that is self-reciprocal is the Fourier transform. This transform, or one of its relatives, such as the Discrete Cosine Transform, is popular for analysis and compression techniques in video and audio.

The Discrete Fourier Transform is written:

The samples are taken to be equally spaced points on the unit circle in the z-plane equal to:

Where *N* is the length of the sequence being analyzed. We may solve for an individual *X* (*k* ):

Using the DFT, we can sample a continuous time signal and develop an array of coefficients representing the relative magnitudes of individual frequencies, whose sum is the original signal.

It is not difficult to see how mapping a time domain signal to the frequency domain could result in compression, in that it is often easier to store and transmit a few discrete frequencies than the entire data sequence. Images for machine vision are sometimes stored this way with the transform also used for recognition, which may be quicker than pattern matching. Uses range from mathematical reduction to signal processing. This transform provides an extraordinary tool for analysis, as well as synthesis of stationary or wide-sense stationary signals.

As extraordinary as this transform is, however, it has some limitations. We can use the Fourier transform to find the magnitudes of the various frequency components of a data sequence, but we are limited in resolution by the number of samples. This situation means if a major component of frequency does not fall directly on a sample it will leak on both sides, becoming a hump rather than a peak, thereby skewing the spectrum. The fix is often to use more samples–the more samples the greater the frequency resolution. But each time we increase the number of samples, we reduce our resolution in time. This relationship states that defining a window of time means that we are also defining its bandwidth as 1/*t* .

A short data window means that the bandwidth of the spectral components is wide. Consequently, we have a problem getting both frequency resolution and time resolution at the same time–a manifestation of the Heisenberg uncertainty principle.

The center of gravity of a basis function such as the Fourier transform, whether it is time related or frequency related , is a point of concentration of the function. This procedure defines what is known as a tile in the time-frequency domain. It defines a spread that contains 90% of the function's energy. Time localization, or for that matter, frequency localization of a basis function is directly related to its spread in time or frequency.

As far as the Fourier transform is concerned any change in τ will only move the tile in the time domain, and, in the same manner, a change in ω , modulation, moves the tile on the frequency axis.

Regardless, the area of the tile has remained unchanged. The same is true when dealing with changes in scale and the Fourier transform, though the shape and localization do change. This process follows directly from the scaling property of the Fourier transform, where scaling by *a* , as in *f* (*t* ) = *f* (*at* ) forces:

and:

This equation affects the shape and time localization of the tile. The area of the tile remains unchanged in any case; to increase the time focus of a function broadens the frequency resolution and vice versa. This situation is not true of the wavelet functions we will describe in the sections to follow.

Short Time Fourier Transform

To alleviate some of the difficulties, the short time Fourier transform (STFT) is often used. Here, the song broken into smaller sequences of data and a Fourier transform is run on each small sequence to determine the frequency content of the song. In this way, the time-locale information is preserved. With this technique, we can increase our time resolution to almost any degree that we choose. However, as pointed out before, unless our sample rate has been driven up in proportion to this increased time resolution, we will lose frequency resolution–it is one or the other.

The time-frequency window of the STFT is ridged. Because frequency is directly proportional to the number of cycles per unit time, it takes a narrow time-window to analyze low frequency behavior thoroughly. Consequently, the short time Fourier transform is unsuitable for low and high frequency analysis.

Nevertheless, the STFT is widely used and is a useful tool. It is suitable for midrange signals, such as speech, which are locally stationary, but globally nonstationary.

Multiresolution Analysis

The problems inherent in measuring both time and frequency with the flexibility necessary to provide resolution on both counts led to multiresolution signal analysis. Fourier analysis can only partly solve the problems involved in this process.

Necessary to attaining a flexibility in time and frequency is a scaling function that is proportional to the basis function used. The resolution of a signal, at least a finite length signal, is directly related to the number of samples necessary to represent it; the fewer samples, the tighter the resolution. This process leads us to search for a basis with very fast decay, compact support. The support is where the energy is.

In the next sections, I develop the material forming the basis of wavelet theory and application. To begin with, I would like to present the formal definition of multiresolution analysis, with the ordering of axioms and space by I. Daubechies.

*Containment* is a sequence of closed subspaces:

that possess *downward completeness* :

and *upward completeness* :

*Scale invariance* is the specific difference between any ladder of subspaces and multiresolution; all the spaces are scaled versions of the central space *V* _{0} :

This difference is also known as the *scaling function* .

*Shift invariance* looks like:

With *existence of a basis* ; there exists:

such that:

is an orthonormal basis for *V* _{0} . More precisely:

Let us also allow *W*_{m} (the wavelet) to be the orthogonal complement of *V* (the scaling function) in *V*_{m} _{-1} . An orthogonal complement has the quality that every vector in *W*_{m} is orthogonal to every vector in *V* . In addition, the space *V*_{m} _{-1} is the direct sum of *W*_{m} and *V* . Restating with notation:

leading directly to the consequence that *L* _{2} (*R* ) is a direct sum of all these subspace components.

Finally, we introduce the projection operator *P* . With *P* , we can project a function *f* (*x* ) onto a subspace, such as *V* _{0} . Basically, the projection operator allows us to cast the function in the resolution of the subspace. We accept *V* _{0} as the original data sequence or vector that we wish to analyze. We understand this function according to the definitions of multiresolution analysis; it is the result of a direct sum of the spaces “below” it, which means by using the scaling function in **Equation 1** , we can step through the subspaces and derive the components of that same signal at those frequency and time resolutions!

The process is not so complex, and there are many ways to develop and perform this operation. Here we will pursue the problem as a signal processing problem using orthogonal FIR filters in a subband coding configuration. First, however, let us take a simple and intuitive look at a wavelet decomposition involving a square wave.

The square wave in **Figure 1** was decomposed into components related to frequency. The frequencies and their magnitudes were visible on a time scale dictated by the sample clock. The length of the scale (the time) is the length of the sampled sequence. The length of the signal being analyzed determines how many separate frequencies can be represented, just as it does with the Fourier transform. These frequencies are called levels in the wavelet transform, when the length is a power of two, that is, *N* = 2^{n } there are *n* +1 levels. The shapes of the wavelet components depend upon the analyzing wavelet.

Each wavelet level defines a particular frequency or octave, which are equally spaced intervals along the horizontal axis, each representing part of the signal occurring at the particular index of the input sequence at the particular frequency. As previously stated, the original signal, *V* _{0} , can be reconstructed as an algebraic sum of *V*_{j} . In other words, the original square wave is returned by simple addition of each of the decomposition levels.

Another way of viewing the wavelet is an ideal bandpass filter resulting from the differencing of an ideal lowpass filter with an input signal band limited at Nyquist frequency. This method, in fact, is the operation of subband coding, used for years in electrical and audio engineering for analysis and equalization. The wavelet coefficient, in this case, is the product of the bandpass output of the highpass filter in the analysis section of the filter. The lowpass section forms the scaling function for the network.

A dyad is two individual units that may be regarded as a pair. In terms of filter structures, dyadic cascading refers to the placement of decimation and interpolation filters behind one another to create a pair. **Figure 2** illustrates the construction of a single section of an M-band subband coder. The output of the lowpass filter of each section is fed to the input of the next lower octave, creating a dyadic M-band coder, as illustrated in **Figure 3** . The output of the highpass section is the wavelet coefficient. The coefficients for each filter remain the same regardless which bank we are in. Each filter bank is identical to the previous one, only downsampled by two, halving the necessary processing speed and halving the cutoffs for each filter at the same time. As we proceed down the tree, the processing rate is continually halved. The purpose of this filter bank is to separate the signal into frequency bands, then assign each of the subbands a weight depending upon the energy concentrated there. This process is known as compaction of energy.

Halfband filters are excellent candidates for this application because they are symmetrical around Ω with equal ripple in both passband and stopband. The frequency response of the halfband filter is real and even leading to:

These zeros mean that almost one half fewer multiplications are required, while the symmetry about π /2 implies an equal ripple in passband and stopband. The passband and stopband cutoff frequencies are:

This filter uses half as many calculations as conventional FIR filters and can be operated at half the original sampling rate, which can be quite important when bandwidth and throughput are concerned. Bandwidth and throughput are often used for decimation and interpolation. These filters also form a foundation for the construction of perfect reconstruction filters and are the basis for Mth-band filters.

In operation, the input to the analysis filter is properly bandlimited to one-half the sampling frequency. The sampling rate can be halved because the sampling theorem states that the sampling rate must be greater than twice the *bandwidth* of the signal for perfect reconstruction. There is a M=2 decimator following both the highpass filters and the lowpass filters. The overlap in the two signals produces some small amount of aliasing, which is canceled by the synthesis filter bank.

An important aspect of these filter banks is that they must be perfectly reconstructing–the output exactly matching the input with no alias components. If properly designed, all internal alias components are canceled within the filter itself. Each cancelation results in a perfect replication of a narrow band of frequency, that when summed with all the other outputs becomes the original signal.

The factor 2 in the synthesis filter cancels the downsampling in the analysis filters.

In summary, for orthogonal filter banks in general, we have the impulse responses of the synthesis filters as the basis functions themselves. The analysis filters have impulse responses that are time reversed versions of the basis functions.

Once this process has run and iterated on each of the requested lowpass outputs, we have a sequence of coefficients representing the frequency decomposition of the original input sequence, *V* _{0} . If our purpose is analysis, the coefficients produced by each of the highpass filters is what we are after. To take it a step further, however, we could implement a simple compression scheme dependent only on thresholding. We begin with a redundant input sequence.

That is, it contains more information than necessary to recreate the desired data. Speech, for instance, has most of its energy concentrated below approximately 6kHz, but a recording may contain detail at higher frequencies that is unnecessary for the reconstruction of the speech information. If all the values that are equal or less than certain threshold values were replaced with zeros, this sequence could be reduced in length.

If the signal must be reconstructed, we can recall that the coefficients for the synthesis filters from the original lowpass prototype are as follows: