
An Introduction to Digital Signal Processors
by JACK G. GANSSLE
DSP is surfacing in an increasing number of applications. For those of you who have been dog paddling through recent DSP articles, here's a basic orientation to keep you in the swim.
Perhaps one of the last bastions of old-fashioned embedded design is one of the newest, hottest fields: digital signal processors. DSP developers still tune their code for every last bit of performance, squeezing astonishing algorithms into sometimes tiny memory spaces.
Though DSPs have been around since the early '80s, their currently skyrocketing performance, decreasing price, and reasonable power consumption make them the workhorse of choice for many computation-intensive applications.
But the story of DSP is the story of signal processing. A DSP is not your typical controller. These parts have architectures highly optimized for specific types of operations.
Signal Processing 101
Unhappily for those of us who slid through math classes in college, and who managed to forget anything beyond basic algebra in the years hence, the entire raison d'être for DSPs revolves around applying nontrivial math to complex analog signals. Sometimes they're also used as accelerators for other fast, complex mathematical transformations, like graphics algorithms.
DSP architecture simply makes no sense unless you have a basic understanding of the sorts of math used to process signals.
A mathematical function of any sort, from a simple multiplication to a filter or fast Fourier transform (FFT), is an operation that transforms one or more input signals to an output. We can think of a function for computing distance as a closed box (an algorithm) that accepts time and speed as inputs, producing distance as output.
Analog signals, such as those used in virtually all of the electronic devices we interact with daily, are voltages that vary with time. The 110VAC (in the U.S.) at your wall outlet is a smoothly varying voltage that fluctuates from minimum to maximum 60 times per second. The output of your CD player is a similar voltage, varying much more complexly with time to accurately convert a digital bit stream to an analog speaker motion that represents Beethoven's Fifth.
Signal processing algorithms accept analog signals and feed them through a function to convert them to something else. A filter is one example of such a function.
Filters are conceptually simple. A high pass filter accepts an analog input and blocks all of the low frequency components. Low pass filters do the opposite. Notch filters block or pass only one narrow frequency range.
Remember the bad old days of phonograph records? The tiniest imperfection created an annoying pop in the speakers. A filter can reduce the amplitude of the pop (rendering it less audible) by simply averaging three or four sequential samples.
That is, if the phonograph produced a signal voltage that varies with time, called f(t), then you could transform it by passing it through a filtering algorithm of the form:
o(t) = [f(t-1) + f(t) + f(t+1)]/3
which is nothing more than a simple moving average. Three adjacent points in time are summed, then divided by the number of points to form a true average.
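To make the moving average concrete, here is a minimal Python sketch of the formula above. The function name and the edge handling (the first and last samples pass through unchanged) are assumptions made for illustration, as is the made-up test signal.

```python
def moving_average_3(samples):
    """Three-point centered average: o(t) = [f(t-1) + f(t) + f(t+1)] / 3.

    Edge handling is an assumption: the first and last samples are copied
    through unchanged because each lacks a neighbor on one side.
    """
    if len(samples) < 3:
        return list(samples)
    out = [samples[0]]
    for t in range(1, len(samples) - 1):
        out.append((samples[t - 1] + samples[t] + samples[t + 1]) / 3)
    out.append(samples[-1])
    return out


# Silence with a single-sample "pop" of amplitude 9 in the middle.
signal = [0, 0, 0, 9, 0, 0, 0]
smoothed = moving_average_3(signal)
print(smoothed)
```

The pop's peak drops from 9 to 3, but it now occupies three output samples instead of one.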
This algorithm is really a simple form of the convolution, a crucially important signal processing concept. Convolutions, which feed one function through another, are notationally represented by the asterisk. In the example above, we convolve the input signal f(t) with an array of constants (call them c(t)), as follows:
o(t) = f(t) * c(t)
In this example, the constant array is nothing more than 1/3, 1/3, and 1/3. That is, for every point of the input signal, the convolution f(t) * c(t) multiplies three adjacent points by 1/3, and sums the three products.
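The same operation can be written once for any coefficient array. The sketch below is one way to do it in Python, under two assumptions of mine: the coefficient list has odd length and is centered on the current sample, and samples beyond either end of the input count as zero.

```python
def convolve_centered(f, c):
    """Convolve signal f with an odd-length coefficient list c.

    c is centered on the current sample, so c[len(c) // 2] multiplies f(t).
    Samples past either end of f are treated as zero (an assumption that
    keeps the output the same length as the input).
    """
    half = len(c) // 2
    out = []
    for t in range(len(f)):
        acc = 0.0
        for k, coeff in enumerate(c):
            i = t - (k - half)          # convolution runs the kernel backwards
            if 0 <= i < len(f):
                acc += coeff * f[i]     # one multiply, one add
        out.append(acc)
    return out


third = 1 / 3
print(convolve_centered([0, 0, 0, 9, 0, 0, 0], [third, third, third]))
```

Note the shape of the inner loop: every pass is a multiply followed by an add into a running total.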
Though the trivial function we've described does reduce the size of the pop, it does so by smearing the pop over three adjacent outputs. In fact, this moving average has the effect of smearing all of the music just a bit, not something a purist would tolerate.
A different sort of filter, using the same math but a different convolving function, could smear the music less but still reduce the amplitude of the pop. Instead of a three-point moving average, use five points, selecting a different set of convolving coefficients:
o(t) = [2f(t-2) + 3f(t-1) + 5f(t) + 3f(t+1) + 2f(t+2)]/15
See how the center point is emphasized while outlying points have much less influence on the transformed signal? We've changed our simple moving average to an average that has "shape" associated with it (defined by the convolving coefficients 2, 3, 5, 3, 2). Less signal smearing occurs, yet the amplitude of the pop still decreases.
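A quick way to see that "shape" is to push a single-sample pop through both sets of coefficients. This is a rough Python sketch; the helper name, the zero padding at the edges, and the test signal are all assumptions for illustration.

```python
def shaped_average(f, weights):
    """Weighted centered average, normalized by the sum of the weights.

    weights must have odd length. Samples past either end of f are
    treated as zero (an assumption for this sketch).
    """
    half = len(weights) // 2
    total = sum(weights)
    out = []
    for t in range(len(f)):
        acc = 0
        for k, w in enumerate(weights):
            i = t + k - half
            if 0 <= i < len(f):
                acc += w * f[i]
        out.append(acc / total)
    return out


pop = [0, 0, 0, 0, 15, 0, 0, 0, 0]            # one bad sample of amplitude 15
flat = shaped_average(pop, [1, 1, 1])           # simple moving average
shaped = shaped_average(pop, [2, 3, 5, 3, 2])   # coefficients from the text
print(flat)
print(shaped)
```

For a lone pop, each output simply traces out the coefficients: the flat average gives three equal values, while the shaped one peaks at the center and tapers toward its edges.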
A more complex notch filter could, if you knew the spectral characteristics of the pop, completely eliminate the sound of the pop by blocking anything with those defined parameters. This filter would be known as a finite impulse response (FIR) filter. The output is a function of the inputs convolved with a set of coefficients.
Another common filter is known as the infinite impulse response (IIR) filter. Like the FIR, it's a convolution of coefficients and inputs. Unlike the FIR, the IIR equation also includes some number of outputs in the result, a form of feedback.
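The smallest possible IIR shows where the "infinite" comes from. The sketch below is a one-pole low pass filter in Python; the particular equation and the value of alpha are my choices for illustration, not something the text prescribes.

```python
def one_pole_lowpass(x, alpha):
    """First-order IIR: y(n) = alpha * x(n) + (1 - alpha) * y(n-1)."""
    y = []
    prev = 0.0                      # y(n-1), the fed-back output
    for sample in x:
        prev = alpha * sample + (1.0 - alpha) * prev
        y.append(prev)
    return y


impulse = [1.0] + [0.0] * 7         # a single spike followed by silence
response = one_pole_lowpass(impulse, 0.5)
print(response)
```

Because each output feeds the next one, the single spike keeps echoing through the results at half the previous size. An FIR with three coefficients would have gone silent after three samples.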
FIRs and IIRs are the basis for a vast number of signal processing algorithms implemented by DSPs. Making a filter using analog components like op amps, resistors, and capacitors is rather easy. In fact, many electronic products, from CD players to radios, use filters heavily. Analog designs suffer from drift and instability, though, and are increasingly difficult to build as your filter requirements become more precise. A digital implementation allows you to precisely specify your filtering requirements by fiddling with the coefficients.
Even better, you can change the characteristics of a digital filter on the fly by modifying the convolving coefficients. These adaptive filters can more precisely match the noise spectrum in a particular phone line, for example, or adjust themselves to a person's speech characteristics in a voice recognition circuit. No analog filter implementation is so dynamic.
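One common way to adjust coefficients on the fly is the least mean squares (LMS) update. The text doesn't name an algorithm, so treat LMS, the step size, and the made-up two-coefficient "line" below as assumptions chosen purely to illustrate the idea in Python.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Adapt FIR coefficients w so the filter's output tracks d.

    x: input samples, d: desired output samples, mu: step size.
    Returns the final coefficient list.
    """
    w = [0.0] * num_taps
    history = [0.0] * num_taps              # history[0] is the newest input
    for sample, desired in zip(x, d):
        history = [sample] + history[:-1]
        y = sum(wk * xk for wk, xk in zip(w, history))
        error = desired - y
        # Nudge each coefficient in the direction that shrinks the error.
        w = [wk + mu * error * xk for wk, xk in zip(w, history)]
    return w


rng = random.Random(0)                      # fixed seed for a repeatable run
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
unknown = [0.6, -0.3]                       # hypothetical system to be matched
d = [unknown[0] * x[n] + (unknown[1] * x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]
w = lms_identify(x, d, num_taps=2, mu=0.1)
print(w)
```

The filter starts with all-zero coefficients and, sample by sample, drifts toward the coefficients of the system it is listening to.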
For non-analog types, getting excited about filtering is difficult; somehow, relating the math to our daily experience is a tough thing to do. One simple application might help. Consider music synthesizers: they use electronics to produce sounds imitating those created by conventional musical instruments. A particular type of IIR filter produces sounds almost exactly the same as those made by plucked strings. Strange as it may seem, if you feed white noise (a Gaussian distribution of sound) into the filter, you can simulate all sorts of stringed instruments just by modifying the convolving coefficients. You can change from violin to cello by passing a new set of constants through the algorithm.
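The text doesn't say which filter it has in mind. One well-known algorithm that fits the description is Karplus-Strong plucked-string synthesis, so the Python sketch below assumes that is close to what's meant. It loads a delay line with a burst of noise (uniform rather than Gaussian, for simplicity) and feeds an average of delayed outputs back into the line.

```python
import random


def pluck(period, num_samples, seed=0):
    """Return samples of a plucked-string-like tone.

    period: delay-line length in samples; a longer line gives a lower pitch.
    """
    rng = random.Random(seed)
    # The "pluck": fill the delay line with a burst of white noise.
    line = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for n in range(num_samples):
        current = line[n % period]
        following = line[(n + 1) % period]
        out.append(current)
        # Feedback: the value that comes around again one period later
        # is the average of two adjacent delayed outputs.
        line[n % period] = 0.5 * (current + following)
    return out


tone = pluck(period=50, num_samples=5000)
```

Each trip around the delay line smooths the noise a little more, so the tone starts bright and mellows as it rings. Changing `period` moves the pitch; changing the averaging coefficients changes the character of the decay.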
Filters aren't the only sort of math performed by DSPs. In 1822, Jean-Baptiste Joseph Fourier found that all periodic waveforms can be expressed as the sum of sine waves of different frequencies and amplitudes. (Sometimes a lot of sine waves. For example, a perfect square wave includes an infinite number of sine components.)
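The square wave case is easy to try. The Python sketch below sums the standard Fourier series for a unit square wave (odd harmonics only, each scaled by 1/n); the function name and the period of 1 are assumptions for illustration.

```python
import math


def square_wave_partial_sum(t, num_terms):
    """Sum the first num_terms odd harmonics of a unit square wave, period 1."""
    total = 0.0
    for k in range(num_terms):
        n = 2 * k + 1                       # harmonics 1, 3, 5, ...
        total += math.sin(2 * math.pi * n * t) / n
    return 4 / math.pi * total


# Sample the middle of the positive half-cycle with more and more sine waves.
for terms in (1, 10, 1000):
    print(terms, square_wave_partial_sum(0.25, terms))
```

With one sine wave the value overshoots the flat top of the square wave; as terms are added the sum settles toward 1.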
Various Fourier transforms change functions of time into functions of frequency, and vice versa. In other words, a Fourier transform of a voltage rapidly digitized by an A/D can give us a map of every frequency component in that signal. For decades engineers have used spectrum analyzers to do this transformation in a development lab environment. Seeing the complex waveforms of, say, the cellular radio band displayed as a graph of vertical lines, with amplitude on the vertical axis and frequency on the horizontal axis, is quite dramatic. A DSP running a Fourier transform does much the same.
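Here is a bare-bones discrete Fourier transform in Python to show that "map of every frequency component." It is the slow textbook form, not an FFT (which produces the same result with far fewer operations), and the two-tone test signal is invented for the example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each frequency bin, scaled by the sample count."""
    n = len(samples)
    mags = []
    for k in range(n):
        acc = 0j
        for t, s in enumerate(samples):
            # Another sum of products, this time with complex coefficients.
            acc += s * cmath.exp(-2j * cmath.pi * k * t / n)
        mags.append(abs(acc) / n)
    return mags


n = 64
signal = [math.sin(2 * math.pi * 5 * t / n)
          + 0.5 * math.sin(2 * math.pi * 12 * t / n)
          for t in range(n)]
spectrum = dft_magnitudes(signal)
peaks = [k for k in range(n // 2) if spectrum[k] > 0.1]
print(peaks)
```

The two tones buried in the time-domain samples show up as two isolated spikes, at bins 5 and 12, with the second half as tall as the first.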
The vector dot product (the sum of the point-wise multiplication of two vectors) is important to many applications like correlation, matrix math, and multidimensional signal processing. Graphics algorithms use vector addition. Other convolution techniques are the basis for error correction in transmitted signals.
Convolutions, filters, Fourier transforms, vector products, and other common signal processing tasks all share a common implementation detail: the algorithms all compute vast numbers of sums of products (a = bc + d). This computation is the basis for DSPs.
Figure 1 - An FIR filter combines an input, x(n) with coefficients - a(0), a(1), and a(2) - and with previous versions of the input (denoted by the Z boxes, each of which indicates one unit of delay).
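The structure in Figure 1 translates almost line for line into code. In this Python sketch the variables `z1` and `z2` stand in for the two Z boxes; the function name and the coefficient values in the example are assumptions.

```python
def fir_3tap(x, a):
    """y(n) = a[0]*x(n) + a[1]*x(n-1) + a[2]*x(n-2)."""
    z1 = 0.0                        # output of the first delay box, x(n-1)
    z2 = 0.0                        # output of the second delay box, x(n-2)
    y = []
    for sample in x:
        acc = a[0] * sample         # a = bc
        acc = a[1] * z1 + acc       # a = bc + d
        acc = a[2] * z2 + acc       # a = bc + d, again
        y.append(acc)
        z2, z1 = z1, sample         # shift the delay line by one sample
    return y


print(fir_3tap([1, 0, 0, 0], [0.5, 0.3, 0.2]))
```

Feeding in a single spike reads the coefficients back out in order, after which the output falls to zero: the impulse response is finite.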
DSP Architectures
Most DSP applications employ a similar design. An analog front end feeds data to an A/D converter that digitizes data in the time domain. The DSP reads the samples and computes lots and lots of equations of the form a = bc+d at high speeds. The output goes to a D/A and thence back to the real world.
Considering again the example of eliminating the phonograph pop, an A/D would sample the musical stream tens of thousands of times per second (44.1kHz, or about 23µs per sample, for CD-quality sound). The DSP would compute the simple filter described, or more likely one with many more convolution terms, in real time, generating a digital representation of the smoothed music, which a D/A then converts back to analog. Given realistic numbers of convolution coefficients, even in this simple system multiplies and adds are taking place at sub-microsecond rates.
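The arithmetic behind that claim is short enough to check. This Python sketch divides the sample period by the number of coefficients; the 64-coefficient filter is a hypothetical size picked for illustration, and the calculation ignores everything except the multiply/adds themselves.

```python
SAMPLE_RATE_HZ = 44_100             # CD-quality sampling rate from the text


def mac_budget_ns(num_taps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Nanoseconds available per multiply/add if all taps must finish
    before the next sample arrives."""
    sample_period_ns = 1e9 / sample_rate_hz
    return sample_period_ns / num_taps


print(mac_budget_ns(1))             # the full sample period, in ns
print(mac_budget_ns(64))            # hypothetical 64-coefficient filter
```

At 44.1kHz the budget drops below one microsecond per multiply/add once the filter has 23 or more coefficients.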
The first characteristic of a DSP is its ability to solve equations of the form a = bc+d quickly. These high-speed math operations are moving a lot of data, and perhaps instructions, around on system buses. The array of coefficients may be large; certainly the data stream is, though it may not have to have much transient on-board storage. The speeds may simply be too high for reasonably priced memory arrays.
The second characteristic of a DSP is its memory configuration. Most DSPs include some amount of on-board memory that runs at full system speed.
In our mobile world, half our users want to take the pop-reduction equipment on the road, perhaps operating off a couple of AA batteries for weeks on end. Presumably, someone cranks the phonograph manually. . . .
The third characteristic of a DSP is its power consumption. Although we live in a battery-operated world, today's batteries have quite limited amp-hour capacities.
DSPs use a number of architectural approaches to compute sums of products at staggering rates. The first is Harvard or Harvard-like bus structures, where the instruction and data spaces are separated. This separation lets the CPU fetch instructions and data simultaneously. Unfortunately, a baffling number of variants exist, some even including cross-over switches to flip buses onto each other. Making a deterministic comparison between bus architectures is virtually impossible. Each has its own proponent, and each comes with its own strengths and weaknesses. Each new family of DSPs seems to have its own approach to moving data around quickly.
Some of this memory is invariably onboard for speed reasons. Modern DSP speeds range upwards of 200MHz, which means a 5ns bus cycle, far too fast for most memory systems. External RAM, if used, generally runs with wait states to control system costs. This means the very fastest portions of the code should be relatively small, tight loops that can live onboard the chip.
You'll frequently find DSP chips that run multiple processes in parallel, performing several multiply/accumulates at the same time. TI's latest entry claims a sustained rate of eight operations per clock, for a staggering 1.6 billion instructions per second at 200MHz.
Figure 2 - An IIR filter employs a sort of feedback mechanism which combines some amount of every input into the output; that is, the output y(n) is a function of x(n), where n ranges from the first sample processed to the current sample.
The core of the DSP (and a critical difference between DSPs and conventional microprocessors) is the multiply/accumulate unit (MAC), which is the brains behind solving a = bc+d repeatedly and quickly. Words cannot do justice to the MAC. Instead, look at this code snippet for the ZSP16401 from ZSP Corp.:
loop:
    lddu   r8, r13, 2   ; load x[i], x[i+1]
    lddu   r6, r14, 2   ; load y[i], y[i+1]
    mac2.a r8, r6       ; accumulate x[i]*y[i] + x[i+1]*y[i+1]
    agn0   loop         ; loop
After the first pass through the loop, the CPU executes this operation in a single clock cycle! At 200MHz, that's 5ns per computation. Prefetchers and cache logic read ahead to keep the DSP fed with the x and y data arrays.
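For readers who don't speak ZSP assembly, here is roughly what that loop computes, written in Python. It is based only on the comments in the listing (two loads of adjacent pairs, then a dual multiply/accumulate); the function name and the even-length restriction are my assumptions, and nothing here models the chip's timing.

```python
def dot_product_dual_mac(x, y):
    """Accumulate x[i]*y[i] two elements per pass, as the listing's comments describe."""
    if len(x) != len(y) or len(x) % 2:
        raise ValueError("expects two arrays of equal, even length")
    acc = 0
    for i in range(0, len(x), 2):
        # One pass of the loop: two products summed into the accumulator.
        acc += x[i] * y[i] + x[i + 1] * y[i + 1]
    return acc


print(dot_product_dual_mac([1, 2, 3, 4], [5, 6, 7, 8]))
```

What takes an interpreter many steps per pass is the work the DSP is said to retire in a single clock.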
A big problem with fast DSPs is simply keeping the beast fed with data. Some specs may show mindboggling numbers of MIPS. A one-cycle multiply/accumulate is wonderful, if the system design can keep data pouring into the unit fast enough. Make sure that these numbers indicate sustained ratings, not a quick burst or two.
The quest for speed does carry a price: power. When the Pentium first came out, it consumed 14 watts, putting out enough heat to almost be dangerous, while sucking down battery supplies at alarming rates. DSPs also have a speed/power tradeoff, though their use in so many portable applications means that most include power-saving modes, and most offer quite reasonable power consumption. For example, TI's TMS320C40 consumes only about 1mA per MIPS at 3V.
As the cost per transistor continues to plummet, more vendors are adding floating-point math units to their DSPs. Until recently, all DSPs were integer units, processing typically 16, 24, or 32 bits of integer info at a time.
Integers, though, are less than useful in many applications. Most non-floating point DSP applications use "fixed-point" math, where a 16-bit accumulator has an implied binary point. Typically the most significant bit (MSB) is a sign bit; the binary point is positioned to the right of the sign. This means the magnitude held in the accumulator is a fraction that ranges from zero to binary .111111111111111, or about .99997. (The first bit to the right of the binary point represents 1/2, the second represents 1/4, and so on.)
Fixed point is handy because multiplying two fractions can never cause an overflow (the lone two's complement exception is -1 times -1). Additions can, though, so most processors include a mechanism for detecting overflow. Realize, however, that fixed point is purely a programming convention; any arithmetic/logic unit (ALU) that can do two's complement math does fixed point automatically, as the math works out the same. This convention requires a certain amount of manual scaling because A/Ds generally present integer data.
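The 16-bit format described here is often called Q15. The Python sketch below mimics it with ordinary integers to show the manual scaling involved. The helper names are hypothetical, and saturating on overflow is just one common way of handling it, chosen here as an assumption.

```python
Q15_ONE = 1 << 15                   # 32768 stands for 1.0
Q15_MAX = Q15_ONE - 1               # largest positive value, just under 1.0
Q15_MIN = -Q15_ONE                  # most negative value, exactly -1.0


def to_q15(value):
    """Scale a float in [-1, 1) to a Q15 integer, clamping anything outside."""
    return max(Q15_MIN, min(Q15_MAX, int(round(value * Q15_ONE))))


def q15_mul(a, b):
    """Multiply two Q15 values; the product has 30 fraction bits, so shift back."""
    return (a * b) >> 15


def q15_add_sat(a, b):
    """Add two Q15 values, saturating at the limits instead of wrapping around."""
    return max(Q15_MIN, min(Q15_MAX, a + b))


half = to_q15(0.5)
print(q15_mul(half, half) / Q15_ONE)            # a fraction times a fraction
print(q15_add_sat(to_q15(0.75), to_q15(0.75)))  # 1.5 doesn't fit, so it clamps
```

The multiply stays in range with no help, while the add needs its overflow guard. The one product that does not fit is `q15_mul(Q15_MIN, Q15_MIN)`, which is -1 times -1.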
The precision represented by this notation is limited to the 15 bits after the binary point. The dynamic range, unless extra user-written code extends it, is equally limited. In some applications, inputs and product terms may span vast dynamic ranges. If you select a DSP with on-board floating point you can generally ignore the range issue, as the usual notation gives 24 bits of precision and a range of about plus or minus 2^127.
DSP Development
The pages of this magazine display many tools for writing and debugging DSP code. A few factors, though, make for some unique challenges with DSP technology.
The assembly vs. C debate was settled a long time ago for most microprocessor applications. Eight-bit systems are now routinely coded in C, even for C-hostile chips like the 8051 and PIC. This isn't the case for DSPs.
The quest for raw horsepower, at least in the fastest portions of the firmware, means that developers almost always write DSP code in assembly language. Indeed, many applications use assembly from top to bottom, though this is starting to change.
Speed isn't the only problem with high-level languages. C doesn't have a fixed-point standard, a compelling argument in favor of the more expensive floating-point chips.
DSPs naturally lend themselves to a well thought-out mix of C and assembly. It'll be a long time before they are so fast that machine cycles can be tossed away for the sake of using a high-level language, as we regularly do to reduce development time with conventional microprocessors.
While DSP compilers have lagged behind those for CISC processors, their debugging tools eerily foreshadowed the direction that microprocessor debuggers are only now headed. Because the speeds are so great, and because so much of a DSP application lives in on-chip memory, conventional emulators that monitor the bus are rare.
Instead, most DSP chips include a JTAG interface optimized for debugging. A few pins are dedicated to serial in/out streams driven by a relatively inexpensive control unit. Single stepping, memory, I/O access, and the like all take place over this serial interface. A high-level debugger with GUI that understands the file formats produced by assemblers and compilers gives the developer a window into the system. Though this generally precludes overlay RAM and real-time trace, it's a realistic compromise between debugging needs and physical realities.
Motorola pioneered this philosophy with CISC chips using their BDM interface, now a standard fixture on most of their parts. Other vendors have found it to be a cost-effective way to get some level of debugging power to developers.
A tough part of DSP design is selecting algorithms and coefficients for the particular filters for your application. Lots of packages, from freebies on the 'Net to CDs included with DSP books to some commercial packages, are available. Filter design is an art in itself, requiring a fair amount of study to ensure the resulting designs are stable and efficient.
The Future
The DSP has made available entirely new types of technology. Digital cellular products wouldn't exist without the DSP, as they do error correction, equalization, and compression/decompression of speech in real time, in a noisy, impossible RF environment.
A significant shift is taking place, from custom signal-processing ASICs to commercial DSP chips. Even if you ignore all of the benefits of using an off-the-shelf part, the DSP, like all computers, is a programmable device. In a world full of change, ASICs, whose logic is cast in metaphorical concrete, are a significant liability. Some applications, like ADSL and ISP modems, are on the market today even as the communications standards change at a frightening rate. Downloading new code sure beats replacing a board.
Applications like sonar/radar processing, motor speed control, image enhancement and compression, speech processing, and a thousand others are reasons why DSP sales continue to grow at a rate 50% faster than the overall semiconductor growth rate. While the big players in the market are TI, Lucent, Analog Devices, and Motorola (accounting for over 90% of all DSP chips sold), many other companies offer intriguing parts that balance speed, power, and memory organization.
Signal processing is coming of age. Though DSPs will never replace conventional microprocessors, we'll see them used wherever small but fast chunks of looping code massage massive quantities of data.
Jack Ganssle is the consulting technical editor and a columnist for Embedded Systems Programming. He may be reached at jack@ganssle.com.