The demand for DSPs has outstripped the supply of engineers and programmers who are familiar with them. Price, performance, and power are the first three criteria most developers evaluate. This high-level article walks you through the process of weighing and balancing these and other characteristics.
Digital signal processors (DSPs) have become the foundation of many new markets for technology in recent years, and the innovations have only begun. New forms of motor and motion control, automotive systems, entertainment systems, and a vast range of communications are all areas that have been built on the high computational performance of DSPs. Developers in these and other areas increasingly turn to DSPs for real-time signal processing, along with support rivaling that for traditional RISC microcontrollers. As a result, in a short time DSPs have come from being considered esoteric toys played with only by lab scientists to standard tools in developers' workshops.
Availability and familiarity are not everything, though. As developers increasingly turn to DSPs as a solution to their real-time system needs, they're understandably concerned about the value of what they select. The calculus of tradeoffs involved in selecting a processor is complex, and the information provided about the device doesn't always tell how it will perform in a specific application. From the developer's perspective, selecting a DSP usually comes down to evaluating the three Ps: raw performance, operational power consumption, and the overall price of the system. Without full information, how does the developer decide when to use a DSP and which one to use?
The continuum of metrics
Plenty of information is available, of course. DSP vendors all supply lots of metrics: megahertz, MIPS, megaMACs, milliwatts, MIPS per mW, MIPS per dollar, etc. The problem is that these numbers only provide the sketchiest indication about performance, power, and price in the end application. Even benchmarking standards such as the one from BDTI (Berkeley Design Technology, Inc.) are only guidelines that exercise the kernel operations of the device in common ways. They can't predict exactly how the device will measure up in the target system, where the DSP performs a set of operations unique to the application. For developers who are trying to evaluate a DSP versus a RISC or FPGA, the picture is complicated by the different mix of benchmarks among these types of devices. RISC metrics largely ignore the multiply-accumulate (MAC) operations essential to real-time signal processing, while with FPGAs it's gate counts that are truly significant in overall cost.
Figure 1: Spectrum of DSP metrics
Figure 1 shows the spectrum of metrics that tell the story of DSP performance. Starting on the left are common specifications that tell how fast the device is operating (megahertz) and how many instructions it handles (MIPS). These measurements, which apply to any processor, are followed by generic DSP specifications of millions of multiply/accumulate operations (MMACs) and billions of floating-point operations per second (gigaFLOPS) that the device handles. Following the continuum to the right are the general operational benchmarks such as BDTI's, then measurements of how the device handles specific algorithms that will be used for the equipment. (Since the latter are often necessary for evaluating competing algorithms as much as DSPs, the method of measurement can vary along with the device under test.) Finally, the developer creates the benchmarks and uses them to measure the application and end equipment. A similar though less involved spectrum could be shown for power, ranging from the generic mW/MHz to an application-specific measure such as channels per watt.
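To make the application-specific end of the power spectrum concrete, here is a minimal sketch of converting the generic mW/MHz figure into channels per watt. Every number below is hypothetical, chosen only to illustrate the arithmetic; a real conversion requires the measured cycle cost of one channel of the actual application.

```python
# Sketch: converting a generic power metric (mW/MHz) into an
# application-specific one (channels per watt).
# All figures are hypothetical and purely illustrative.

MW_PER_MHZ = 0.25         # generic datasheet metric: 0.25 mW per MHz
CLOCK_MHZ = 300           # assumed operating clock
MCYCLES_PER_CHANNEL = 25  # app-specific: millions of cycles/s one channel needs

power_mw = MW_PER_MHZ * CLOCK_MHZ           # total power at this clock: 75 mW
channels = CLOCK_MHZ / MCYCLES_PER_CHANNEL  # channels the clock supports: 12
channels_per_watt = channels / (power_mw / 1000.0)

print(f"{power_mw:.0f} mW, {channels:.0f} channels, "
      f"{channels_per_watt:.0f} channels/W")  # 75 mW, 12 channels, 160 channels/W
```

The key input, cycles per channel, comes from the application rather than the datasheet, which is exactly why vendors cannot publish this metric on their own.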
The technology-general specifications at the left end of the spectrum are easier for vendors to define and measure than the application-specific benchmarks toward the right. The generic metrics are also easier to communicate and, because they're applied across a broad range of products, they tend to be used as the initial basis of product comparison. For the developer, though, generic metrics are the least meaningful; because they're so readily available and so heavily touted, they frequently become a source of frustration, getting in the way of the search for application-specific information.
The gap between the developer's need for application-specific information and the vendor's ability to supply it is not just a matter of misunderstanding. In many cases, especially with new types of applications, only the developers themselves can create the benchmarks they need to determine the ultimate value of a given DSP in the system. The more that's known about the application base, the easier it becomes for DSP vendors to embrace these metrics and publicize them. Recognizing the disparity between the information they can readily provide and what developers really need to know, DSP vendors are continually seeking ways to find meaningful metrics.
What about the applications themselves? How do their requirements differ in terms of the three Ps? As might be expected, value takes on a variety of meanings, depending on the needs of the system. For example, the designer of a handheld communications device is extremely concerned about its power efficiency and will look for a DSP that's designed to be power efficient, provided that it offers sufficient performance for the end application. On the other hand, the designer of the communications infrastructure equipment that complements the handheld device is less concerned about a DSP's power efficiency than about its raw execution performance, provided that the power dissipation is acceptable. Price is always a factor as well, though normally it's more critical with smaller, mass-market systems.
Table 1: Requirements for the three Ps in different DSP market segments
| Market Segment | Performance | Price | Power Dissipation |
| --- | --- | --- | --- |
| Portable video recorder | | | |
| Cell phone handset | | | |
| Cellular base station | | | |
| Video head end server | | | |
Table 1 lists a number of DSP applications, along with the relative importance (rated 1 to 3) of performance, price, and power dissipation. When performance is the main criterion of DSP selection, the applications tend to be larger, use grid power, and are likely to support multiple channels, or at least multiple tasks, simultaneously. When price is the top priority, the equipment may or may not use grid power, but it is invariably a high-volume consumer item. Finally, when power dissipation is the most important factor, the end equipment is generally personal and portable.
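One way to read Table 1 is as a set of weights in a simple selection score. The sketch below is hypothetical: the segment weights and the normalized device scores are illustrative stand-ins, not measured data. It shows only how the same two candidate devices can rank differently once the three Ps are weighted per market segment.

```python
# Sketch of weighting the three Ps per market segment.
# Weights (1-3, as in Table 1) and device scores are hypothetical.

segments = {
    # (performance, price, power) importance weights, 1-3
    "Cell phone handset":    (2, 3, 3),
    "Cellular base station": (3, 1, 1),
}

candidates = {
    # normalized 0-1 scores for (performance, price, power efficiency)
    "low-power DSP": (0.5, 0.8, 0.9),
    "VLIW DSP":      (0.9, 0.4, 0.4),
}

def score(device, weights):
    return sum(w * s for w, s in zip(weights, candidates[device]))

for segment, weights in segments.items():
    best = max(candidates, key=lambda d: score(d, weights))
    print(f"{segment}: {best}")
# Cell phone handset: low-power DSP
# Cellular base station: VLIW DSP
```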
Difficulties of benchmarking applications
Two well-known standards used in audio (Dolby Digital) and video (H.264/MPEG-4 AVC) show how difficult it is to create a simple benchmark on application performance. Since the algorithms are well understood, it would seem that they should both provide a straightforward basis of comparison for the performance of various signal processing devices. Unfortunately, that's not necessarily the case.
First consider the options available in Dolby Digital. At present, the standard supports speaker configurations of either 2.0 (two front speaker channels for traditional stereo) or 5.1 (three front, two back, and one subwoofer for the original home theater surround sound), and there's only one sample rate choice of 48kHz. So comparison among DSPs should be simple enough.
But how do the options of Dolby Digital today translate into future needs? Audio is quickly becoming more complex. More channels are being added for side and back speakers, so that there are 2.1, 3.0, 3.1, 4.0, 6.0, 7.1, 9.1, and 10.2 configurations, in addition to the 2.0 and 5.1 options. In addition, the sample rate choices have multiplied to include 32, 44.1, 64, 88.2, 96, 128, 176.4, and 192kHz, as well as 48kHz. With all these choices, which configurations and which sample rates will be representative of the market in the future? Which ones should DSP vendors benchmark?
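A quick enumeration shows how large the benchmark space already is, counting only the speaker configurations and sample rates listed above:

```python
# Cross every Dolby Digital speaker configuration mentioned above
# with every sample rate to count the possible benchmark points.
from itertools import product

configs = ["2.0", "2.1", "3.0", "3.1", "4.0",
           "5.1", "6.0", "7.1", "9.1", "10.2"]
sample_rates_khz = [32, 44.1, 48, 64, 88.2, 96, 128, 176.4, 192]

combos = list(product(configs, sample_rates_khz))
print(len(combos))  # 90 configuration/rate pairs a vendor could benchmark
```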
Even if all the possible combinations of these variables could be taken into account, the question remains whether Dolby Digital is representative of other audio coders like DTS and AAC, which have their own array of options. The simple conclusion is that even a well-established industry standard like Dolby Digital does not always give a good indication of how a DSP will perform in an end application.
Audio is relatively simple when compared to video and imaging. Video has many standards, including MPEG-1, MPEG-2, MPEG-4, and the newest, H.264, which was jointly endorsed in 2003 by the ITU and ISO. With its breadth of flexibility and capability, H.264 will make things easier for the industry and, at the same time, even more confusing to benchmark.
But rather than focus on this latest standard, let us focus on the previous standard, MPEG-4, to reduce the complexity of our discussion. Let's start with a simple overview of four different applications for video.
The applications are:
- DVD player
- DVD recorder
- Video phone
- Security camera
The DVD player is simply a playback machine for video and, in some cases, music. In this application the decoder must be able to handle multiple scene changes in the media, with excellent picture quality, and with multiple audio formats, all at a low data rate.
The DVD recorder must handle the same requirements as the DVD player, with the addition of encoding a video and audio stream as well as decoding them. At some point in the future, it might also need to transcode, a capability that would require the product to encode and decode simultaneously.
The video phone differs significantly from the DVD recorder/player in several aspects. One significant way is in its need to minimize latency. Another is its lesser demand on video and audio quality. Rather than a D1 or high-definition (HD) video quality, it can use a format such as CIF or QCIF. Finally, a video phone doesn't have to handle the multiple scene changes that a movie requires.
Finally, the security system has two aspects. The first is the camera, which is an encode-only device. The second is the infrastructure, which may have multiple cameras connected to it. In both cases the video quality requirements are less stringent than for movies in several ways: smaller image sizes, few or no scene changes, and lower frame rates. The system might even use JPEG rather than MPEG-4/H.264. Table 2 summarizes the differences in these four applications.
Table 2: A summary of four applications of video compression technology
| Application | Encode | Decode | Audio In | Audio Out |
| --- | --- | --- | --- | --- |
| Video phone | CIF | CIF | Voice band | Voice band |
| Security | CIF/JPEG | None | Voice band | Voice band |
As the four examples show, an important difference between videoconferencing and entertainment video such as movies or television lies in the frequent scene changes of the latter. High compression for low bit-rate transmission may produce an acceptable image for videoconferencing, but a lower compression ratio with higher transmission bandwidth is usually necessary for entertainment video. Clearly, a standard that covers both videoconferencing and entertainment video needs to offer this flexibility.
The H.264/MPEG-4 AVC standard provides this flexibility with support for three profiles: baseline, main, and extended. The baseline profile requires the least computation and system memory and is optimized for low latency. It doesn't include B (bidirectionally predicted) frames, because of the latency they introduce, or CABAC (context-adaptive binary arithmetic coding), because of its computational complexity. The baseline profile is a good match for video telephony as well as other applications that require cost-effective real-time encoding. The main profile provides the highest compression but requires significantly more processing than the baseline profile, making it difficult to use for low-cost real-time encoding and for low-latency applications. Broadcast and content storage applications are primarily interested in the main profile to leverage the highest possible video quality at the lowest bit rate. The extended profile, with support for additional features such as graphic elements, has so far generated less interest in the industry.
Once again, as the examples show, there are other factors to consider about the application. Will the image be full-screen D1, quarter-screen CIF, one-sixteenth-screen QCIF, or something else? And will these resolutions correspond to NTSC, PAL, or another standard? Is the image high-definition? Are there 25 or 30 frames per second? Will the scanning be interlaced or progressive? Is a pixel defined by eight bits per color or more? Will the system encode only, decode only, or both encode and decode? Will it include audio processing? If so, which audio standards and options apply?
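To put the resolution questions in perspective, here is a small sketch of the raw, uncompressed data rates behind the formats mentioned. It assumes 8-bit samples with 4:2:0 chroma subsampling (12 bits per pixel) and 30 frames per second; these are assumptions on our part, and as the questions above show, the real answers vary by system.

```python
# Raw (uncompressed) data rates for the image formats mentioned,
# assuming 4:2:0 chroma subsampling with 8-bit samples (12 bits/pixel)
# and 30 frames per second.
RESOLUTIONS = {
    "D1 (NTSC)": (720, 480),
    "CIF":       (352, 288),
    "QCIF":      (176, 144),
}
BITS_PER_PIXEL = 12  # 4:2:0, 8-bit samples
FPS = 30

for name, (width, height) in RESOLUTIONS.items():
    mbps = width * height * BITS_PER_PIXEL * FPS / 1e6
    print(f"{name}: {mbps:.1f} Mbit/s uncompressed")
# D1 (NTSC): 124.4 Mbit/s uncompressed
# CIF: 36.5 Mbit/s uncompressed
# QCIF: 9.1 Mbit/s uncompressed
```

The three-orders-of-magnitude gap between these raw rates and the network bit rates in Table 3 is what drives the compression ratios discussed next.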
Table 3: Frame rates for network bit rates and compression ratio combinations
| Network | Bit Rate (kbps) | Frame Rate at 10:1 (Image Compression) | Frame Rate at 30:1 (Advanced Video Compression) | Frame Rate at 60:1 (Advanced Video Compression) |
| --- | --- | --- | --- | --- |
| GSM digital cellular | 14 | | | |
| 56K modem (PSTN) | 56 | | | |
| DSL or cable up-link | 128 | | | |
| Wireless LAN (802.11) | 11,000 | | | |
Table 3 puts these issues in a larger context by showing the full-frame throughput for different compression standards on different networks. These maximum theoretical frame rates for transmitting generic VHS-quality digital video data (352 x 240 pixels) are based on MPEG-4 and H.264. Successive JPEG frames, often called Motion JPEG and frequently used for security networks, are also shown for comparison. Ultimately, the degree of compression required for an application will depend not only on the transmission bandwidth and computational performance available, but also on the quality of image desired. Since any given video platform may be used in more than one application, DSP performance metrics for several of these potential end uses of a system may be valuable to the developer.
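The arithmetic behind Table 3 can be sketched directly. The snippet below assumes 24 bits per pixel for the raw 352 x 240 frame; that bit depth is an assumption on our part, and real codec output varies with content, so treat the results as rough theoretical ceilings rather than the table's exact figures.

```python
# Maximum theoretical frame rate a network link can carry for
# 352 x 240 video at a given compression ratio.
# Assumes 24 bits/pixel raw; actual codec behavior varies.
RAW_BITS_PER_FRAME = 352 * 240 * 24  # 2,027,520 bits per uncompressed frame

networks_kbps = {
    "GSM digital cellular": 14,
    "56K modem (PSTN)": 56,
    "DSL or cable up-link": 128,
    "Wireless LAN (802.11)": 11_000,
}

for network, kbps in networks_kbps.items():
    for ratio in (10, 30, 60):
        compressed_bits = RAW_BITS_PER_FRAME / ratio
        fps = (kbps * 1000) / compressed_bits
        print(f"{network}, {ratio}:1 -> {fps:.2f} fps")
```

Running this makes the tradeoff vivid: a 14kbps cellular link supports well under one frame per second even at 60:1, while an 802.11 link sustains full motion at every ratio shown.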
Obviously, system developers would like to have as much information as possible about how DSPs and other processing elements can perform H.264/MPEG-4 AVC compression and decompression, but the success of the new standard in improving performance is also the source of difficulty in providing benchmarks for it.
Table 4: Percentage of DM64x DSP cycles required at 600MHz
| Standalone Video Codecs | DM642 Encode | DM642 Decode |
| --- | --- | --- |
| JPEG | 22% (D1) | 22% (D1) |
| MPEG-4, Simple Profile | 50% (D1) | 12% (D1) |
| MPEG-2, Main Profile at Main Level | 85% (D1) | 25% (D1) |
| Windows Media Video 9, Main Profile | 90% (D1) | 40% (D1) |
| H.264, Main Profile | Multi-chip | 83% (D1 up to 4 Mbps) |
| H.264, Baseline (for videophone) | 70% (VGA) | 30% (VGA) |

For 4:2:0 video, 30 fps, D1 (720×480), VGA (640×480).
Table 4 provides some of these metrics for a widely used DSP, the Texas Instruments TMS320DM64x, operating at 600MHz. The percentage of the processor cycles used for performing H.264 and MPEG-4 is shown, along with JPEG, MPEG-2, and Windows Media Video 9 for comparison. Note that these benchmarks are based on typical test data for existing implementations or detailed performance estimates. Encoder implementations can also vary significantly depending on the feature set invoked. In other words, this type of data, while well researched, is only a guideline for the developer and not definitive for the end application.
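Table 4's figures also allow a rough feasibility check for the transcode scenario raised in the DVD recorder discussion: summing a codec's encode and decode loads indicates whether simultaneous operation fits in a single 600MHz DM642. This is a guideline only, since concurrent tasks contend for memory bandwidth and I/O in ways a simple sum ignores.

```python
# Rough transcode feasibility check using Table 4's cycle budgets:
# can one DM642 encode and decode the same codec simultaneously?
codecs = {
    # codec: (encode %, decode %) of DM642 cycles at D1
    "MPEG-4, Simple Profile": (50, 12),
    "MPEG-2, Main Profile":   (85, 25),
    "WMV9, Main Profile":     (90, 40),
}

for name, (encode_pct, decode_pct) in codecs.items():
    total = encode_pct + decode_pct
    verdict = "fits in one device" if total <= 100 else "needs more than one device"
    print(f"{name}: encode + decode = {total}% -> {verdict}")
# MPEG-4, Simple Profile: encode + decode = 62% -> fits in one device
# MPEG-2, Main Profile: encode + decode = 110% -> needs more than one device
# WMV9, Main Profile: encode + decode = 130% -> needs more than one device
```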
In the last few years, DSP design has come a long way toward meeting the needs of different application areas. Since the mid-90s, DSP vendors have developed specialized architectures designed for different mixes of the three Ps. Many of the metrics for these architectures, though still not application-specific, have become inherently closer to what developers want to see projected about the DSPs in end use.
Some architectures, designed for handheld applications such as cell phones and PDAs, are focused on keeping power consumption extremely low while performance and price stay reasonable. Others are based on a very-long-instruction-word (VLIW) data path to achieve massive parallelism, enabling extremely high performance while power consumption per channel and price per MMAC are reasonable. These VLIW DSPs benefit multi-channel systems such as wireless base stations, video servers, routers, DSL and other telecom concentration units, and so forth. Still others focus on system price while offering reasonable performance and power by integrating the specific memory configurations and sets of peripherals needed for motors, uninterruptible power supplies and other embedded control applications. Commonly known DSPs that serve as examples of these three respective architectures are the TMS320C55x, C64x, and C28x DSP families from Texas Instruments.
Having some background about the intent of the architecture can be useful, since performance metrics are seldom as specifically targeted as developers would like. The choice of signal-processing engine must also include factors such as whether the DSP vendor has expertise in the application area, what kind of support is offered, availability, and so forth. DSP vendors work hard to make the technical information available and as relevant to developers' needs as possible. But in an industry that changes so quickly, as soon as performance metrics can be nailed down, they are often out of date. In the end, no matter how much technical data is available, choosing a solution depends to some degree on the developer's subjective judgment concerning how the device balances the three Ps of value.
Gene Frantz is DSP business development manager at Texas Instruments. He joined TI's consumer products division in 1974, where he worked on TI's educational products such as the Speak & Spell and led the development team for all of the early speech products. Frantz has a BSEE from the University of Central Florida, an MSEE from Southern Methodist University, and an MBA from Texas Tech University. He is a Fellow of the Institute of Electrical and Electronics Engineers and holds 30 patents in the areas of memories, speech, consumer products, and DSP.
Leon Adams is a DSP strategist at Texas Instruments responsible for overseeing TI's DSP product positioning. Prior to working at TI, he served as a microprocessor systems engineer, IBM Token Ring NIC program manager, and C5000 DSP product manager. Adams has a BS in engineering physics from Murray State University and an MBA from the University of Texas at Austin. He's active in computer and communications industry standards organizations.