HOUSTON, Texas–Enabling developers to run complex DSP simulations in minutes instead of hours, Texas Instruments Inc. introduced the fastest DSP simulation technology available in the market. With simulation speeds up to 21 times faster than previous offerings, developers can simulate data in minutes, compared to hours with competing DSP simulators. Along with its new simulation technology, TI is announcing the Analysis Tool Kit for its TMS320C5000 and TMS320C6000 DSP platforms, which gives developers increased application visibility and the opportunity to profile and model full DSP applications. To see competitive benchmarks or to get more information, visit www.ti.com/fastsimulation.
Developers need DSP simulators because they allow the architecture of complex multicore systems to be realized early in the design process, typically months before hardware is available. Simulators make it possible to evaluate various design configurations without the need for prototype devices; however, the slowness of previous simulators prevented them from being used extensively. Also, data-collection tools of the past did not give developers the visibility they needed to identify problems or bugs within their code. Studies have demonstrated that bugs and bottlenecks found late in the design cycle are harder and more expensive to isolate and fix, and can be the reason for a product missing a critical market window. Now with fast simulators and with TI's new Analysis Tool Kit, developers can use DSP simulators earlier in the design cycle to achieve the following benefits:
full system-level modeling, making the transition to hardware much easier;
code optimization early on to achieve maximum system performance; and
deeper visibility both in software and hardware to catch and fix costly bugs in the early stages of the development cycle, eliminating wasted weeks or months.
“The potential benefits of DSP simulation have been known for a long time, but because simulators have been slow and difficult to use, most developers never adopted them,” said Paul Zorfass, senior analyst with IDC/FTI. “TI's new simulation tools make DSP simulation more usable and practical, not just because of simulation speed, but because they accumulate and display more critical data and present it in a very usable manner. It is very likely that these tools have the potential to significantly improve the design process.”
As a key component of TI's Code Composer Studio (CCStudio) Integrated Development Environment, the Analysis Tool Kit makes it possible to exploit the benefits of simulation in every phase of the design process and gain unprecedented levels of visibility before and after hardware availability. TI is the first DSP provider to combine these analysis tools into one kit, eliminating the need for the developers to shop multiple vendors. The Analysis Tool Kit includes four new powerful components: an on-chip cache memory conflict analyzer, a pipeline stall analyzer, a code-coverage analyzer, and a multi-event function profiler. Each one is designed to help develop and optimize applications by providing extensive visibility into on-chip memory, application behavior, and use of system resources.
- Cache Analyzer: provides a graphical depiction of cache accesses, highlighting cache hit and miss patterns over time. While it has been possible to collect cache-miss data, there has never been a way to quickly identify the root cause of the cache misses. The cache analysis tool within the Analysis Tool Kit gathers this cache-miss data automatically, and clearly identifies its source so that the developer can isolate and identify patterns to better organize the code to optimize performance.
- Pipeline Stall Analyzer: rapidly identifies stalls in the pipeline down to the instruction level, and presents the information on a single screen that the designer can use to reorder the instruction to eliminate the conflict. This tool makes this process so simple that even designers without extensive DSP development expertise can optimize a DSP's pipeline structure to maximize performance.
- Code Coverage Analyzer: automatically finds the conditional statements in the code, tracks the path taken through it, and provides results in an easy-to-read graphical format in a single display. The analyzer leaves no code untested, and makes it possible to have confidence in code coverage based on actual data taken by a highly accurate and repeatable process for the first time.
- Multi-Event Function Profiler: allows the user to collect data on multiple events and presents it in a single table, saving the developer the time otherwise spent analyzing each event separately. Developers can also see how changes made to one event affect all other events, allowing them to determine how steps taken to improve performance in one operational area impact others as well.
Pricing and Availability
Fast simulators and the Analysis Tool Kit are available on selected processors within the C5000 and C6000 DSP platforms. The Analysis Tool Kit is available free with CCStudio 2.2 for all registered users via the Update Advisor, or live update capability within the software, or at www.ti.com/fastsimulation. CCStudio is available today for $3,595 with the first year annual subscription, or through a free 90-day evaluation version.
Nobody disputes that simulation is a great way to find problems early in the design cycle, before you commit to hardware. But as DSPs get more complex and integrate more functionality, and as DSP applications push a million or more lines of code, simulators have been hard pressed to keep up. Until now, it seems. With some simulation runs now taking many hours, engineers would probably welcome a tool that cut that time in half, and an improvement by a factor of 5 would put a big grin on their faces. But claims that go much further deserve close examination, as with this press release, which claims a 21× performance improvement, and with the press background materials I reviewed, where one slide charted a simulation with an improvement of 103× and another showed a run 458× faster than a competitive tool.
Clearly, the operative word here is “fast.” And although there's no formal name for the new simulator that comes with Ver 2.2 of Code Composer Studio (CCStudio), the company informally refers to it as the “fast simulator.” The press release above gives you a rundown of the new tool set, which is a home-grown upgrade to the simulators and utilities that previously shipped with CCStudio. In the past, designers had access primarily to CPU-only simulators that are good for algorithm development but aren't terribly effective for device or system evaluations. Yes, some of them can handle peripherals attached to a CPU, but those additions stretch the time needed to get results beyond what most engineers will tolerate. Rather than wait too long for each simulate-modify-resimulate cycle, they simply did without and used other design methods. And a full-system simulator wasn't even available.
Most engineers won't accept claims of efficiency gains like those in this announcement on blind faith; they want some idea of how TI achieved them. Applications manager John Stevenson explains that the previous simulator models VHDL code directly, so it shows exactly what the hardware does, evaluating every move the CPU makes. However, many cycles take place in the background and are essentially invisible to the application code. TI figured out how to keep these cycles from loading down the simulator unnecessarily. “We didn't expose to simulation those hardware functions that are going on in the background, and we can do so without sacrificing any cycle accuracy or data collection,” reports Stevenson.
The data collection he refers to is what underlies the new analysis tools mentioned in the release. Previously the TI simulator could collect data, but only for a single event; you had to rerun the simulator for each different event. The new version collects far more data from multiple events during a single run, and the new analysis tools expose the data so engineers can draw meaningful conclusions. As to the tools listed in the release, they've all been around for a while in some form, says C5000 tools manager Lori Vidra, but the tool developers believe they have not only improved each one but are also the first to pull them all together in one simulation environment.
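The difference between one event per run and multi-event collection can be sketched in a few lines of Python. This is a toy illustration, not TI's implementation: the trace format, the event names (`cycles`, `cache_miss`, `stall`), and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical simulator trace: (function name, event name, count) records.
TRACE = [
    ("fir_filter", "cycles", 1200),
    ("fir_filter", "cache_miss", 14),
    ("viterbi", "cycles", 3400),
    ("viterbi", "stall", 90),
    ("fir_filter", "stall", 25),
    ("viterbi", "cache_miss", 61),
]

def profile_single_event(trace, event):
    """Old style: one pass over the run yields totals for one event only."""
    totals = defaultdict(int)
    for func, ev, count in trace:
        if ev == event:
            totals[func] += count
    return dict(totals)

def profile_multi_event(trace):
    """New style: one pass yields a table of every event per function."""
    table = defaultdict(lambda: defaultdict(int))
    for func, ev, count in trace:
        table[func][ev] += count
    return {func: dict(events) for func, events in table.items()}

if __name__ == "__main__":
    # One table, one run: each row shows all events for a function side by side.
    for func, events in sorted(profile_multi_event(TRACE).items()):
        print(func, events)
```

With the single-event approach, covering three events means three passes over the run; the multi-event table needs one, and it puts the events side by side so that a change that lowers one count but raises another is visible at a glance.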
The simulator team, for example, is proud of its Cache Analyzer. The photos below show what the display looks like before and after code optimization in a typical application.
The idea underlying the cache analyzer is to look for patterns. Suppose a program executes Function A, but then needs to run Function B after first dumping Function A, and this swapping goes on quite often. Such thrashing in the cache can have a big impact on execution time. In the past you could count the number of cache misses, but now you can see a pattern and identify the cause of the problems. Then, for example, you could try different linker commands, placing portions of the program in different regions of memory to see the effect on total performance. For a video tutorial on the use of this tool and a detailed description of these two screen shots and what they mean, go to www.go-dsp.com/mm-help/swfs/cache.htm.
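The thrashing pattern described here, and the effect of moving code with the linker, can be reproduced with a toy model. The sketch below assumes a direct-mapped instruction cache with 64 lines of 32 bytes and two 512-byte functions; those sizes and the addresses are illustrative assumptions, not the parameters of any particular TI device.

```python
LINE_SIZE = 32   # bytes per cache line (assumed)
NUM_LINES = 64   # direct-mapped, so the whole cache is 2 KB (assumed)

def count_misses(call_sequence, placement, size=512):
    """Count instruction-cache misses for a sequence of function calls.

    placement maps a function name to its start address. Every function is
    assumed to be `size` bytes long and to execute straight through.
    """
    tags = [None] * NUM_LINES              # tag currently held by each line
    misses = 0
    for name in call_sequence:
        start = placement[name]
        for addr in range(start, start + size, LINE_SIZE):
            block = addr // LINE_SIZE
            index = block % NUM_LINES      # which cache line this block maps to
            tag = block // NUM_LINES       # identifies the block within that line
            if tags[index] != tag:         # line holds something else: a miss
                misses += 1
                tags[index] = tag
    return misses

calls = ["A", "B"] * 10                    # A and B alternate, as in the text

# B sits exactly one cache size above A, so both map to the same lines.
conflicting = {"A": 0x0000, "B": 0x0800}
# Moving B by 512 bytes makes it occupy different lines from A.
relocated = {"A": 0x0000, "B": 0x0A00}

if __name__ == "__main__":
    print("conflicting placement:", count_misses(calls, conflicting), "misses")
    print("relocated placement:  ", count_misses(calls, relocated), "misses")
```

In the conflicting layout every call evicts the other function, so every line misses on every call; after relocation only the first call to each function misses. A raw miss count shows that something is wrong, while the per-line pattern shows which two functions are fighting.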
Another online tutorial, which explains details of the pipeline stall analyzer, is at www.go-dsp.com/mm-help/pipeline.htm. With the help of this analysis tool you can keep the pipeline as full as possible, thereby helping the CPU to execute as many instructions in as little time as possible. Finally, the code-coverage tool (demo at www.go-dsp.com/mm-help/codecoverage.htm) and multi-event function profiler (demo at www.go-dsp.com/mm-help/meprofiler.htm) do their work without instrumenting your code. Remember how you'd add printf() statements to help identify where the program-flow went? Not only is that a brute-force method that can be time-consuming, these same statements can change the code behavior, sometimes in unexpected ways. With this simulator upgrade and data collection, code analysis takes place without any need to instrument the code.
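As a rough analogy for collecting coverage without touching the code, the Python sketch below uses the interpreter's `sys.settrace` hook to record which lines of an unmodified function execute. It only illustrates the idea of observing from outside the program; it says nothing about how the CCStudio tools are built, and `classify` and `lines_executed` are hypothetical names.

```python
import sys

def classify(sample):
    if sample < 0:
        return "negative"
    if sample == 0:
        return "zero"
    return "positive"

def lines_executed(func, *args):
    """Run func(*args) and return the executed line offsets within func.

    Offsets are relative to the `def` line, so offset 1 is the first body line.
    The function under test is not modified in any way.
    """
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Record only 'line' events that belong to the function under test.
        if frame.f_code is code and event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)   # always restore whatever hook was installed
    return hit

if __name__ == "__main__":
    covered = lines_executed(classify, 5) | lines_executed(classify, -1)
    all_body_lines = {1, 2, 3, 4, 5}
    print("never executed:", sorted(all_body_lines - covered))
```

With the two test inputs shown, the `return "zero"` branch is reported as never executed, and no `printf()`-style statement had to be added to find that out.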
Enough with an examination of the tools. Let's return to the fast simulator and now see if we can identify where the benchmark numbers came from. As for the magic number “21” in the press release, it's a benchmark of the speed improvement with the Ver 2.2 simulator compared to the Ver 2.1 release. Both were running a C55x CPU architecture simulation on the core only without any peripherals. The test ran the equivalent of 20 GSM-EFR frames (Enhanced Full Rate speech codec) on a 1.7 GHz machine.
And the factor of 458×? According to Vidra, consider Analog Devices' VisualDSP++ Ver 3.0 running GSM-EFR frames. Her results show that such a setup can get through just 1 frame during a 5-minute run. If you turn off data collection (not an option with VisualDSP++, she adds) and run just the core simulation for a C64x ISA, the TI simulator can run through 458 frames in that same amount of time. Perhaps more realistic is a comparison of the ADSP-21535 vs. the C6416 in how much time each respective simulator needs to run the same number of GSM-EFR frames, in this case 20 frames. Both those simulations run at the device level, not the core level, and here the TI chip takes 58 seconds while the ADI part needs 4513 seconds, leading to the quoted speed improvement of 103× (roughly one minute vs. one and a quarter hours). Yes, there's a tad of specsmanship going on here, but that's probably why these figures were in the backgrounder but not the official press release.
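The arithmetic behind these ratios is easy to check. The sketch below uses only the numbers quoted above; `speedup` is a hypothetical helper name. Note that the two device-level times as quoted give a ratio of about 78, not 103, and the source does not say what accounts for the difference.

```python
def speedup(slow_seconds, fast_seconds):
    """Ratio of run times for the same workload on two simulators."""
    return slow_seconds / fast_seconds

# Core-level claim: frames completed in the same 5-minute window (458 vs. 1).
core_ratio = 458 / 1

# Device-level claim: seconds needed to run 20 GSM-EFR frames (ADI vs. TI).
device_ratio = speedup(4513, 58)

if __name__ == "__main__":
    print(f"core-level ratio:   {core_ratio:.0f}x")
    print(f"device-level ratio: {device_ratio:.1f}x")
    print(f"ADI run: {4513 / 3600:.2f} hours, TI run: {58 / 60:.2f} minutes")
```

The core-level figure compares frames completed in a fixed time, while the device-level figure compares time taken for a fixed number of frames; both are throughput ratios, but they come from different test configurations and should not be read as the same measurement.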
Even at these speeds, simulation isn't everything it could be. For example, hardware-in-the-loop simulation at this level isn't yet possible. The Ver 2.2 simulator isn't exchanging data with any emulation hardware. Nonetheless, TI is pushing simulation into new areas where at these speeds it starts to make sense, going beyond system design and prototyping into product development, system integration, and test.
Finally, note that the release states that the fast simulator is available for “selected” processors. In more detail, it's now shipping for some members of the C55x, C62x, C67x, and C64x families. You can look for some C2000 support towards the end of this year. As for the OMAP family, because of its dual DSP/ARM architecture, it will eventually pick up only some of the pieces of the DSP-centric tools announced here.
*The review was originally compiled at ChipCenter by Paul Schreier.