Floating-Point VLIW DSP/ARM7 Combo delivers 1 GFLOPS at 100 MHz - Embedded.com

Floating-Point DSP/ARM7 Combo Enables Low Cost Acoustic Diagnosis Systems, Professional Quality Home Audio and High Quality Speakerphone

San Francisco, CA, electronicaUSA, March 29, 2004 – Atmel today introduced its AT572D740 Diopsis, a dual-processor system-on-chip with an ARM7, peripherals and a complex-domain, 40-bit precision, VLIW floating-point DSP. The high-end processor executes 15 operations per cycle. Diopsis enables hands-free phones with speech quality comparable to face-to-face conversations, radar-based automobile collision avoidance, acoustic diagnosis of mechanical equipment, software-based ultrasound scanners, and professional quality audio for moderately priced home entertainment systems, among others.

Diopsis has achieved significant early market acceptance, having been designed into an ultrasound scanner and a high-end audio processing medical system.

Extended Precision Enables Professional Quality Audio – “Diopsis cost-effectively provides the floating-point, 40-bit extended precision required for the analysis and production of professional quality sound,” Paolucci noted. “High quality sound displays 132 dB (22-bit) transient impulses embedded in a 16-bit (96 dB) signal. A 32-bit mantissa accommodates guard bits to keep the processing noise low enough. An additional 8-bit exponent is needed for automatic dynamic management. Any data word smaller than 40 bits, or fixed-point arithmetic, substantially degrades the sound quality. Diopsis is the first processor to offer a complete RISC plus floating-point DSP platform at a price that makes professional quality affordable in moderately priced systems.”
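The decibel figures in the quote follow from the usual rule of thumb that each bit of linear resolution adds about 6.02 dB of dynamic range. The short Python sketch below reproduces that arithmetic; the function name is illustrative and not taken from any Atmel material.

```python
import math

def dynamic_range_db(bits):
    """Ideal dynamic range of a linear word of `bits` bits: 20*log10(2**bits)."""
    return 20.0 * math.log10(2.0 ** bits)

# Word widths mentioned in the article: 16-bit signal, 22-bit transients,
# 32-bit mantissa.
for bits in (16, 22, 32):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

On this reading, a 32-bit mantissa leaves roughly 10 bits (about 60 dB) beyond the 22-bit transients, which is presumably the guard-bit margin the quote refers to.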

Hands-free phones – “High-quality hands-free phones have similar processing requirements,” Paolucci explained. “The echoes and reverberation in most conference rooms result in sound quality that is barely acceptable, even using a $500 speakerphone. Diopsis' floating-point processing throughput and 40-bit precision support adaptive echo cancellation from up to eight microphones. The goal is to provide speakerphone sound quality that is as good as, or better than, a person in the same room. And with Diopsis the retail price of the speakerphone will be $150 to $200, instead of $500.”

Anti-Collision Radar – The floating-point arithmetic executed by Diopsis' mAgic DSP is also useful for cost sensitive radar applications, such as automotive collision avoidance systems. These types of adaptive beam-forming applications are impractical using a fixed-point DSP, because they rely on floating-point arithmetic and matrix inversion. Diopsis is expected to accelerate the adoption of these emerging technologies by providing a high performance solution at a moderate cost.

Heterogeneous Cores Optimize Performance – According to chip architect Dr. Pier Stanislao Paolucci, “The heterogeneous dual core structure of Diopsis allows optimization of task partitioning and mapping on the two different cores with respect to both performance and code density requirements. Diopsis exploits the 16-bit code density and efficiency of the ARM Thumb processor for control tasks, and the GFLOPS complex arithmetic performance of Atmel's mAgic VLIW DSP. mAgic has a program memory density of 4 bits per arithmetic operation on numerical kernels, thanks to our VLIW compression system. Either processor may act as the master or slave, or both processors may operate simultaneously and independently.” The 128 KB program memory holds 24,000 cycles of compressed program code. The DSP has 16K by 40-bit dual-port data memories and 256 pairs of 40-bit registers.

At 100 MHz, mAgic VLIW DSP throughput is 1.5 billion operations per second (GOPS), one billion of which are floating-point. The ARM7 executes at 50 MHz. Peripherals include two SPI serial ports, two USARTs, a timer counter, watchdog, parallel I/O port (PIO), peripheral data controller, 8 ADC and 8 DAC interfaces, clock generator and interrupt controller. Power dissipation is 750 mW per GFLOPS, typical, which is 20% less than competing 32-bit, stand-alone, floating-point DSPs.
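The headline throughput numbers can be checked with simple arithmetic. The sketch below assumes that the one billion floating-point operations per second correspond to the four multipliers, three adders and three subtractors described later in the article all being busy on every cycle; the article does not state this mapping explicitly, so treat it as an inference.

```python
CLOCK_HZ = 100e6               # DSP clock quoted in the article
OPS_PER_CYCLE = 15             # total operations per cycle quoted in the article
FP_OPS_PER_CYCLE = 4 + 3 + 3   # assumption: multipliers + adders + subtractors

gops = CLOCK_HZ * OPS_PER_CYCLE / 1e9       # total operations, in GOPS
gflops = CLOCK_HZ * FP_OPS_PER_CYCLE / 1e9  # floating-point share, in GFLOPS
typical_power_w = 0.750 * gflops            # 750 mW per GFLOPS, typical

print(f"{gops:.1f} GOPS, {gflops:.1f} GFLOPS, about {typical_power_w:.2f} W typical")
```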

Reduces Time-to-Market From Months to Days – DSP algorithms are written using floating-point arithmetic and must be translated to a less robust and less precise fixed-point format to be used with the fixed-point DSPs that dominate the market. This process takes months. The Diopsis floating-point DSP offers a direct “algorithm-to-code” capability that reduces code development time to a few weeks and enables efficient implementation of modern signal processing algorithms that make intensive use of complex-domain arithmetic. Such algorithms include signal analysis algorithms based on the short-time Fourier transform or complex wavelets, used in audio and speech processing, spectrum analysis/surveillance, and vibration analysis for structure diagnostics.

Pricing and Availability – The Diopsis AT572D740 is available now in a 352-ball PBGA package and is priced at $30 in quantities of 1000 (industrial temperature range), about the same price as stand-alone, floating-point DSPs.

Diopsis can also be used as a platform for custom ASIC development.

Atmel Corp. has added some new wrinkles in the DSP solutions arena by combining its floating-point VLIW digital signal processor core with an ARM7 microcontroller. Among them is the message that a floating-point DSP/MCU chip can be cost competitive with a standalone floating point DSP in such applications as hands-free phones, radar-based automobile collision avoidance and professional quality audio equipment. The combo chip goes for $30 in quantities of 1000 (industrial temperature range). Also, I see the announcement today as another big endorsement for the ARM core that's rapidly becoming an industry standard for microcontrollers.

Until now, most of the activity in floating-point DSP has centered around Texas Instruments' TMS320C6000 line and the Analog Devices SHARC family. TI and ADI have offered multiple-core DSPs. However, neither company has yet combined its floating-point DSP architecture with an on-chip microcontroller.

The performance specs for Atmel's new AT572D740 Diopsis processor are quite noteworthy. At 100 MHz, the DSP/MCU combo chip delivers 1 billion floating-point operations per second (1 GFLOPS) and 1.5 GOPS total. The chip consumes 0.8 W typical and 1.4 W worst case when running a 1K-point Fast Fourier Transform on 40-bit floating-point data stored in internal memory, with continuous ARM accesses to external flash. The 40-bit precision VLIW DSP processor executes 15 operations per cycle.

There are also lots of peripheral functions embedded on chip, which is another plus for the device. There are two SPI serial ports, two USARTs, a timer counter, watchdog, parallel I/O port (PIO), peripheral data controller, 8 ADC and 8 DAC interfaces, clock generator and interrupt controller.

Atmel's DSP/MCU design optimizes task partitioning and mapping on the two different cores with respect to both performance and code density requirements. The unit exploits the 16-bit code density and efficiency of the ARM Thumb processor for control tasks and the GFLOPS complex arithmetic performance of Atmel's mAgic DSP. Diopsis suits algorithms that make intensive use of complex-domain arithmetic, such as signal analysis algorithms based on the short-time Fourier transform or complex wavelets, used in audio and speech processing, spectrum analysis/surveillance, and vibration analysis for structure diagnostics.

There are other features that system designers should like, such as fast code development. Diopsis offers a direct “algorithm-to-code” capability that reduces code development time from months to a few days. Floating-point algorithms can be directly programmed into the device, without the fixed-point translation required by traditional fixed-point solutions. Floating-point code facilitates algorithm reuse and reduces the calibration required to tune new product generations.

VLIW code development is facilitated by a scheduling algorithm that automatically analyzes the logical and temporal data dependencies and then schedules operations in a way that optimizes both resource usage and pipeline depth to achieve maximum execution throughput. This process is said to be entirely seamless and requires no intervention on the part of the programmer.
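To give a feel for what such a scheduler does, here is a minimal sketch of generic greedy list scheduling in Python. It is not Atmel's algorithm, which the article does not describe. The `Op` class, the unit names and the one-cycle latency are all simplifying assumptions, and the model only assigns issue slots; it does not model pipeline depth or overlap between loop iterations.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    unit: str                                  # functional-unit class, e.g. "mul"
    deps: list = field(default_factory=list)   # names of ops whose results are read

def list_schedule(ops, capacity):
    """Pack ops into cycles, respecting dependencies and per-cycle unit counts.

    Assumes every op has one-cycle latency (a hypothetical simplification).
    """
    issued_at = {}     # op name -> cycle in which it was issued
    pending = list(ops)
    schedule = []
    while pending:
        cycle = len(schedule)
        used = {}
        issued = []
        for op in pending:
            # An op is ready only if all its inputs were issued in an earlier cycle.
            ready = all(issued_at.get(d, cycle) < cycle for d in op.deps)
            if ready and used.get(op.unit, 0) < capacity.get(op.unit, 0):
                used[op.unit] = used.get(op.unit, 0) + 1
                issued.append(op)
        if not issued:
            raise ValueError("unsatisfiable dependency or missing functional unit")
        for op in issued:
            issued_at[op.name] = cycle
        pending = [op for op in pending if op.name not in issued_at]
        schedule.append([op.name for op in issued])
    return schedule

# Complex multiply (a + jb)(c + jd): four products, one subtract, one add.
complex_mul = [
    Op("m1", "mul"), Op("m2", "mul"), Op("m3", "mul"), Op("m4", "mul"),
    Op("re", "sub", ["m1", "m2"]),
    Op("im", "add", ["m3", "m4"]),
]
units = {"mul": 4, "add": 3, "sub": 3}   # operator counts quoted in the article
print(list_schedule(complex_mul, units))
```

With four multipliers available, the four products share one issue slot and the subtract and add follow in the next; halving the multiplier count stretches the same code over three slots, which is the kind of trade-off a VLIW scheduler resolves automatically.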

The mAgic VLIW DSP core on Diopsis, with its 128-bit instruction word, produces real and imaginary arithmetic results simultaneously, allowing the single-cycle execution of FFT butterflies, complex MULACC, and real-domain dual MACs. The DSP has a highly parallel architecture with four multipliers, three adders and three subtractors. During complex arithmetic operations, half the operators perform real operations and half perform imaginary operations simultaneously. Two 256-location register files, each with four inputs and four outputs, can be used to store 40-bit real and imaginary numbers separately, thereby enabling single-cycle complex arithmetic on extended-precision floating-point data. Data from either register file may be input simultaneously to both sides of the operator block, as may the intermediate results of operations within each side of the operator block.
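The operator count lines up neatly with a radix-2 FFT butterfly. Written out on separate real and imaginary parts, one butterfly needs exactly four multiplies, three additions and three subtractions. The Python sketch below shows that decomposition; whether mAgic maps the butterfly onto its operators in exactly this way is an inference from the operator counts, not something the article states.

```python
def butterfly(ar, ai, br, bi, wr, wi):
    """Radix-2 butterfly: returns (a + w*b, a - w*b) as (real, imag) pairs.

    Operation count: 4 multiplies, 3 additions, 3 subtractions.
    """
    # Complex product t = w * b: 4 multiplies, 1 subtraction, 1 addition.
    tr = wr * br - wi * bi
    ti = wr * bi + wi * br
    # Sum output uses 2 additions; difference output uses 2 subtractions.
    return (ar + tr, ai + ti), (ar - tr, ai - ti)

# Cross-check against Python's built-in complex type.
a, b, w = 1 + 2j, 3 - 1j, -1j
top, bottom = butterfly(a.real, a.imag, b.real, b.imag, w.real, w.imag)
print(top, bottom)
print(a + w * b, a - w * b)
```

Keeping real and imaginary parts in separate register files, as the article describes, lets both halves of each output be computed side by side.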

This capability reduces the number of register file fetches and execution cycles by a factor of two during complex multiplications. Two sets of three 2K by 40-bit pages (12K locations total) of internal dual-port memory allow four simultaneous accesses (two reads and two writes). A multiple address generation unit (MAGU) with 16 address registers supports programmable stride on linear, circular and bit-reversed addressing.
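The three addressing modes can be modeled in a few lines of software. The Python functions below are a behavioral sketch only: their names and parameters are hypothetical, and the article gives no detail on how the MAGU registers are actually programmed.

```python
def linear(base, stride, n):
    """n addresses starting at base, stepping by stride."""
    return [base + i * stride for i in range(n)]

def circular(base, length, start, stride, n):
    """n addresses that wrap inside the buffer [base, base + length)."""
    return [base + (start + i * stride) % length for i in range(n)]

def bit_reversed(base, log2_size):
    """All 2**log2_size offsets in bit-reversed order, as used to reorder FFT data."""
    addresses = []
    for i in range(1 << log2_size):
        bits = format(i, f"0{log2_size}b")       # index as a fixed-width binary string
        addresses.append(base + int(bits[::-1], 2))
    return addresses

print(linear(0, 2, 4))
print(circular(100, 8, 6, 3, 4))
print(bit_reversed(0, 3))
```

Circular addressing suits FIR delay lines, and bit-reversed addressing matches the data ordering of an in-place radix-2 FFT; doing both in a dedicated address unit keeps these index calculations out of the arithmetic datapath.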

The DSP uses an 8K by 128-bit single-port program memory. The DSP assembler automatically compresses program code by a mean factor of two to three, resulting in an average effective instruction density of 50 bits per stored cycle without loss of performance. Numerically intensive operations such as FFTs and FIRs can achieve a code density of 4 bits per executed floating-point operation without loss of performance.
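These density figures are consistent with one another, as a quick calculation shows. The sketch below assumes a compression factor of 2.5 for the average case and 3 for the best case, with ten floating-point operations per cycle on numerical kernels; those specific values are chosen here to illustrate the arithmetic and are not stated in this form by Atmel.

```python
WORD_BITS = 128     # instruction word width
WORDS = 8 * 1024    # program memory depth

program_bytes = WORDS * WORD_BITS // 8   # total program memory in bytes

def bits_per_stored_cycle(factor):
    """Effective program bits per executed cycle at a given compression factor."""
    return WORD_BITS / factor

def cycles_held(factor):
    """Number of executable cycles that fit in program memory."""
    return int(WORDS * factor)

def bits_per_flop(factor, flops_per_cycle=10):
    """Program bits per floating-point operation (flops_per_cycle is assumed)."""
    return WORD_BITS / (factor * flops_per_cycle)

print(program_bytes // 1024, "KB of program memory")
print(f"{bits_per_stored_cycle(2.5):.1f} bits per stored cycle at 2.5x")
print(cycles_held(3.0), "cycles at 3x")
print(f"{bits_per_flop(3.0):.2f} bits per floating-point operation at 3x")
```

Under these assumptions the 8K by 128-bit array is the 128 KB program memory mentioned earlier, and a factor of three gives about 24,000 stored cycles and a little over 4 bits per floating-point operation.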

The control registers and memories of the DSP are mapped directly into the ARM memory space, allowing the ARM to read or write the DSP local data memories and configuration registers. There are two modes of operation: run mode and system mode. In system mode, the VLIW processor halts and all the internal resources of the DSP are mapped into the memory space of the ARM. The ARM controls the DSP's DMA channel and can read and write the local data memories and configuration registers of the DSP. The ARM can modify the content of the DSP program memory either by initiating a DMA transfer from the external memory or by directly writing four 32-bit words to four consecutive addresses at the appropriate program memory location. This complete visibility through the ARM into the DSP resources allows code for both processors to be debugged using the ARM debugging tools.
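The direct-write path can be pictured as splitting each 128-bit instruction word into four 32-bit pieces. The Python sketch below models this with a dictionary standing in for the ARM address space. The word order (least significant word first), the byte-addressed 4-byte spacing, the base address and the function names are all assumptions for illustration; the article does not specify any of them.

```python
def split_instruction(word128):
    """Split a 128-bit value into four 32-bit words, least significant first."""
    if not 0 <= word128 < (1 << 128):
        raise ValueError("instruction word must fit in 128 bits")
    return [(word128 >> (32 * i)) & 0xFFFFFFFF for i in range(4)]

def write_instruction(memory, base_addr, word128):
    """Store the four words at four consecutive (assumed byte-addressed) locations."""
    for i, word in enumerate(split_instruction(word128)):
        memory[base_addr + 4 * i] = word

# Hypothetical program-memory address and an arbitrary bit pattern.
arm_space = {}
write_instruction(arm_space, 0x1000, 0x00112233_44556677_8899AABB_CCDDEEFF)
for addr in sorted(arm_space):
    print(hex(addr), hex(arm_space[addr]))
```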

In run mode, the ARM has access only to the mAgic VLIW DSP's command register and a 1K by 40-bit dual-port shared memory. Both processors operate under their own programs and either processor may operate as the master. Since mAgic has a private external bus for optional external memory, the two processors may operate completely independently and simultaneously.

A dual-port shared memory of 1K extended-precision locations is used for high-bandwidth interprocessor communications between the ARM and the DSP. There are nine interrupts from the DSP to the ARM and three interrupts from the ARM to the DSP. The DSP can drive 7 of Diopsis' 28 PIO lines and receive interrupts from 5 PIO lines. PIO lines are shared by both processors and are fully software configurable by the ARM.

The tight interface between the two cores supports a variety of programming models. Diopsis may be programmed entirely from the ARM programming interface, using calls to the DSP library to execute DSP functions. Composed of 75 C-callable DSP functions, the library includes a variety of FFTs, IIRs, FIRs operating on a single sequence of samples or on a continuous input data stream, vector square roots, vector magnitudes, and vector arithmetic operations, among others.

The RISC and the DSP may also be programmed separately. A unified programming environment supports both programming models, and provides a cycle accurate simulator for the whole Diopsis SoC.

The AT572D740 is available now in a 352-ball PBGA package and is priced at $30 in quantities of 1000 (industrial temperature range). Diopsis can also be used as a platform for custom ASIC development.

