
Optimize data flow video apps by tightly coupling ARM-based CPUs to FPGA fabrics

In this Product How-To design article, iVeia’s Michael Fawcett describes how an ARM-based TI OMAP CPU can be combined with a Xilinx FPGA to build a system that handles rich data-flow video processing applications by exploiting the inherent parallel structure of the FPGA fabric.

Design teams have long used FPGAs in tandem with standard microprocessors both as a way to add peripheral functions and as a processing resource capable of operating on real-time data streams such as video. To maximize performance in such applications, designs must tightly couple the FPGA and microprocessor, instead of treating each as independent entities.

Today, off-the-shelf platforms tightly integrate the processor/FPGA combination. Development tools allow an embedded design team to optimally partition their design, making tradeoffs between software and hardware implementations.

In the product design group at iVeia, we have been building systems that closely link processors and FPGAs to create full-featured, advanced technology products for the video, communications, and handheld applications spaces. We are now working on a next-generation iVeia system that we think will be even more formidable, using the new Xilinx Zynq-7000 Extensible Processing Platform, which marries dual ARM processors with the latest 28-nm programmable logic on the same device (Figure 1, below).

Figure 1: Unlike previous chips that combine MPUs with an FPGA fabric, Xilinx’s new Zynq-7000 EPP family lets the ARM processor, rather than the programmable logic, run the show.
Typical integration of FPGAs and CPUs
In a typical system, the microprocessor handles command and control and parts of an application such as audio codecs. The FPGA, meanwhile, can perform real-time tasks such as video codecs and image processing or performance-intensive communications algorithms in SDR (software defined radio) applications.

Of course, design teams have had access to processors on FPGA ICs for some time. Xilinx and other FPGA vendors have offered design teams the ability to realize soft processor cores using the FPGA fabric. Moreover, some FPGAs have combined hardened processor cores, as in the integration of PowerPC cores into the Xilinx Virtex family of FPGAs.

But both the soft processor approach and the aforementioned PowerPC-enabled FPGAs are FPGAs first and foremost. The FPGA fabric must be configured before a processor can boot, and the design process is centered on the FPGA development tool flow.

Processor-centric designs
Some design teams are more comfortable working with a model in which the microprocessor is the heart of the system where the FPGA is utilized as a coprocessor – albeit a very powerful one. The processor-centric model allows design teams to immediately begin software development concurrently with the design of the FPGA subsystem that will handle the portions of the application that require parallel processing and manipulation of real-time data streams.
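To make the processor-centric model concrete, here is a minimal sketch in C of how application software on the ARM side might treat the FPGA as a coprocessor behind an ordinary device node. The device path, ioctl code, and job structure are hypothetical placeholders for illustration, not an actual iVeia or TI driver interface.

```c
/*
 * Minimal sketch of the processor-centric model: the ARM application
 * treats the FPGA as a coprocessor behind an ordinary device node.
 * The device path and ioctl code are hypothetical, for illustration only.
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

#define FPGA_DEV         "/dev/fpga_copro"             /* hypothetical node */
#define FPGA_IOC_PROCESS _IOWR('F', 1, struct fpga_job)

struct fpga_job {
    const void *in;    /* input frame in DRAM             */
    void       *out;   /* buffer for the processed result */
    uint32_t    len;   /* frame size in bytes             */
};

static int fpga_process_frame(int fd, const void *in, void *out, uint32_t len)
{
    struct fpga_job job = { .in = in, .out = out, .len = len };
    /* Block until the fabric signals completion (driver-dependent). */
    return ioctl(fd, FPGA_IOC_PROCESS, &job);
}

int main(void)
{
    static uint8_t frame[640 * 480], result[640 * 480];
    int fd = open(FPGA_DEV, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    if (fpga_process_frame(fd, frame, result, sizeof frame) != 0)
        perror("ioctl");

    close(fd);
    return 0;
}
```

The point is that software work like this can begin on day one against a stub driver, while the hardware team fills in the fabric behind the same interface.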

Let’s examine the options available to a design team at a company like ours for quickly developing and deploying systems on a tightly integrated processor and FPGA that boots like a typical microprocessor-based system. iVeia has a family of mezzanine-style modular products that combine processors and FPGAs.

We also offer a series of application-specific development kits along with a general-purpose kit. An SDR Development Kit targets wireless applications such as unattended sensor networks, where the FPGA handles intermediate-frequency and baseband processing. The company also offers a Video Development Kit and a Handheld Development Kit.

Design teams can quickly access tremendous processing power utilizing such a modular approach. Our Atlas-I-LPe is based on a Texas Instruments OMAP processor (Figure 2, below). The OMAP IC combines both an ARM microprocessor core and a DSP core. Moreover, the module includes a Xilinx Spartan-6 FPGA.

Figure 2: The iVeia Atlas-I-LPe design utilizes dedicated interfaces between the FPGA and OMAP processor, allowing the FPGA to implement pre- and post-processing functions on a live video stream.
The standard product integrates a XC6SLX16 FPGA with 14,579 logic cells and 136 DSP slices. Optionally, iVeia offers a version with the XC6SLX45 that integrates 43,661 logic cells and 401 DSP slices.

The Atlas-I-LPe affords a number of advantages to embedded design teams. The ARM architecture is arguably the most broadly-favored choice in the embedded industry today. Most design teams are familiar with the ARM architecture and instruction set.

Moreover, most design teams are familiar with the development tools and the entire ecosystem that includes code libraries and other IP that can accelerate the design cycle. In the case of the Atlas-I-LPe, the OMAP IC integrates an ARM Cortex-A8 core that operates up to 1 GHz.

The ARM implementation includes the optional Neon SIMD (single instruction, multiple data) unit, which is optimized for multimedia processing, as well as a double-precision floating-point unit.

The OMAP IC also includes a broadly-utilized DSP core with which many design teams will be familiar. The TMS320C64x+ core operates at speeds as fast as 800 MHz. Moreover there is an extensive support ecosystem behind the DSP including development tools and algorithm libraries.

Image processing and data flow
The iVeia design maximizes the performance potential of the processor and FPGA combination via multiple connections between the devices. Let’s examine how the two processing elements are connected relative to how the product might be deployed in a specific application.

The combination of the FPGA fabric and DSP core is a good match for image processing. Typical applications include transportation systems where a camera is used to capture video and the system analyzes the input on a frame-by-frame basis, automatically recognizing and classifying traffic. The combination could also be used in security systems to enable facial recognition.

Such image-processing applications require significant processing power and a very efficient data flow. The iVeia design utilizes multiple buses to optimize data flow in such an application. The OMAP IC includes both a dedicated camera input interface and a dedicated display output interface designed to drive two screens. The IC also includes a general purpose bus. iVeia uses all three to link the processor and FPGA.

The iVeia design implements a 12-bit, 75-MHz interconnect that can feed a stream of video frames directly from the FPGA to the OMAP camera interface. A key role for the FPGA in a vision system is preprocessing each video frame. The FPGA can operate on the real-time stream of frames performing functions such as color-space conversion and noise filtering. The FPGA can correct for camera lens distortion and enhance contrast.
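To give a feel for the per-pixel work involved, the following is a behavioral C reference for one such preprocessing step, an integer color-space conversion from RGB to luma. In the actual design this class of arithmetic would be implemented in the FPGA fabric and applied to every pixel of the live stream; the C version only documents the math.

```c
/*
 * Behavioral C reference for one preprocessing step (RGB -> luma).
 * In the real system this per-pixel math runs in FPGA logic, one or
 * more pixels per clock, rather than on the ARM or DSP cores.
 */
#include <stdint.h>
#include <stddef.h>

/* ITU-R BT.601 luma, fixed point: Y = 0.299R + 0.587G + 0.114B */
static inline uint8_t rgb_to_luma(uint8_t r, uint8_t g, uint8_t b)
{
    /* Coefficients scaled by 256; the +128 term rounds the result. */
    return (uint8_t)((77 * r + 150 * g + 29 * b + 128) >> 8);
}

void frame_to_luma(const uint8_t *rgb, uint8_t *y, size_t npixels)
{
    for (size_t i = 0; i < npixels; i++)
        y[i] = rgb_to_luma(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
}
```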

Video analytics
The FPGA can also perform the early stages of video analytics. For example, the FPGA can be used for object and pattern recognition, edge detection and other image enhancement capabilities. The sequentially-oriented DSP and ARM cores can’t perform such functions in real time. The FPGA passes the processed frame to the OMAP IC for storage in a frame buffer using the dedicated camera interface.
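Edge detection illustrates why the fabric wins here: every output pixel depends on a small neighborhood of input pixels, which maps naturally onto FPGA line buffers and parallel multiply-adds. The sketch below is a behavioral C reference for a Sobel gradient stage of the kind such an analytics pipeline would implement.

```c
/*
 * Behavioral reference for a Sobel edge-detection stage. An FPGA
 * implementation would stream rows through line buffers and compute
 * all of the 3x3 window's multiply-adds in parallel, every clock.
 */
#include <stdint.h>
#include <stdlib.h>

void sobel(const uint8_t *in, uint8_t *out, int w, int h)
{
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            /* Horizontal and vertical gradients over the 3x3 window. */
            int gx = -in[(y-1)*w + (x-1)] + in[(y-1)*w + (x+1)]
                     - 2*in[y*w + (x-1)] + 2*in[y*w + (x+1)]
                     - in[(y+1)*w + (x-1)] + in[(y+1)*w + (x+1)];
            int gy = -in[(y-1)*w + (x-1)] - 2*in[(y-1)*w + x] - in[(y-1)*w + (x+1)]
                     + in[(y+1)*w + (x-1)] + 2*in[(y+1)*w + x] + in[(y+1)*w + (x+1)];
            int mag = abs(gx) + abs(gy);   /* cheap |G| approximation */
            out[y*w + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```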

The iVeia design also includes a 16-bit, 96-MHz, bidirectional, general-purpose bus that links the FPGA and OMAP ICs. The bidirectional interconnect provides an additional path for the FPGA to transfer data to the OMAP IC, and a path for the OMAP to manage the operation and configuration of the FPGA.
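From the software side, a bus like this usually appears as a memory-mapped window. The sketch below shows the common Linux pattern, mapping the window through /dev/mem and writing FPGA control registers; the base address and register offsets are invented placeholders, not the actual Atlas-I-LPe register map.

```c
/*
 * Common Linux pattern for writing control registers that the FPGA
 * exposes over a memory-mapped general-purpose bus. The base address
 * and register offsets are placeholders, not the real Atlas map.
 */
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define FPGA_BASE   0x04000000u   /* hypothetical bus window       */
#define MAP_SIZE    0x1000u
#define REG_CONTROL 0x00u         /* hypothetical register offsets */
#define REG_THRESH  0x04u

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    volatile uint32_t *regs = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, FPGA_BASE);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    regs[REG_THRESH / 4]  = 128;  /* e.g. a noise-filter threshold */
    regs[REG_CONTROL / 4] = 1;    /* enable the processing block   */

    munmap((void *)regs, MAP_SIZE);
    close(fd);
    return 0;
}
```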

In our vision system example, the FPGA would send object classification data to the OMAP IC in sync with the transfer of the preprocessed video frames.

The OMAP IC meanwhile can use the general-purpose bus to dynamically reconfigure the image-processing blocks in the FPGA. Many people think about an FPGA as a static element that is configured at power up and that performs the same functions continuously. In actuality, the fabric can be configured a hundred thousand times or more per second.

In a typical scenario, a primary set of command and control functions are static in the FPGA. But a technique called dynamic partial reconfiguration allows for changes in portions of the fabric. For example, the OMAP may change the configuration of the image-processing blocks in the FPGA based on the type of images being captured.
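From the processor’s point of view, a partial-reconfiguration update is essentially a matter of streaming a partial bitstream into the FPGA’s configuration port. The sketch below shows that flow in C; the /dev/fpga_config device node and the bitstream file name are hypothetical placeholders for whatever interface the platform’s configuration driver actually exposes.

```c
/*
 * Sketch of dynamic partial reconfiguration from the processor side:
 * stream a partial bitstream into the FPGA's configuration interface.
 * "/dev/fpga_config" and the .bit file name are hypothetical
 * placeholders, not a real driver path or an image shipped with the kit.
 */
#include <stdio.h>

static int load_partial_bitstream(const char *path)
{
    FILE *bit = fopen(path, "rb");
    if (!bit) { perror(path); return -1; }

    FILE *cfg = fopen("/dev/fpga_config", "wb");   /* hypothetical node */
    if (!cfg) { perror("/dev/fpga_config"); fclose(bit); return -1; }

    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, bit)) > 0)
        fwrite(buf, 1, n, cfg);                    /* push config frames */

    fclose(cfg);
    fclose(bit);
    return 0;
}

int main(void)
{
    /* e.g. swap the fabric from a daytime to a low-light filter chain */
    return load_partial_bitstream("filter_lowlight_partial.bit") ? 1 : 0;
}
```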

The iVeia architecture also supports video post processing functions in the FPGA. The design includes a 24-bit, 75-MHz interface that brings real-time data from the OMAP display output into the FPGA. The FPGA can handle functions such as scaling in real time.
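Again, the arithmetic is simple; the value of the fabric is doing it at pixel rate as the display stream passes through. The following behavioral C reference shows nearest-neighbor scaling, one of the post-processing operations such a path could implement (a production design would more likely use a filtered scaler, but the data-flow structure is the same).

```c
/*
 * Behavioral reference for nearest-neighbor scaling of a luma frame.
 * The fabric version would produce one output pixel per clock as the
 * frame arrives over the 24-bit display interface.
 */
#include <stdint.h>

void scale_nearest(const uint8_t *src, int sw, int sh,
                   uint8_t *dst, int dw, int dh)
{
    for (int y = 0; y < dh; y++) {
        int sy = y * sh / dh;        /* map output row to a source row       */
        for (int x = 0; x < dw; x++) {
            int sx = x * sw / dw;    /* map output column to a source column */
            dst[y * dw + x] = src[sy * sw + sx];
        }
    }
}
```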

Design teams focused on a video-centric application such as our example can accelerate the product development using iVeia’s Video Development Kit. The kit includes an additional hardware module that provides I/O and a video encoder. More importantly, iVeia supplies a library of IP blocks for the FPGA that are optimized for video applications.

ARM benefits
There are substantial benefits to the ARM processor and FPGA combination relative to other processor options. As mentioned earlier, the ARM architecture is very popular, and that popularity delivers what are at first glance surprising benefits beyond the fact that embedded design teams are familiar with the architecture and instruction set. Embedded applications can leverage a wealth of open source software as well as software developed for handsets such as the Android operating system and user interface.

It turns out that users of a specialized embedded system – such as a video-centric transportation system – greatly prefer a known user interface. Users are increasingly familiar with the touch-based handset interfaces and fingertip sweeps used for navigation. By leveraging Android, an embedded design team provides a popular user interface without having to develop custom user-interface software.

The iVeia ARM-based module supports Android in addition to applications developed for Linux-based operating systems. For example, a transportation application may include a GPS module. An ARM-based design running Android can provide the same GPS experience found on smartphones.

The ARM core has also been optimized for low power consumption through multiple generations of usage in handsets. The combination of an ARM processor and FPGA can offer lower power that translates to longer battery life in portable products.

Embedded design teams will have access to an even more powerful ARM processor and programmable logic implementation when Xilinx delivers the first Zynq-7000 devices next fall. The Zynq-7000 EPP family will integrate a dual-core ARM Cortex-A9 based processor system with the most advanced programmable logic in a single device. The 28-nm devices will use Xilinx’s 7-series FPGA architecture, which is also the basis of the new Artix-7 and Kintex-7 products.

Unlike prior FPGA ICs that integrate hardened processor cores, the Zynq-7000 family of devices provides a comprehensive, state-of-the-art microprocessor system that boots just like any other microprocessor IC. The devices also include on-chip programmable logic that, under control of the processing system, is configured after the processor boot-up sequence has completed.

Next-generation designs
A Zynq-7000-based embedded system will afford design teams another huge gain in system performance and a reduction in power consumption. Much of the performance advantage will come from even tighter integration between the processor core and the programmable logic.

The IC design will rely on the latest AMBA 4 (Advanced Microcontroller Bus Architecture) AXI (Advanced Extensible Interface) interconnect technology to interface the processor to functional blocks within the FPGA fabric.

AXI will offer buses three times wider and four times faster than the interconnect used in the OMAP-based Atlas-I-LPe module, with a projected “orders of magnitude” lower power consumption for Zynq-7000-based designs compared with previous-generation Virtex PowerPC designs.
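As a rough, back-of-the-envelope illustration of what those multipliers mean (our own arithmetic, not a Xilinx figure), take the Atlas-I-LPe’s 16-bit, 96-MHz general-purpose bus as the baseline: a link three times wider and four times faster moves on the order of twelve times the data.

```c
/*
 * Back-of-the-envelope interconnect comparison. Illustrative only:
 * the 3x width and 4x speed factors from the text are applied to the
 * Atlas-I-LPe's 16-bit, 96-MHz general-purpose bus as a baseline.
 */
#include <stdio.h>

int main(void)
{
    double base_MBps = (16.0 / 8.0) * 96.0;    /* 16 bits at 96 MHz ~ 192 MB/s */
    double axi_MBps  = base_MBps * 3.0 * 4.0;  /* 3x wider, 4x faster          */
    printf("baseline ~%.0f MB/s, AXI-class link ~%.0f MB/s (~%.0fx)\n",
           base_MBps, axi_MBps, axi_MBps / base_MBps);
    return 0;
}
```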

iVeia has already demonstrated Android running on Xilinx’s Zynq-7000 Emulation Platform. While the effective clock rate of the processor emulation is low, the programmable logic operates at the specified frequency. iVeia has been able to demonstrate an Android application with live video via a camera link interface. The Android application controls parameters such as image filter settings and other coefficient values used for object recognition and pattern matching in imaging systems.

iVeia plans to have a module based on its Atlas form factor ready for shipment when Xilinx delivers the first Zynq-7000 EPPs. Indeed the company has already announced the Atlas-I-Z7e module.

The Atlas form factor measures 1.249×3.37 in. The Atlas-I-Z7e will utilize a Zynq-7020 IC based on the Artix-7 FPGA fabric, which is optimized for low cost and low power. The processing system’s dual ARM Cortex-A9 MPCore processors will operate at up to 800 MHz.

Neon SIMD engines and single/double-precision floating-point units for each core are included as part of the processing system. The design also will include 32 Kbytes of level 1 cache and 256 Kbytes of level 2 cache, along with 256 Kbytes of on-chip memory accessible by the programmable logic via high-speed interconnect switches.

The Zynq-7020 device’s programmable logic includes 85,000 logic cells (roughly the equivalent of 1.3 million ASIC gates), 220 programmable DSP slices, and 560 Kbytes of block RAM, offering significantly more processing resources than the Spartan-6-based predecessor. The DSP resources more than make up for the lack of a DSP core when it comes to processing real-time data streams. Indeed, independent DSP experts at BDTI have documented that FPGAs can offer 40x better performance than DSP-centric processors in real-time data-flow applications.

Embedded design teams can start development projects for the Atlas-I-Z7e now using the OMAP-based Atlas-I-LPe. The microprocessor cores in each are software compatible. Moreover, iVeia supplies what it calls the Velocity-EHF (Embedded Hybrid Framework) application portability layer for hybrid processor plus FPGA systems. The company claims that designs have been ported in a single day from modules based on a PowerPC-enabled Virtex FPGA to the OMAP-based Atlas-I-LPe. The port to the Zynq-7000-based module should be equally simple.
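The benefit of a portability layer like this is easiest to see in code. The sketch below is not the Velocity-EHF API, whose details are not given here; it is a hypothetical illustration of the pattern it enables: application code binds to an abstract hybrid-processing interface, and only a thin platform backend changes when the module underneath moves from Virtex/PowerPC to OMAP/Spartan-6 to Zynq-7000.

```c
/*
 * Hypothetical portability-layer pattern (NOT the actual Velocity-EHF
 * API): the application binds to an abstract interface, and each
 * module (Virtex/PowerPC, OMAP/Spartan-6, Zynq-7000) supplies its own
 * backend, so a port largely means swapping one table of functions.
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct hybrid_ops {
    int (*load_fabric)(const char *image);                  /* configure the logic */
    int (*process)(const void *in, void *out, size_t len);  /* run one frame       */
};

/* One backend per platform; only code like this changes across modules. */
static int zynq_load(const char *image)
{
    printf("loading %s via the platform-specific path\n", image);
    return 0;
}

static int zynq_process(const void *in, void *out, size_t len)
{
    (void)in; (void)out; (void)len;   /* hand the frame to the fabric here */
    return 0;
}

static const struct hybrid_ops zynq_ops = { zynq_load, zynq_process };

int main(void)
{
    const struct hybrid_ops *hw = &zynq_ops;   /* selected at build or run time */
    uint8_t in[64] = { 0 }, out[64];

    hw->load_fabric("video_pipeline.bit");
    hw->process(in, out, sizeof in);
    return 0;
}
```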

At the recent Embedded World conference in Nuremberg, Germany, iVeia showcased the Atlas-I-LPe production board and demonstrated full system interoperability using an emulation platform to prototype the Atlas-I-Z7e.

The demonstration system was running Android 2.2, managing the image enhancement of a live video stream using custom AXI IP implemented in the programmable logic on the emulation platform. You can watch a video of the Android application with live video manipulation on the EE Times web site.

Michael Fawcett is the Chief Technology Officer at iVeia (Annapolis, MD). Dan Isaacs is the Director of Processing Platform Marketing at Xilinx (San Jose, CA).
