CMP EMBEDDED.COM


Accelerate system performance with hybrid multiprocessing and FPGAs



Embedded Systems Design

Multiprocessing is becoming a key differentiator for FPGA-based processor architectures.

Of the design benefits that FPGAs provide embedded systems designers, one key advantage is the ability to adapt and quickly respond to changing system requirements. FPGAs have evolved from the simple interface logic devices of yesterday into highly sophisticated processing devices that are capable of integrating and accelerating entire embedded systems. Modern FPGA-based systems often include multiple soft and hard processors running industry-standard real-time operating systems (RTOSs), along with processor peripherals and custom hardware accelerators for performance-critical algorithms. As a direct result of these capabilities, FPGAs are now being used to develop highly flexible, hybrid multiprocessing applications and systems.

Embedded systems designers face a wide range of processing-related design challenges. Real-time and performance-critical systems demand higher performance but also require lower power consumption. Critical embedded applications may require dedicated computing hardware or additional processors to meet performance and power constraints.

To address the performance barrier, a standard approach in the past has been to raise the operating frequency of the processor. Increasing clock speeds increases power consumption, however, so embedded systems designers have turned to other approaches to improve the performance/power ratio. These approaches include adding processors or using specialized coprocessors, including FPGAs.

Adding devices to a system can be costly, especially when considering the requirements for increased system reliability and sustainable power budgets as well as physical size, thermal, and packaging constraints. Adding more devices to resolve performance issues forces other tradeoffs and adds yet another component to an already lengthy bill of materials. Modern FPGAs, with their ability to integrate multiple processors and coprocessors in a single device, provide one solution to this problem.

In a modern FPGA-based application, one processor may be used to run an operating system. Further integration may be achieved by adding coprocessors for noncritical algorithms. These processors can be integrated with dedicated hardware accelerators, all in the same programmable FPGA device.

The result is a hybrid multiprocessing application with a reduced component count.

Leveraging parallelism
Solving complex computational problems through integration and parallelism is not new. It's long been recognized that many of the computing challenges in embedded and high-performance systems can be addressed using parallel-processing techniques. The use of dual- or quad-core processors, multiple processing boards, or even clustered PCs has become commonplace in many applications. In embedded applications, traditional processors can be paired with DSPs, which are often paired with custom or off-the-shelf hardware accelerators.

In recent years, the trend has been to combine multiple processing elements on one device. One example of this multicore approach is the Cell Broadband Engine Architecture, jointly designed by Sony, Toshiba, and IBM.

The Cell architecture increases the performance of graphics and video applications by introducing system-level parallelism. It also supports flexible, programmable acceleration that is highly optimized and allows high clock frequencies while minimizing power. The keys to the Cell architecture's high performance are the Synergistic Processing Elements (SPEs), which provide coherent offload, abundant local memory, and asynchronous coherent DMA engines. End applications, such as multimedia and vector processing, benefit from the combination of the general-purpose processor core and streamlined coprocessing elements. (Editor's note: see "Programming the Cell Broadband Engine," Alex Chunghan Chow, June 2006, for more info on SPEs.)

Figure 1 shows Nvidia's Compute Unified Device Architecture (CUDA), another type of parallel processing engine. It's based on standard graphics processing units (GPUs), whose stream processors (highlighted in light green in the figure) have been combined to form a general-purpose, stream-oriented parallel processing engine. CUDA provides access to the native instruction set and memory of the parallel computation elements in the GPUs. Like the Cell processor, the CUDA architecture promises higher performance than standard processors, while simplifying software development by using the standard C language for data-intensive problems.
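The data-intensive problems that suit a stream-oriented engine are typically ones in which the same operation is applied independently to every element of a large array. The following is a minimal sketch of that style in plain C. It is not CUDA code and uses no Nvidia API; the function names `saxpy_element` and `saxpy` are illustrative assumptions, not taken from the article.

```c
#include <assert.h>
#include <stddef.h>

/* Per-element "kernel": the work for index i reads and writes only
 * element i, so no index depends on the result of any other index. */
static void saxpy_element(size_t i, float a, const float *x, float *y)
{
    y[i] = a * x[i] + y[i];
}

/* Sequential driver (hypothetical). On a stream-oriented engine, each
 * call in this loop could in principle be handed to a separate
 * processing element, because the iterations are independent. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        saxpy_element(i, a, x, y);
}
```

Writing the computation as an independent per-element function is what exposes the parallelism: the loop order no longer matters, so the hardware is free to process many elements at once.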


Parallelism at many levels
These architectures accelerate performance by providing dedicated processing engines operating in parallel. Parallelism can exist at many levels:

• System level, using multiple CPUs and coprocessors

• Process level, via multiple threads or communicating processes within each processor

• Subroutine and loop levels, using unrolling and pipelining, for example

• Statement level, via instruction scheduling and parallel ALUs

FPGAs offer a significant advantage in the latter two levels of parallelism. Parallelism is inherent in an FPGA's architecture and can be leveraged by hardware designers or by software-to-hardware compilers for algorithm acceleration. For this purpose, FPGAs are now being deployed alongside traditional processors in high-end computing systems, creating what might be called a hybrid multiprocessing approach to computing.
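Loop-level and statement-level parallelism can be illustrated with a small C sketch. It assumes a 16-bit dot product as the example algorithm and an unroll factor of four; the function names are hypothetical, and the code shows only the source-level transformation, not the output of any particular software-to-hardware compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: one multiply-accumulate per iteration. Every addition
 * depends on the previous value of acc, which serializes the loop. */
int32_t dot_rolled(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Unrolled by four with independent partial sums. The four
 * multiply-accumulates in the loop body have no dependencies on one
 * another, so parallel hardware could evaluate them at the same time. */
int32_t dot_unrolled4(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n % 4 == 0); /* simplifying assumption for this sketch */

    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    for (size_t i = 0; i < n; i += 4) {
        acc0 += (int32_t)a[i]     * b[i];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
        acc2 += (int32_t)a[i + 2] * b[i + 2];
        acc3 += (int32_t)a[i + 3] * b[i + 3];
    }
    /* Combine the partial sums pairwise, as a small adder tree would. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

On a conventional processor, the unrolled version mainly helps the instruction scheduler. In an FPGA, the same structure could map to four multipliers and an adder tree working concurrently, which is the kind of parallelism the list above describes at the loop and statement levels.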
