The benefits of configurable DSP



Embedded.com
Oz Levia of Improv Systems describes the design and use of the VLIW Jazz configurable processor.

Consumer and communication products typically support applications that require extensive and intensive computation and transformation of data as an integral part of the product. Examples of such applications include video decoding and encoding, voice compression and decompression, image processing and compression, audio playback, and communication channel decoding and encoding.

Each such product demands a solution that is efficient in area and power and is optimized for application-specific requirements. Those requirements are a function of the application and of the product it is destined for. For example, a video product for handheld devices would require low power and low cost and could accept lower image quality. A video product for broadcast markets would require very high performance and very high image quality.

In addition, many products today require support for multiple applications or multiple formats of the same application. Video products that today support MPEG2 would need to add support for MPEG4 and H26L in addition to MSWM and other video standards. Communication devices for WLAN will need to support different versions of the 802.11 standard and possibly other wireless standard formats.

At the core of such products is a digital signal processing (DSP) unit that is the workhorse of compute-intensive applications. To be effective for such demanding products and applications, a DSP should be programmable, configurable, and scalable. The configurable Jazz DSP is a programmable core designed by Improv. It is used in consumer and communication products that require high performance, low power, and a flexible architecture that can be adapted to the specific needs of an application.

Before describing the architecture and specific features of the Jazz DSP, we would like to motivate the need for a programmable, scalable, and configurable DSP. The overall goal is simple: to support, within a single architectural framework, many applications with different requirements while allowing the use of a high-level language.

Useful definitions

A few definitions are useful. A programmable DSP is a DSP that processes instructions rather than executing a fixed function. Instructions for a DSP can be produced by a compiler from a high-level language (C, Java, C++) or written by hand (assembly code). A configurable DSP is a DSP that can be modified in one or more ways to fit the needs of a product or an application. Configurable DSPs are modified before they are implemented in silicon. Finally, a scalable DSP is a DSP that can 'grow' or 'shrink' to support different requirements in the products. In some cases, that capability may be extended to support multiple DSPs.

Fig 1: Multiple Jazz DSP processors in a platform

The most obvious and immediate application of a configurable processor is to provide support for optimization of application run time. Configuration of the DSP processor can enhance performance in three distinct ways.

  • Scale: By increasing or decreasing the available resources in the DSP processor, an application can experience different levels of performance. Scalability can take the form of more resources in a single DSP or of multiple DSP cores. However, to take advantage of such scalability, the DSP must be complemented with a powerful compiler that can make use of the additional resources for performance optimizations. Without a compiler, configurable DSPs and configurable processors will only contribute to a lengthy design cycle.
  • Location and mix: By changing the organization of the resources (without adding any resources), the configurable DSP can provide different levels of performance for different applications. For example, register location (to avoid spills) can have a significant influence on inner-loop performance. Again, a strong compiler is a crucial component in using such a configuration.
  • Custom instructions: Every application has some operations (mathematical or otherwise) that are less suitable for a given DSP. It is not practical for a single DSP to include all types of such instructions or operations. Nor is it practical for DSP vendors to design all such operations and instructions in advance. A configurable DSP gives the user the option of inserting custom instructions into the DSP for the benefit of the application.

Fig 2: DDCU (with custom instructions) in a Jazz DSP

Performance is but one dimension of optimization for a specific product or application. In some cases, power consumption or area (cost) is the priority objective. Configurable DSPs allow trade-offs between performance, area, and power. For a given level of performance, a configurable DSP will provide the best area and power consumption available in a programmable DSP.

Unlike standard DSPs, where higher performance means a higher clock rate and higher power consumption, the use of custom instructions in configurable DSPs may actually result in higher performance and lower power consumption, because fewer clock cycles and less logic are used to compute the same result.

The flexibility and productivity afforded by a configurable DSP are of little use without programmability. Two related capabilities are significant in considering processor programmability.

  • Instruction execution: The DSP executes the application from an instruction stream (object code). This capability is now mandatory for products and applications that support multiple standards or formats, since it is impractical to use fixed-function DSPs to support a large number of formats or formats that are not yet known. By writing and running a new instruction stream, programmable DSPs can be used in more situations.
  • High-level language: C and Java. It is hard to over-emphasize the importance of high-level language programmability for any processor. A high-level language provides productivity (easier to write and verify), flexibility (easier to change), and maintainability (easier to debug and modify). Without a high-level language, programmable cores are of reduced value. The key to gleaning benefits from high-level language support is the availability of an efficient compiler and optimizer.

The configurable Jazz DSP is a Very Long Instruction Word (VLIW) programmable DSP. The Jazz DSP is unique among programmable DSPs in that it is configurable and scalable. It is possible to customize the Jazz DSP for application-specific requirements and optimizations. Jazz is also unique among VLIW processors because it has an efficient high-level (C and Java) optimizing compiler.

The Jazz DSP was also designed to scale to the computational needs of an application. The user can configure Jazz so that it has just the right number of parallel execution units to meet the performance requirement of a given application. It is also possible to organize multiple processors to work together in an array.

Fig 3: Jazz 2020 - high performance, low power DSP core

Because Jazz is a VLIW DSP, its multiple ALU-like structures, called Computational Units (CUs), are organized so that each one is controlled by an instruction slot. When two CUs share an instruction slot, they are said to be overlaid and are not available for use simultaneously. CUs provide the mix of instructions available in the Jazz processor, and the organization of CUs in the instruction slots governs the number and type of instructions available in each cycle.
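The overlay rule can be sketched in a few lines of Python. This is only an illustrative model, not Improv's tooling: the `Slot` class, the `can_issue_together` helper, and the unit names (`alu0`, `shift0`, `mac0`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """One instruction slot; the CUs listed in `units` share it (are overlaid)."""
    name: str
    units: list


def can_issue_together(slots, wanted_units):
    """Return True if every wanted CU can be driven in the same cycle.

    A slot drives at most one of its CUs per cycle, so two wanted CUs
    that live in the same slot conflict. Unknown CUs are rejected.
    """
    used_slots = set()
    for unit in wanted_units:
        owner = next((s.name for s in slots if unit in s.units), None)
        if owner is None or owner in used_slots:
            return False
        used_slots.add(owner)
    return True


# Hypothetical configuration: the ALU and the shifter are overlaid in slot0.
config = [
    Slot("slot0", ["alu0", "shift0"]),
    Slot("slot1", ["mac0"]),
]
```

In this made-up configuration, `alu0` and `mac0` can be used in the same cycle, while `alu0` and `shift0` cannot because they share `slot0`.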

Typical CUs are ALU units with arithmetic operations, MAC units, shift units, counters, and other instruction units. The Jazz DSP supports 32- and 16-bit data paths in fixed-point arithmetic. Most CUs support SIMD operation on byte and half-word data.

In a similar way, Memory Interface Units (MIUs) are also organized and controlled by instruction slots. MIUs provide for data access and address generation. Jazz supports a 16-bit address space.

Registers are distributed throughout the Jazz DSP rather than held in a single register file. This puts computational results close to where they are generated and consumed. It also allows the compiler more flexibility in allocating data to storage locations.
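To illustrate what SIMD operation on half-word data means, the following Python sketch treats a 32-bit word as two independent 16-bit lanes. It is a generic illustration, not a description of a Jazz instruction: the function name `simd_add16` is hypothetical, and wrap-around (rather than saturating) arithmetic is an assumption.

```python
MASK16 = 0xFFFF


def simd_add16(a, b):
    """Add two 32-bit words as two independent 16-bit lanes.

    Each lane wraps around on overflow (an assumption); a carry out of
    the low lane never reaches the high lane.
    """
    lo = ((a & MASK16) + (b & MASK16)) & MASK16
    hi = (((a >> 16) & MASK16) + ((b >> 16) & MASK16)) & MASK16
    return (hi << 16) | lo


# Lanes (1, 0xFFFF) + (1, 1): the low lane wraps to 0, the high lane becomes 2.
packed = simd_add16(0x0001FFFF, 0x00010001)   # 0x00020000
# An ordinary 32-bit add would carry into the upper half instead.
plain = (0x0001FFFF + 0x00010001) & 0xFFFFFFFF  # 0x00030000
```

One operation therefore processes two half-word samples at once, which is why SIMD support matters for 16-bit audio and communication data.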

A data communication block facilitates data movement between storage locations and inputs to CUs. This block includes several MUXed buses that are under the control of the compiler.

Applications that require digital signal processing typically require very high bandwidth of computation. For example, a simple transformation of 16- and 32-bit words of data can take several hundred computations such as multiplications and summations. That, coupled with large amounts of data, dictates a need for a very efficient computation platform.

One approach to getting many computations accomplished each second is to perform as many operations as possible simultaneously. VLIW processors are designed specifically for this goal.

Since each instruction word contains multiple slots, the compiler can specify multiple actions each cycle, one for each slot. The result is a processor that can execute several operations per cycle. VLIW is especially useful for digital signal processing, since DSP operations tend to be regular and repetitious with little or no control code.
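A minimal sketch of how independent operations can be packed into slots is shown below. It uses simple greedy list scheduling and is not the Jazz compiler's actual algorithm. The function name `schedule`, the operation names, and the assumptions that every operation takes one cycle and can use any slot are all hypothetical simplifications.

```python
def schedule(ops, deps, num_slots):
    """Greedy list scheduling of single-cycle operations.

    ops: operation names in program order.
    deps: maps an operation to the set of operations whose results it needs.
    num_slots: number of operations that can issue per cycle.
    Returns a list of cycles, each a list of operations issued together.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    done = set()
    remaining = list(ops)
    cycles = []
    while remaining:
        # An operation is ready once all of its inputs come from earlier cycles.
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependency")
        issued = ready[:num_slots]
        cycles.append(issued)
        done.update(issued)
        remaining = [op for op in remaining if op not in issued]
    return cycles


# y = a*b + c*d: the two multiplications are independent, the add needs both.
ops = ["mul_ab", "mul_cd", "add"]
deps = {"add": {"mul_ab", "mul_cd"}}
one_slot = schedule(ops, deps, 1)
two_slots = schedule(ops, deps, 2)
```

With one slot, the three operations take three cycles. With two slots, the two multiplications issue together and the schedule shrinks to two cycles. Adding a third slot would not help here, because the add must still wait for both products.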

VLIW for configurable DSP

Jazz is a pure VLIW architecture, where one slot in the instruction word controls each operation. This type of VLIW architecture is very easy to control and understand and typically will yield very high performance. In addition, a pure VLIW architecture is also very easy to extend and configure. We take full advantage of this capability in our design of the Jazz DSP.

The most obvious parameter to control in a VLIW is the number of slots or, in other words, the instruction-level parallelism. By adding slots to or removing slots from the instruction (and CUs to or from Jazz), one can increase or decrease the number of operations that are available in each cycle. Since CUs and MIUs are controlled in much the same way, it is also possible to extend the available memory interface units. At the same time, adding CUs to Jazz varies the mix of available resources in the processor, to the benefit of the compiler's ability to schedule parallel operations efficiently.

That single capability, to add or remove slots from the instruction, allows the use of the Jazz DSP in a very large number of configurations and can result in an optimal processor for a given application.

For data- and computation-intensive applications, computational resources are only part of the problem; another part is temporary storage. In the Jazz DSP, it is possible to add, remove, and even reorganize data and address registers in different locations in the architecture to allow the compiler to make better use of temporary storage.

Since Jazz is a pure VLIW, it is also possible to insert a custom CU into a slot. Such a Designer Defined Computational Unit (DDCU) can include custom instructions and can add to the available vocabulary of the compiler. This capability can further optimize the DSP for the needs of an application.

Flexibility, parallelism, and configurability come at a cost. In this case, the cost is the complexity of targeting high-level code to the Jazz DSP. Supporting such flexibility without a good compiler is impractical.

The Jazz System Compiler is a VLIW optimizing compiler that can turn sequential C code into VLIW instructions using as many operations per cycle as possible. The compiler can also 'understand' changes to the Jazz DSP configuration (additions or deletions) and can even generate code for DDCUs inserted by the user for application specific instructions.

Configuring and optimising

The goal of a configurable, scalable DSP, as described in the previous sections, is to enable a good match between the application and the compute platform. This goal would be hard to attain without a complete methodology that supports the user in the quest for optimizations. This flow is illustrated in figure 4.

Fig 4: Design flow for configurable DSP

The star of the show, so to speak, is the digital signal processing application. That application is expressed in C or Java high-level code and is verified and tested on the host. In addition to the functional verification, it is critical to have a specific goal in mind that can drive the process of optimization. The goal may be oriented towards performance, cost, area, or power consumption. In many cases the goal is a combination of all of the above.

The application and the end product will also have an impact on the specific Jazz DSP platform that is used as a starting point. For example, for a handheld device that requires low power and an application that requires 500-1000 MIPS, the choice will be a single processor with a low level of parallelism. For a broadcast video application with high screen resolution, the selection may be a high-end, two-processor platform.

In some ways, the starting point will not affect the end result, but it will make the process of convergence faster and easier.

Mapping

The Jazz PSA System compiler is an optimizing VLIW compiler. Given a specific application and a specific Jazz DSP, the compiler will map the C code to object code for the specific Jazz DSP. The process of mapping is global and contains many optimizations and transformations, all aimed at producing the best fit between the source code and the Jazz DSP under consideration. The overriding goal is performance. The compiler will try to minimize the number of cycles required to execute a given application. Another consideration is space. The compiler will work to minimize the space required for instructions (object code) and for data.

As noted before, the most effective way of optimizing a DSP algorithm is to do as much work as possible in each cycle. Since the amount of work a given Jazz DSP can do in a cycle is known, the job of the compiler is to fill each cycle with useful work and to use as few cycles as possible.

The compiler can do its best, but sometimes there are parts of the application that just don't map well to a given Jazz DSP. The process of configuration starts with providing the user with data to point to areas that can be improved:

  • High-percentage code: Portions of the code that consume large numbers of cycles (dynamically) and thus deserve more attention.
  • Low-utilization code: Portions of the code where the compiler was unable to make good use of the resources of the Jazz DSP.
  • Poorly used resources: Resources in the Jazz DSP that are underused or that could be eliminated. Other feedback may point to resources that can be organized in different ways to take better advantage of parallelism or to conserve space and time.
  • Missing resources: Resources (registers, instructions) that, if added, can make a measurable improvement in the performance of the application.
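The first two kinds of feedback can be derived from simple per-block profile counts, as the following Python sketch suggests. It does not reflect the format of any real Jazz tool report: the `BlockProfile` record, the `report` function, the thresholds, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockProfile:
    name: str
    cycles: int      # dynamic cycles spent in this block
    ops_issued: int  # operations actually issued over those cycles


def report(blocks, num_slots, hot_share=0.25, low_util=0.5):
    """Return (high-percentage blocks, low-utilization blocks).

    A block is "high percentage" if it takes at least `hot_share` of all
    cycles, and "low utilization" if it fills less than `low_util` of the
    slot capacity (cycles * num_slots) it had available.
    """
    total = sum(b.cycles for b in blocks)
    hot, underused = [], []
    for b in blocks:
        if b.cycles == 0:
            continue  # never executed, nothing to measure
        share = b.cycles / total
        utilization = b.ops_issued / (b.cycles * num_slots)
        if share >= hot_share:
            hot.append(b.name)
        if utilization < low_util:
            underused.append(b.name)
    return hot, underused


# Made-up profile of a 4-slot configuration.
profile = [
    BlockProfile("fir_loop", cycles=800, ops_issued=2400),
    BlockProfile("setup", cycles=150, ops_issued=150),
    BlockProfile("io", cycles=50, ops_issued=180),
]
hot, underused = report(profile, num_slots=4)
```

Here `fir_loop` accounts for 80% of the cycles and is reported as high-percentage code, while `setup` fills only a quarter of the available slots and is reported as low-utilization code.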

Using the results from compilation and profiling, the user can then make changes to the code and the configuration. Code changes may include simple loop rewrites, changes to the types used, or other syntax changes that make the work of the compiler more straightforward.

The user can also make changes to the configuration of the Jazz DSP processor. The user can add or remove resources, can re-organize the resources in different ways, and can insert new custom instructions that best fit the needs of the application.

Making trade-offs

The process is, by its nature, iterative. The user makes changes, compiles, and profiles, and with each cycle gets closer to the target. The key in this process is to keep the optimization goals front and center and to understand how changes in one area (e.g. to improve performance) will affect other areas such as power consumption. The user is aided in this process by sophisticated tools that give accurate and early predictions of the impact of a change in the configuration.

Embedded DSP applications require the best performance at low power consumption without loss of flexibility. Programmable, configurable DSP processors like the Jazz DSP are ideal for this task. Supported by an advanced optimization methodology and a superb compiler, the Jazz DSP is the first member of a generation of new cores that provide flexibility and optimization together with a rapid design methodology.
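The change, compile, and profile loop can be caricatured in Python as a sweep over candidate configurations. This is a deliberately simplified sketch. In practice the user chooses each change by hand, guided by the tools. The `tune` function, the `toy_evaluate` cost model (which stands in for a real compile-and-profile run), and all numbers are hypothetical.

```python
def tune(candidates, evaluate, cycle_budget):
    """Return the cheapest candidate that meets the cycle budget.

    evaluate(config) -> (cycles, cost) stands in for compiling and
    profiling the application on that configuration; `cost` can be any
    area or power estimate. Returns (config, cycles, cost), or None if
    no candidate meets the budget.
    """
    best = None
    for config in candidates:
        cycles, cost = evaluate(config)
        if cycles <= cycle_budget and (best is None or cost < best[2]):
            best = (config, cycles, cost)
    return best


def toy_evaluate(num_slots):
    """Hypothetical model: 1200 operations spread perfectly over the slots,
    with cost growing linearly in the number of slots."""
    cycles = -(-1200 // num_slots)  # ceiling division
    return cycles, num_slots


best = tune([1, 2, 3, 4], toy_evaluate, cycle_budget=500)
```

Under this toy model, one and two slots miss the 500-cycle budget, and three slots is the cheapest configuration that meets it. A four-slot configuration is faster but costs more than the goal requires, which is the kind of trade-off the optimization goals are meant to settle.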


Oz Levia is the chief technical officer, senior vice president of field operations and a founder of Improv Systems. He is the editor-in-chief of a series of books entitled Current Issues in Electronic Modeling (Kluwer Academic Publishers, currently 11 volumes) and has published over 30 papers on electronic design, high level synthesis and architectural modelling.

Published in Embedded Systems (Europe) September 2002