This “Product How-To” article focuses on how to use a certain product in an embedded system and is written by a company representative.
Various applications, ranging from GPS to audio/video stream processing, require complex algorithms to be executed in real time. Many of these algorithms follow industry standards that are upgraded periodically.
Engineers developing such applications face a challenge: to optimize the execution of these algorithms within tight constraints on the unit cost, physical size and power consumption of a device that is often manufactured in high volume, and within strict limits on development cost and time. The end product must also be adaptable to upgrades in the processing algorithms at a reasonable cost.
For optimum algorithm execution, the basic rule of thumb is hardware for performance and software for flexibility. In practice, this rule is difficult to apply. Hardware choices are limited to the basic arithmetic functions of the MCU core, the multiply/accumulate and linear function processing of a DSP core, or the wider flexibility of an FPGA, with its downsides of physical size, power consumption and unit cost in volume.
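To make the multiply/accumulate workload concrete, the following is a minimal sketch in C of one output sample of a fixed-point FIR filter. It is the kind of inner loop that a DSP core or dedicated logic is built to execute efficiently, while a plain MCU core has to work through it with its basic arithmetic instructions. The function name `fir_q15`, the Q15 number format and the saturation step are assumptions chosen for illustration; they do not come from any Atmel library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: one output sample of an FIR filter in Q15 fixed point.
 * Each loop iteration performs one multiply/accumulate (MAC). */
int16_t fir_q15(const int16_t *coeff, const int16_t *samples, size_t taps)
{
    assert(coeff != NULL && samples != NULL);

    int64_t acc = 0;                      /* wide accumulator avoids overflow */
    for (size_t i = 0; i < taps; i++) {
        acc += (int32_t)coeff[i] * (int32_t)samples[i];   /* the MAC */
    }

    acc /= 32768;                         /* rescale Q30 sum to Q15 (truncates toward zero) */
    if (acc > INT16_MAX) acc = INT16_MAX; /* saturate instead of wrapping */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

With two taps of 0.5 (16384 in Q15) and samples 1000 and 3000, the function returns their average, 2000.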
The alternative of a standard-cell ASIC can give a higher level of performance, but at a development time and cost that is often prohibitive. Software is ported onto the MCU or MCU/DSP combination that has been selected for the hardware implementation.
Once the hardware/software (HW/SW) partitioning has been made, altering it is extremely difficult and time-consuming, unless the application will go into volume based on an FPGA. Often, it is only in the final stages of application development that the software can be run on the target hardware and that it can be determined whether the implementation of the processing algorithm is optimal.
Atmel's CAP combines an MCU-based SoC, which provides the basic processing capacity, with a high-density block of metal-programmable (MP) digital logic that can be personalized to provide DSP-like or other dedicated function execution hardware.
It provides a reasonable development cycle time and cost. The development flow for an application-specific CAP includes an emulation step based on a development board that uses a high-density FPGA to emulate the algorithm execution functionality that will subsequently be hardened into the MP block.
CAP enables an application developer to get the best of the FPGA and ASIC worlds. The first phase of the CAP application development cycle uses FPGA-based libraries and tools to make an initial HW/SW partition of the algorithm and then map the hardware-based functions onto DSP-like structures or other processing elements implemented in the FPGA.
In parallel, the software-based algorithm processing is compiled for execution by the MCU, which sees the FPGA/MP block in its address space, with a distributed DMA architecture to optimize data flows between the functional and memory blocks.
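Because the MCU sees the FPGA/MP block in its address space, software can drive a hardware function through ordinary loads and stores. The sketch below shows what such an access might look like in C. The register layout, the names `accel_regs_t`, `accel_start` and `accel_done`, and the bit positions are all hypothetical; the real layout depends on the logic implemented in the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout for a function block in the FPGA/MP logic.
 * Field order, names and bit positions are assumptions for illustration. */
typedef struct {
    volatile uint32_t src_addr;   /* address the block reads its input from */
    volatile uint32_t dst_addr;   /* address the block writes its output to */
    volatile uint32_t length;     /* number of bytes to process */
    volatile uint32_t control;    /* bit 0: start */
    volatile uint32_t status;     /* bit 0: done */
} accel_regs_t;

#define ACCEL_CTRL_START  (1u << 0)
#define ACCEL_STAT_DONE   (1u << 0)

/* Program a transfer and start the block. The MCU can continue with other
 * work while the block moves its data through DMA. */
void accel_start(accel_regs_t *regs, uint32_t src, uint32_t dst, uint32_t len)
{
    assert(regs != NULL);
    regs->src_addr = src;
    regs->dst_addr = dst;
    regs->length   = len;
    regs->control  = ACCEL_CTRL_START;   /* written last, after the parameters */
}

/* Non-blocking completion check: returns 1 once the done bit is set. */
int accel_done(const accel_regs_t *regs)
{
    assert(regs != NULL);
    return (regs->status & ACCEL_STAT_DONE) != 0;
}
```

On a target, `regs` would point at the base address assigned to the block in the memory map. Passing the register block as a pointer also lets the same driver code be exercised on a host PC against an ordinary struct.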
|Figure 1: HW/SW partitioning involves implementing an algorithm using a library of IP blocks containing hardware modules and their associated software drivers.|
Figure 1 above shows the overall steps of the HW/SW partitioning and implementation of an algorithm using a library of IP blocks that contains both hardware modules and their associated software drivers.
On the hardware side, the algorithm modules are first synthesized using tools available from the IP library or FPGA supplier. These are then synthesized with the required DSP or similar function-processing blocks from a library provided by the FPGA supplier. The final step is to map these high-level constructs onto the basic FPGA structure to configure the FPGA in the CAP development board.
On the software side, the IP blocks required for the algorithm are compiled and then linked with Atmel's library of low-level device drivers, which handle the detailed operation of the multiple peripherals and external interfaces of the CAP SoC. If required, this code is then linked to the OS, user interface and top-level control modules for the operation of the entire system. The complete code set is loaded into the program memory for the MCU core, which is the central architectural element of the CAP.
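One way to picture how an IP block's software driver sits between the application code and the function it implements is a small driver interface. The sketch below is an assumption about how such code could be organized, not a description of Atmel's libraries; the type `ip_block_driver_t` and the functions `dot_sw` and `signal_energy` are hypothetical names. Only a software implementation is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver interface for one IP block. */
typedef struct {
    const char *name;
    int32_t (*dot)(const int16_t *a, const int16_t *b, size_t n);
} ip_block_driver_t;

/* Software implementation: runs entirely on the MCU core.
 * The caller must keep n small enough that the 32-bit sum cannot overflow. */
static int32_t dot_sw(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

const ip_block_driver_t dot_driver_sw = { "dot (software)", dot_sw };

/* Application code calls through the interface only, so it does not
 * depend on where the function is executed. */
int32_t signal_energy(const ip_block_driver_t *drv, const int16_t *x, size_t n)
{
    assert(drv != NULL && drv->dot != NULL);
    return drv->dot(x, x, n);
}
```

A hardware-backed driver would fill in the same structure with a function that programs the logic block, which leaves the application code unchanged if the HW/SW partition is revised.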
The basic architecture of the CAP development board is shown in Figure 2 below. The fixed portion of the device is in the CAP chip, which is implemented as a standard MCU together with its on-chip memories, peripherals and interfaces, all of which are brought out to the external connections shown.
|Figure 2: The basic architecture of the CAP development board.|
A wide choice of memories can be connected to the external bus interface. The hardware part of the algorithm under development is mapped into the FPGA via its configuration memory, and the software is loaded into the selected program memory (external or internal) of the MCU.
Once configured, the development board emulates the operation of the final CAP device at close to operational speed, including aspects such as multitasking, inter-process communication and interrupts that are almost impossible to simulate.
This emulation step enables the algorithm implementation to be thoroughly debugged under realistic conditions of use. It also enables metrics to be applied to determine whether the initial HW/SW partitioning and the subsequent synthesis/compilation of the various modules are optimal. If improvements are required, these can be implemented using the same design flow, as described previously, at no additional cost other than that of extended development time.
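As an example of the kind of metric that could be applied at this stage, the sketch below computes the share of MCU cycles that a software module consumes and compares it with a budget. The article does not specify which metrics are used; the functions `mcu_load_percent` and `is_hw_candidate` and the figures in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of MCU cycles consumed by one software module.
 * cycles_per_sample: cycle count measured on the development board
 * sample_rate_hz:    how many times per second the module runs
 * mcu_clock_hz:      MCU core clock frequency */
uint32_t mcu_load_percent(uint32_t cycles_per_sample,
                          uint32_t sample_rate_hz,
                          uint32_t mcu_clock_hz)
{
    assert(mcu_clock_hz != 0);
    uint64_t cycles_per_second = (uint64_t)cycles_per_sample * sample_rate_hz;
    return (uint32_t)((cycles_per_second * 100u) / mcu_clock_hz);
}

/* Returns 1 if the module exceeds its share of the MCU budget, which marks
 * it as a candidate for moving into the FPGA/MP logic. */
int is_hw_candidate(uint32_t load_percent, uint32_t budget_percent)
{
    return load_percent > budget_percent;
}
```

For instance, a module measured at 1,500 cycles per sample, running at 48 kHz on a 200 MHz core, uses 36% of the MCU. Against a 25% budget it would be flagged for the next partitioning iteration.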
Multiple iterations of the HW/SW partitioning and implementation of the HW/SW modules are possible in order to achieve an optimal implementation.
Metal programming and fabrication flow
Once the functionality of the device under development has been frozen, the final RTL code that has been used to program the FPGA is mapped (by Atmel or an accredited third-party design house) onto the metal layers that personalize the CAP MP block. Rigorous post-layout simulation ensures that the functionality of the metal-programmed CAP is identical to that of the emulated version.
Prototypes are rapidly fabricated from blanks that have been staged in the fab before the metal layers are applied. They enable the application developer to do a final verification of the device's HW/SW functionality and, in particular, to check that the processing of the algorithm is optimal.
In the worst case, if the prototypes are not satisfactory, the additional cost and time of a re-spin starting from the emulation phase are reasonable, and much lower than those of a full mask replacement for a standard-cell ASIC.
When the prototypes have been approved, volume fabrication of the personalized CAP devices commences, using the same flow as for the prototypes. Based on field feedback and in response to any upgrades of the data processing algorithm, subsequent incremental versions of the CAP-based device can be developed more rapidly and at lower cost than the initial version, basing the modifications on the final FPGA configuration of the development board before metal programming.