Using customizable MCUs to bridge the gap between dedicated SoC ASSPs, ASICs and FPGAs: Part 2 - Embedded.com

As discussed in Part 1 of this series, using a customizable microcontroller with a metal-programmable cell fabric (MPCF) allows designers to integrate their custom IP into a near-off-the-shelf solution. This approach achieves very high gate densities of between 170K and 210K gates/mm², so MPCF silicon efficiency is comparable to that of cell-based ASICs.

For example, a D flip-flop (DFF) implemented as an MPCF cell and a standard-cell DFF, both in a 130 nm process, occupy nearly identical area (see Figure 1, below).

Figure 1. D-type Flip-flop in 130 nm MPCF and 130 nm Standard Cell

One such platform (Figure 2, below) integrates a 250K- or 500K-gate metal programmable (MP) block with a 200 MHz ARM926EJ-S core, 16 KBytes each of tightly coupled program and data cache for deterministic processing, 32 KBytes of additional SRAM, 32 KBytes of ROM, and an array of peripherals and interfaces to handle networking, data transfers and user interface requirements. These include a USB Host and Device, a 10/100 Ethernet MAC, an LCD controller, an image sensor interface, and interfaces for CAN, MCI and SPI.
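Combining the 170K to 210K gates/mm² density quoted for MPCF with the 250K- and 500K-gate MP block options gives a rough feel for the silicon area involved. The sketch below is back-of-envelope arithmetic only: it assumes the gate counts and the densities use the same gate definition, it ignores embedded RAM, I/O and routing overhead, and the function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Rough silicon area, in mm^2, for a gate count at a density given in
 * gates/mm^2. Embedded RAM, I/O and routing overhead are ignored. */
double mp_block_area_mm2(double gates, double gates_per_mm2)
{
    assert(gates_per_mm2 > 0.0);
    return gates / gates_per_mm2;
}

/* Print the area range for one MP block size, using the 170K-210K
 * gates/mm^2 MPCF density range quoted in the text. */
void print_area_range(double gates)
{
    const double low_density  = 170e3;  /* gates/mm^2, lower bound */
    const double high_density = 210e3;  /* gates/mm^2, upper bound */
    printf("%.0fK gates: %.2f to %.2f mm^2\n",
           gates / 1e3,
           mp_block_area_mm2(gates, high_density),  /* denser -> smaller */
           mp_block_area_mm2(gates, low_density));
}
```

For example, print_area_range(500e3) prints `500K gates: 2.38 to 2.94 mm^2`, and the 250K-gate block works out to roughly 1.19 to 1.47 mm².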

The MP Block makes it possible
A key element in the implementation of metal programmable cell fabric in such designs lies in the architecture of the underlying MP block, which is large enough to implement a second ARM processor core, a digital signal processor (DSP), additional standard (or non-standard) interfaces and complex logic blocks such as GPS correlators.

Figure 2. MCU-MP Block Diagram

As shown in Figure 2, the MP block has a number of internal features and dedicated external connections that make it efficient for implementing application-specific logic elements. Internally, it has multiple distributed single- and dual-port RAM blocks that can be tightly coupled to the logic elements that require them.

The MP Block is supplied with all the clocks originating from the Clock Generator and Power Management Controller, which gives maximum flexibility in clocking the application-specific logic elements implemented in it.

Direct memory access (DMA) is implemented on all peripherals to handle transfers between the peripherals and the memories; otherwise, moving this data could overwhelm the ARM9 (Figure 3, below).

Figure 3. PDC Transfers

For example, a 20 Mbps high-speed SPI transfer handled entirely by the CPU would consume all of the ARM9's cycles. Simple DMA is implemented in every peripheral on the chip and managed by a peripheral DMA controller (PDC) that offloads data-moving tasks, so a 20 Mbps SPI transfer can proceed while still leaving 88% of the ARM9's cycles for application processing (Figure 4, below). In addition, a 4-channel DMA controller takes care of the Ethernet MAC, LCD controller and camera interface.
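The arithmetic behind these figures can be sketched as a per-byte cycle budget. Assuming 8-bit SPI frames with no framing overhead (an assumption, not stated above), a 200 MHz core has only 80 cycles per byte at 20 Mbps, so any per-byte interrupt handler that needs 80 cycles or more would saturate it. The 88% figure corresponds to an average residual cost of roughly 9.6 cycles per byte with the PDC in place. The helper names are hypothetical.

```c
#include <assert.h>

/* CPU clock cycles available per transferred byte when a serial link
 * runs at link_bps and the CPU at cpu_hz. Assumes 8-bit frames and no
 * framing overhead. */
double cycles_per_byte(double cpu_hz, double link_bps)
{
    assert(cpu_hz > 0.0 && link_bps > 0.0);
    double bytes_per_s = link_bps / 8.0;
    return cpu_hz / bytes_per_s;
}

/* Average CPU cycles per byte that the transfer still costs, given the
 * fraction of CPU time left free for the application. */
double overhead_cycles_per_byte(double cpu_hz, double link_bps,
                                double cpu_free_fraction)
{
    assert(cpu_free_fraction >= 0.0 && cpu_free_fraction <= 1.0);
    return (1.0 - cpu_free_fraction) * cycles_per_byte(cpu_hz, link_bps);
}
```

With cpu_hz = 200e6 and link_bps = 20e6, cycles_per_byte() returns 80, and overhead_cycles_per_byte(200e6, 20e6, 0.88) comes to about 9.6.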

A six-layer advanced high-performance bus (AHB) matrix with six masters and six slaves eliminates contention for bus access. The masters include the CPU data and CPU instruction ports, the peripheral DMA controller, the Ethernet MAC and the USB Host. The slaves are the memories, the USB device and the peripheral bus bridge. Any master can take control of any available bus when needed. Because there are as many buses as masters, no master ever waits for a free bus; arbitration is needed only when two masters address the same slave at the same time.
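The following toy model of a multi-layer matrix makes the contention point concrete: each master owns a layer, so a request stalls only when another master targets the same slave in the same cycle. The fixed-priority rule (lowest master number wins), the NUM_SLAVES constant and the function names are assumptions for illustration; the chip's actual arbitration scheme is not described in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLAVES 6

/* One request in a single bus cycle: which master wants which slave.
 * Assumes each master issues at most one request per cycle. */
typedef struct {
    int master;  /* master index; lower number = higher priority (assumed) */
    int slave;   /* target slave index, 0..NUM_SLAVES-1 */
} bus_request;

/* Grant at most one master per slave. Masters targeting different
 * slaves never block each other, because each has its own layer.
 * granted[i] reports whether requests[i] was granted this cycle.
 * Returns the number of stalled requests. */
int arbitrate(const bus_request *requests, size_t n, bool *granted)
{
    int owner[NUM_SLAVES];
    for (int s = 0; s < NUM_SLAVES; s++)
        owner[s] = -1;  /* -1 = slave not requested this cycle */

    /* First pass: pick the winning master for each requested slave. */
    for (size_t i = 0; i < n; i++) {
        int s = requests[i].slave;
        assert(s >= 0 && s < NUM_SLAVES);
        if (owner[s] < 0 || requests[i].master < owner[s])
            owner[s] = requests[i].master;
    }

    /* Second pass: grant the winners; everyone else stalls. */
    int stalls = 0;
    for (size_t i = 0; i < n; i++) {
        granted[i] = (requests[i].master == owner[requests[i].slave]);
        if (!granted[i])
            stalls++;
    }
    return stalls;
}
```

Three masters hitting three different slaves produce no stalls; two masters hitting the same slave produce exactly one.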

Figure 4. PDC CPU Availability

The external connections of the MP Block include multiple parallel master and slave connections to the AHB Bus Matrix, a set of interrupt lines for peripherals implemented in the MP Block, a set of Peripheral Enable lines, two parallel sets of dedicated I/O ports and a multiplexed connection to the USB Device Transceiver. The last of these enables a second USB Device to be implemented in the MP Block (Figure 5, below).

To accommodate the GByte-plus external mass storage required for 2-D graphics in man-machine interfaces, the chip includes an SD/MMC memory card interface (MCI) and an external bus interface (EBI) supporting SDRAM, NAND Flash with error-correcting code (ECC), and CompactFlash with True IDE mode support, giving access to GByte-plus on-board or removable memory, including USB sticks.

Figure 5. Metal Programmable Block Diagram

A fully integrated system controller manages interrupt handling, reset, startup/shutdown, timing, power management and parallel I/O control of the device. It also provides a debug interface to the ARM core via the Debug Unit.

The advanced interrupt controller (AIC) augments the ARM core's two-level interrupt structure (IRQ and FIQ) with an 8-level, vectored, prioritized interrupt system that transfers control to the appropriate interrupt handling routine in the minimum number of clock cycles, improving the real-time operation of the device (Figure 6, below).
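To show what "vectored, prioritized" means in practice, here is a software model of the selection such an interrupt controller performs: among pending, enabled sources it picks the one with the highest of 8 priority levels and jumps straight to that source's handler, with no polling of peripheral status registers. The struct, field names, source count and tie-break rule are hypothetical; the real AIC does this selection in hardware and has its own register map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SOURCES 32
#define NUM_LEVELS  8   /* priority 0 (lowest) .. 7 (highest) */

typedef void (*isr_fn)(void);

/* Hypothetical model of a vectored, prioritized interrupt controller. */
typedef struct {
    uint8_t  priority[NUM_SOURCES];  /* 0..NUM_LEVELS-1 per source */
    isr_fn   vector[NUM_SOURCES];    /* handler address per source */
    uint32_t pending;                /* bit n set = source n asserted */
    uint32_t enabled;                /* bit n set = source n enabled */
} irq_ctrl;

/* Return the highest-priority pending and enabled source, or -1 if none.
 * Ties go to the lowest source number (an assumption). */
int irq_highest_pending(const irq_ctrl *c)
{
    assert(c != NULL);
    uint32_t active = c->pending & c->enabled;
    int best = -1;
    for (int n = 0; n < NUM_SOURCES; n++) {
        if (!(active & (1u << n)))
            continue;
        if (best < 0 || c->priority[n] > c->priority[best])
            best = n;
    }
    return best;
}

/* Vectored dispatch: call the winning source's handler directly. */
int irq_dispatch(irq_ctrl *c)
{
    int src = irq_highest_pending(c);
    if (src >= 0 && c->vector[src] != NULL)
        c->vector[src]();
    return src;
}
```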

The system controller also manages system startup and shutdown, and provides multiple clock sources and peripheral enable lines so that each functional block can be run at the minimum clock frequency required by the application, or put in idle mode if it is not needed. This keeps device power consumption to a minimum under all conditions of use. The fixed portion of the device is itself a system-on-chip.
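Per-peripheral clock control usually reduces to a bit mask in which each bit gates the clock of one functional block. The sketch below models that with a simulated register; the peripheral IDs, bit assignments and function names are hypothetical, not the device's actual power management controller interface. Some real controllers use separate write-only enable and disable registers so that drivers avoid read-modify-write races; this model glosses over that detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical peripheral IDs, used as bit positions in the mask. */
enum periph_id { PERIPH_SPI = 0, PERIPH_USB_DEV = 1, PERIPH_EMAC = 2,
                 PERIPH_LCD = 3, PERIPH_MP_BLOCK = 4 };

/* Simulated clock-enable register; on hardware this would be a
 * memory-mapped register in the power management controller. */
static volatile uint32_t clock_enable_reg;

void periph_clock_on(enum periph_id id)
{
    assert((unsigned)id < 32u);
    clock_enable_reg |= (1u << id);
}

void periph_clock_off(enum periph_id id)
{
    assert((unsigned)id < 32u);
    clock_enable_reg &= ~(1u << id);
}

int periph_clock_is_on(enum periph_id id)
{
    assert((unsigned)id < 32u);
    return (int)((clock_enable_reg >> id) & 1u);
}

/* Clock only the peripherals the current operating mode needs;
 * every other block is gated off to save power. */
void apply_clock_profile(uint32_t needed_mask)
{
    clock_enable_reg = needed_mask;
}
```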

Figure 6. MCU-MP Block System Controller

Connecting the Metal Programmable Block
The external connections of the MP Block include:

1) Multiple parallel master and slave connections to the AHB Bus Matrix. Together with dedicated DMA channels, they can be configured to create high-bandwidth data links to application-specific logic elements. If APB peripherals are required in the MP Block, an AHB/APB Bridge and Peripheral DMA Controller (PDC) can be built into it to provide the required interfaces.

2) A set of interrupt lines that enable application-specific logic elements to generate interrupts, which are handled by the Advanced Interrupt Controller.

3) A set of Peripheral Enable lines. These permit application-specific logic to enable or disable peripherals in the fixed portion of the device.

4) Two parallel sets of dedicated I/O ports. They provide a large number of external I/Os for the application-specific logic elements. A range of electrical characteristics is available for the I/Os connected to the MP Block.

5) A multiplexed connection to the USB Device Transceiver. This enables a second USB Device Port to be implemented in the MP Block.
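Item 1 notes that dedicated DMA channels, or a PDC built into the MP Block, can provide high-bandwidth links to application-specific logic. The toy model below shows why such a channel frees the CPU: the peripheral side stores each byte and switches to a queued buffer on its own, and the CPU is interrupted only when every buffer is full. The current/next buffer scheme and all field names are assumptions for illustration, not the PDC's documented register set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy receive channel with a current buffer and one queued buffer. */
typedef struct {
    uint8_t *ptr;         /* where the next received byte is stored */
    size_t   count;       /* bytes remaining in the current buffer */
    uint8_t *next_ptr;    /* queued buffer, used when count reaches 0 */
    size_t   next_count;  /* size of the queued buffer (0 = none) */
    int      end_irq;     /* set when all buffers are full */
} dma_rx_channel;

/* Called once per received byte; in hardware this happens without CPU
 * involvement. Returns 0 if stored, -1 on overrun (no buffer left). */
int dma_rx_byte(dma_rx_channel *ch, uint8_t byte)
{
    assert(ch != NULL);
    if (ch->count == 0) {
        ch->end_irq = 1;              /* overrun: no buffer available */
        return -1;
    }
    *ch->ptr++ = byte;
    if (--ch->count == 0) {
        if (ch->next_count != 0) {
            /* Current buffer full: switch to the queued buffer. */
            ch->ptr = ch->next_ptr;
            ch->count = ch->next_count;
            ch->next_ptr = NULL;
            ch->next_count = 0;
        } else {
            ch->end_irq = 1;          /* all buffers full: interrupt CPU */
        }
    }
    return 0;
}
```

Because the CPU only has to queue a new buffer before the current one fills, its involvement scales with the number of buffers rather than the number of bytes.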

Going with the (Design) Flow
The design flow of an MPCF-based configurable microcontroller is basically identical to that of a system with an off-the-shelf ARM9 MCU and a Xilinx or Altera FPGA. In fact, the MCU-plus-FPGA design may be manufactured in production volumes to test the market. Once the product's success is verified, the entire design can be migrated directly to the customizable microcontroller.

The FPGA's register transfer level (RTL) code is migrated directly to the MP block, which already contains the AHB interfaces, DMA channels and I/O channels.

Device drivers are supplied for all the peripherals/interfaces in the platform. These can also serve as templates for equivalent drivers for the peripherals/interfaces defined in the MP Block.
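As a rough idea of what such a template might look like, here is a minimal C driver skeleton for a custom peripheral in the MP Block. The register layout (CR, SR, IER, DATA), the bit assignments and all names are hypothetical; a real driver would take them from the MP Block's own specification and the vendor's templates. Passing the register base in, rather than hard-coding an address, lets the same code run against a RAM-backed fake on a host PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout for a custom APB peripheral. */
typedef struct {
    volatile uint32_t CR;    /* control: bit 0 = enable, bit 1 = reset */
    volatile uint32_t SR;    /* status:  bit 0 = data ready */
    volatile uint32_t IER;   /* interrupt enable mask */
    volatile uint32_t DATA;  /* data register */
} mp_periph_regs;

#define MP_CR_ENABLE (1u << 0)
#define MP_CR_RESET  (1u << 1)
#define MP_SR_READY  (1u << 0)

/* Driver handle holding the peripheral's register base. */
typedef struct {
    mp_periph_regs *regs;
} mp_periph;

void mp_periph_init(mp_periph *dev, mp_periph_regs *base)
{
    assert(dev != NULL && base != NULL);
    dev->regs = base;
    dev->regs->CR = MP_CR_RESET;   /* hold the block in reset */
    dev->regs->CR = MP_CR_ENABLE;  /* release reset and enable */
    dev->regs->IER = 0;            /* interrupts off until requested */
}

/* Non-blocking read: returns 1 and stores a word if data is ready,
 * otherwise returns 0 and leaves *out untouched. */
int mp_periph_read(mp_periph *dev, uint32_t *out)
{
    if (!(dev->regs->SR & MP_SR_READY))
        return 0;
    *out = dev->regs->DATA;
    return 1;
}
```

On the target, base would point to the peripheral's address window in the MP Block's address range.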

Industry-leading operating systems have already been ported to the MPCF architecture. Integration of these software modules with the application code modules and the user interface for the application can be done in parallel with hardware development.

Emulation boards are also available that implement the MPCF platform as a single chip, with an external FPGA standing in for the MP Block. This enables the hardware and low-level software of the application-specific device to be emulated at close to operational speed, and errors to be corrected without the cost of a silicon re-spin.

Figure 7. MCU-MP Block Design Flow

Stocks of pre-fabricated blank wafers for MPCF-based configurable MCUs can be inventoried to support rapid prototyping and production. Only the metal layers need to be added, and placement and routing of the MP Block metal layers are done using an established floorplan (Figure 7, above).

System Specification and Partitioning
The starting point of the design flow is the specification of the required system and the partitioning of its functionality between hardware and software, taking into account the architecture of the MPCF platform and the possibilities for implementing application-specific functionality in the MP Block. The general guideline is hardware for performance, software for flexibility, but in practice the split varies considerably from design to design.

One of the major benefits of the customizable MCU design flow is that the hardware/software partitioning can be validated and, if necessary, corrected at the emulation phase, before committing the hardware to silicon. This can save the time and expense of a silicon re-spin.
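One way to keep the partitioning decision cheap to revisit during emulation is to hide each candidate function behind a single interface, so that moving it between software and the MP Block changes one binding rather than the application code. The sketch assumes a CRC-32 as the candidate function purely for illustration; the software path is the standard reflected CRC-32, and the hardware path is a hypothetical placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software path: bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crc32_sw(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

typedef uint32_t (*crc_fn)(const uint8_t *data, size_t len);

/* Current binding. A hypothetical MP Block driver function with the
 * same signature (e.g. crc32_hw) would be passed to crc_select_impl()
 * if emulation shows the function belongs in hardware. */
static crc_fn crc_impl = crc32_sw;

void crc_select_impl(crc_fn f)
{
    assert(f != NULL);
    crc_impl = f;
}

/* Application code calls only this, whichever side implements it. */
uint32_t crc_compute(const uint8_t *data, size_t len)
{
    return crc_impl(data, len);
}
```

The standard check value applies: crc_compute on the nine ASCII bytes "123456789" returns 0xCBF43926.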

The task of customizing the MP Block is generally shared between the customer and a qualified third-party design house. The first phase is to develop application-specific hardware blocks and associated software drivers. In most cases the hardware blocks are coded in Verilog RTL and the software in C, C++ or ARM assembly language.

The task of integrating the application-specific blocks into the MP Block is facilitated by placeholder instantiations of functional blocks already written into a template for the MP Block RTL code, supplied by the MCU vendor.

Separate templates are provided for AHB master/slave devices and for APB slaves, and DMA or PDC connectivity is pre-programmed in some blocks. For example, Figure 8 below shows the HDL for an APB-connected function with PDC connectivity.

Figure 8. RTL Code for Placeholder Instantiation of APB-connected Function in MP Block

The RTL code for the MP Block is validated for compatibility with the fixed portion of the microcontroller. The RTL code is then synthesized using process-specific target libraries supplied by the vendor, and functional simulations are performed on the entire device.

The developer can work with a third-party design house to integrate the software suite that corresponds to the hardware. As shown in Figure 9 below, the low-level device drivers for the platform are supplied by the MCU vendor, and those for the MP Block originate from the customer or third-party design house.

These are integrated with the application modules that program the MCU and peripherals/interfaces. If an operating system is required, a pre-ported version is obtained from a qualified third party and integrated into the software suite.

Figure 9. MCU-MP Block Software

The software suite is tested using industry-standard development tools. Optionally, hardware/software co-simulation may be carried out at this stage.

Hardware/Low-level Software Emulation
A key step in the design flow is the emulation of the hardware and at least the low-level software. The MPCF emulation board (Figure 10, below) includes a full complement of memories, standard interfaces and network connections, together with additional connections that can be configured for the requirements of the application.

The customizable ARM9 platform is implemented as a single chip with a bonded-out FPGA interface for the MP Block. The high-density FPGA emulates the MP Block, including embedded memories and external I/Os. An FPGA configuration memory contains the compiled HDL code for the MP Block.

Figure 10. AT91CAP Emulation Board Architecture

The EBI and the external connection from the FPGA are connected to a wide selection of memories on an extension board: SDRAM, Mobile DDRAM, Burst Cellular RAM, NOR Flash, NAND Flash, etc. These are loaded with the software suite and reference data for the application.

All standard interfaces (CAN, USB, Ethernet, I2S, AC97, ADC, MCI, etc.) are routed through transceivers/PHYs/codecs to external connections. This enables full test/debug of the external interfaces and networking/communication links of the device.

All elements of the graphical user interface (GUI) are connected to on-board devices or interfaces: LCD, keyboard, touch screen interface, etc. This enables the basic elements of the GUI to be tested on-board.

External PIO and FPGA inputs/outputs are provided for connection to application-specific external devices and for the implementation of non-standard interfaces. The FPGA I/Os include a three-port USB device. A serial debug I/O connects to the PC that runs a set of industry-standard application development/debug tools.

The SoC/FPGA emulation platform runs at close to the operating frequency of the final device. This enables at-speed testing of the device (both the MCU and standard interfaces in the platform and the functions implemented in the MP Block), together with all the software that has been developed up to this point.

At a minimum this includes the device drivers, the operating system port and the application code modules that control the functions implemented in the MP Block. Corrections can be made to the hardware or software elements of the device without any silicon cost.

Software emulation steps
The emulation steps are as follows:

Step #1: Connect the emulation board to a PC running industry-standard development/debug tools.

Step #2: Re-synthesize the MP Block RTL code, including the application-specific modules, for the FPGA.

Step #3: Program the FPGA using this synthesized MP Block code.

Step #4: Compile the software suite for the MCU and peripherals/interfaces as implemented on the emulation board.

Step #5: Load the application software and operating system onto an appropriate subset of the on-board memories.

Step #6: Run the application software on the emulation board.

Step #7: Debug and correct errors as required.

Experience indicates that the last emulation step almost always highlights errors in the hardware, the software, or the hardware/software interface of the device.
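Because errors often surface at the hardware/software interface, a common first check after a new MP Block design is loaded into the FPGA is a register readback test run from the ARM side. The sketch below is a walking-ones test, a standard technique for catching stuck-at or shorted data bits; it assumes a hypothetical read/write register whose writable bits are given by rw_mask. On the emulation board, reg would point into the MP Block's address range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walking-ones test: write each single-bit pattern to the register and
 * check that it reads back unchanged. Bits outside rw_mask are treated
 * as read-only and ignored. Returns the number of mismatches found.
 * The register's original value is restored afterwards. */
int reg_walking_ones(volatile uint32_t *reg, uint32_t rw_mask)
{
    assert(reg != NULL);
    int errors = 0;
    uint32_t saved = *reg;
    for (int bit = 0; bit < 32; bit++) {
        uint32_t pattern = (1u << bit) & rw_mask;
        if (pattern == 0)
            continue;                  /* read-only bit: skip */
        *reg = pattern;
        if ((*reg & rw_mask) != pattern)
            errors++;                  /* stuck or shorted bit */
    }
    *reg = saved;
    return errors;
}
```

Such a test should only be pointed at registers without side effects on write; writing to a control or FIFO register can trigger actions in the custom logic.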

The ability to correct and re-test the complete design of the device at this stage is a major factor in reducing design time and cost, and it increases the probability of right-first-time silicon and software.

An additional benefit is that the emulated version of the final design can be used as the starting point for future design iterations, at a substantial saving of design effort.

The placement and routing step is carried out by a dedicated team at the vendor, using the established floorplan for the fixed portion of the device and the MP Block. Only the metal layers of the MP Block are placed and routed. A post-layout simulation ensures that no timing constraints have been violated.

Prototype fabrication is likewise limited to the metal layers, drawing on a stock of pre-fabricated base wafers. The fabrication cycle is much more rapid than that for an all-layer full-custom ASIC. Exhaustive pre- and post-packaging tests ensure that the fabricated devices conform to their simulated behavior.

Conclusion
One advantage of the MPCF approach is that the design team does not have to wait for a prototype to complete software development. Application software development and test can be carried out in parallel with placement and routing as well as prototype fabrication.

Once the device and the software have been validated in the target application, the customer formally approves the product for volume fabrication, based on a rolling forecast. Since an inventory of blank wafers is kept on hand, volume production can easily be adjusted to market demand.

If the volume requirements for the device justify the investment, the netlist can be re-mapped onto a full standard-cell design, bringing the advantages of reduced die size, higher performance and lower power consumption.

To read Part 1, go to “Metal programmable cell fabrics versus ASICs, SoCs and FPGAs – the tradeoffs.”

Tim Kubitschek is the marketing manager for the CAP customizable microcontroller products at Atmel Corp. He received his B.S. degree in Electrical Engineering from the University of Colorado and his MBA from St. Mary's University in San Antonio, Texas. He has worked for Advanced Micro Devices, NCR Microelectronics and Symbios Logic.
