Using customizable MCUs to bridge the gap between dedicated SoC ASSPs, ASICs and FPGAs: Part 2
As discussed in Part 1 of this series, using a customizable microcontroller with a metal-programmable cell fabric (MPCF) allows designers to integrate their custom IP into a near-off-the-shelf solution. This approach makes it possible to achieve very high gate densities of between 170K and 210K gates/mm², so MPCF silicon efficiency is comparable to that of cell-based ASICs.
|Figure 1. D-type Flip-flop in 130 nm MPCF and 130 nm Standard Cell|
One such platform (Figure 2 below) integrates a 250K or 500K gate metal programmable (MP) block with a 200 MHz ARM926EJ-S core, 16 KBytes each of tightly coupled program and data cache for deterministic processing, 32 KBytes of additional SRAM, 32 KBytes of ROM and an array of peripherals and interfaces to handle networking, data transfers, and user interface requirements. These include a USB Host and Device, 10/100 Ethernet MAC, LCD controller, image sensor interface, and interfaces for CAN, MCI, and SPI.
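As a rough sanity check, the gate densities and MP block sizes quoted above can be turned into a silicon area estimate. The sketch below is only arithmetic on those figures; the function name is illustrative, and it ignores anything the quoted density does not already account for (embedded RAM blocks, I/O, and so on).

```python
def mp_block_area_mm2(gates, density_gates_per_mm2):
    """Silicon area in mm^2 needed for `gates` at the given gate density."""
    return gates / density_gates_per_mm2

# Figures quoted in the text.
DENSITIES = (170_000, 210_000)    # gates/mm^2
BLOCK_SIZES = (250_000, 500_000)  # gates

for gates in BLOCK_SIZES:
    largest = mp_block_area_mm2(gates, min(DENSITIES))
    smallest = mp_block_area_mm2(gates, max(DENSITIES))
    print(f"{gates // 1000}K gates: {smallest:.2f} to {largest:.2f} mm^2")
```

On these numbers the 250K gate block works out to roughly 1.19 to 1.47 mm² of logic, and the 500K gate block to roughly 2.38 to 2.94 mm².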
The MP Block makes it possible
A key element in implementing metal programmable cell fabric in such designs is the architecture of the underlying MP block, which is large enough to implement a second ARM processor core, a digital signal processor (DSP), additional standard (or non-standard) interfaces and complex logic blocks such as GPS correlators.
|Figure 2. MCU-MP Block Diagram|
As shown in Figure 2 above, the MP block has a number of internal features and dedicated external connections that enhance its efficiency for implementing application-specific logic elements. Internally, it has multiple distributed Single- and Dual-Port RAM blocks that can be tightly coupled to the logic elements that require them.
The MP Block is supplied by all the clocks originating from the Clock Generator and Power Management Controller. This gives the maximum flexibility in clocking the application-specific logic elements implemented in it.
Direct memory access (DMA) is implemented on all peripherals to handle transfers between the peripherals and the memories. Otherwise, transferring data between the peripherals and memories could overwhelm the ARM9 (Figure 3 below).
|Figure 3. PDC transfers|
For example, without DMA a 20 Mbps high-speed SPI transfer would require all of the ARM9's cycles. Simple DMA is implemented in every peripheral on the chip and managed by a peripheral DMA controller (PDC) that offloads data-moving tasks, so a 20 Mbps SPI transfer can take place and still leave 88% of the ARM9's cycles for application processing (Figure 4, below). In addition, there is a 4-channel DMA controller to take care of the Ethernet MAC, LCD controller, and camera interface.
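The SPI figures can be put into a simple cycle budget. The sketch below is a back-of-the-envelope model, not a measurement: it assumes the 200 MHz core clock quoted earlier and byte-wide transfers, and the function names are illustrative.

```python
CPU_HZ = 200_000_000         # ARM926EJ-S clock quoted in the text
SPI_BITS_PER_S = 20_000_000  # 20 Mbps SPI link

def cycles_per_byte(cpu_hz, bits_per_s):
    """CPU cycles that elapse while one byte arrives on the link."""
    bytes_per_s = bits_per_s / 8
    return cpu_hz / bytes_per_s

def cpu_load(cost_cycles_per_byte, cpu_hz, bits_per_s):
    """Fraction of CPU cycles spent moving data, capped at 1.0."""
    budget = cycles_per_byte(cpu_hz, bits_per_s)
    return min(1.0, cost_cycles_per_byte / budget)

budget = cycles_per_byte(CPU_HZ, SPI_BITS_PER_S)
print(f"cycle budget per byte: {budget:.0f}")
# A CPU-driven transfer that costs the whole budget per byte saturates the core.
print(f"load at {budget:.0f} cycles/byte: {cpu_load(budget, CPU_HZ, SPI_BITS_PER_S):.0%}")
```

Under these assumptions the core has 80 cycles per incoming byte, so a service routine of about that length per byte would consume the whole processor. Read the other way, the quoted 88% availability corresponds to an average overhead of about 9.6 cycles per byte in this model.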
A six-layer Advanced High-performance Bus (AHB) matrix with six masters and six slaves eliminates contention for the bus itself. The masters include the CPU data and instruction buses, the peripheral DMA controller, Ethernet and USB Host. The slaves are the memories, USB device, and the peripheral bus bridge. Any master can take control of any available bus layer when needed. Since there are as many bus layers as masters, no master ever has to wait for a bus; arbitration is needed only when two masters address the same slave at the same time.
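The behavior of a multi-layer bus matrix can be illustrated with a small one-cycle model. This is a simplified sketch, not the device's actual arbitration logic: the fixed-priority scheme and the master and slave names are assumptions made for illustration. Each master has its own layer, so masters that target different slaves proceed in parallel; in this model a master stalls only when another master with higher priority wants the same slave in the same cycle.

```python
from collections import defaultdict

def arbitrate(requests, priority):
    """Resolve one cycle of a multi-layer bus matrix.

    requests: dict mapping master -> slave requested this cycle
    priority: list of masters, highest priority first (assumed scheme;
              a real matrix may use round-robin or programmable priorities)
    Returns (granted, stalled): dict master -> slave, and a list of masters
    that must retry because their slave was busy.
    """
    by_slave = defaultdict(list)
    for master, slave in requests.items():
        by_slave[slave].append(master)

    granted, stalled = {}, []
    for slave, masters in by_slave.items():
        masters.sort(key=priority.index)   # highest priority first
        granted[masters[0]] = slave        # one master per slave per cycle
        stalled.extend(masters[1:])
    return granted, stalled

PRIORITY = ["cpu_instr", "cpu_data", "pdc", "ethernet", "usb_host"]

# Different slaves: every master is served in the same cycle.
print(arbitrate({"cpu_instr": "rom", "cpu_data": "sram", "pdc": "apb_bridge"}, PRIORITY))

# Same slave: the lower-priority master waits.
print(arbitrate({"cpu_data": "sram", "ethernet": "sram"}, PRIORITY))
```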
|Figure 4. PDC CPU Availability|
The external connections of the MP Block include multiple parallel Master and Slave connections to the AHB Bus Matrix, a set of interrupt lines for peripherals implemented in the MP Block, a set of Peripheral Enable lines, two parallel sets of dedicated I/O ports and a multiplexed connection to the USB Device Transceiver. This enables a second USB Device to be implemented in the MP Block (Figure 5, below).
To accommodate the GByte-plus external mass storage required for the 2-D graphics in man-machine interfaces, the chip includes an SD/MMC memory card interface (MCI) and an external bus interface (EBI) supporting SDRAM, NAND Flash with error correction code (ECC), and CompactFlash with True IDE mode. These provide access to GByte-plus on-board or removable memory, including USB sticks.
|Figure 5. Metal Programmable Block Diagram|
A fully integrated system controller manages interrupt handling, reset, startup/shutdown, timing, power management and parallel I/O control of the device. It also provides a debug interface to the ARM core via the Debug Unit.
The advanced interrupt controller (AIC) augments the ARM's two-level interrupt with an 8-level, vectored, prioritized interrupt system that transfers control to an interrupt handling routine in the minimum number of clock cycles, thereby improving the real-time operation of the device (Figure 6, below).
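The idea of a vectored, prioritized interrupt scheme can be sketched in a few lines. This is a toy software model, not the AIC's register interface: the class, the source names, and the convention that a larger number means higher priority are assumptions made for illustration. "Vectored" here means the handler is looked up directly from a table rather than found by polling each source.

```python
class InterruptController:
    """Toy model of an 8-level, vectored, prioritized interrupt controller."""
    LEVELS = 8

    def __init__(self):
        self.vectors = {}     # source -> (priority, handler)
        self.pending = set()  # sources waiting for service

    def configure(self, source, priority, handler):
        if not 0 <= priority < self.LEVELS:
            raise ValueError("priority must be in 0..7")
        self.vectors[source] = (priority, handler)

    def raise_irq(self, source):
        if source not in self.vectors:
            raise KeyError(f"unconfigured source: {source}")
        self.pending.add(source)

    def dispatch(self):
        """Run the handler of the highest-priority pending source, if any."""
        if not self.pending:
            return None
        # sorted() makes ties deterministic (alphabetical order of source name).
        source = max(sorted(self.pending), key=lambda s: self.vectors[s][0])
        self.pending.remove(source)
        _, handler = self.vectors[source]
        return handler()  # direct jump through the vector table

aic = InterruptController()
aic.configure("uart", priority=2, handler=lambda: "uart serviced")
aic.configure("ethernet", priority=5, handler=lambda: "ethernet serviced")
aic.raise_irq("uart")
aic.raise_irq("ethernet")
print(aic.dispatch())  # ethernet serviced
print(aic.dispatch())  # uart serviced
```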
The system controller also controls system startup and shutdown, and provides multiple clock sources and peripheral enable lines so that each functional block can be run at the minimum clock frequency required to support the application, or put in idle mode if not required. This keeps the device power consumption to a minimum under all conditions of use. The fixed portion of the device is itself a system-on-chip.
|Figure 6. MCU-MP Block System Controller|
Connecting the Metal Programmable Block
The metal programmable block has a number of internal features and dedicated external connections that enhance its efficiency for implementing application-specific logic elements. Internally, it has multiple distributed Single- and Dual-Port RAM blocks that can be tightly coupled to the logic elements that require them. The external connections of the MP Block include:
1) Multiple parallel Master and Slave connections to the AHB Bus Matrix. Together with dedicated DMA channels they can be configured to create high-bandwidth data links to application-specific logic elements. If APB peripherals are required in the MP Block, an AHB/APB Bridge and Peripheral DMA Controller (PDC) can be built into it in order to provide the required interfaces.
2) A set of interrupt lines that enable application-specific logic elements to generate interrupts, which are handled by the Advanced Interrupt Controller.
3) A set of Peripheral Enable lines. These permit application-specific logic to enable or disable peripherals in the fixed portion of the device.
4) Two parallel sets of dedicated I/O ports. They provide a large number of external I/Os for the application-specific logic elements. A range of electrical characteristics is available for the I/Os connected to the MP Block.
5) A multiplexed connection to the USB Device Transceiver. This enables a second USB Device Port to be implemented in the MP Block.
The MP Block is supplied by all the clocks originating from the Clock Generator and Power Management Controller. This gives the maximum flexibility in clocking the application-specific logic elements implemented in it.
Going with the (Design) Flow
The design flow of an MPCF-based configurable microcontroller is basically identical to that of a system with an off-the-shelf ARM9 MCU and a Xilinx or Altera FPGA. In fact, the MCU-plus-FPGA design may be manufactured in production volumes to test the market. Once the product's success is verified, the entire design can be migrated directly to the customizable microcontroller.
The FPGA register transfer level (RTL) netlist is migrated directly to the MP block, which already contains the AHB interfaces, DMA channels, and I/O channels.
Device drivers are supplied for all the peripherals/interfaces in the platform. These can also serve as templates for equivalent drivers for the peripherals/interfaces defined in the MP Block.
Industry-leading operating systems have already been ported onto the MPCF architecture. Integration of these software modules with the application code modules and the user interface for the application can be done in parallel with hardware development.
Emulation boards are also available for the implementation of the MPCF Platform on a single chip with an external FPGA for the MP Block. This enables the hardware and low-level software of the application-specific device to be emulated at close to operational speed, and errors to be corrected at no cost.
|Figure 7. MCU-MP Block Design Flow|
Stocks of pre-fabricated blank MPCF-based configurable MCUs can be inventoried to ensure rapid prototyping and production volumes. Only the metal layers need to be added. Placement and routing of the metal layers of the MP Block are done using an established floorplan (Figure 7, above).
System Specification and Partitioning
The starting point of the design flow is the specification of the required system, and the partitioning of its functionality between hardware and software, taking into account the architecture of the MPCF Platform and the possibilities for implementing the application-specific functionality in the MP Block. The general guideline is hardware for performance, software for flexibility, but in practice there is considerable variation of this partitioning.
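The "hardware for performance, software for flexibility" guideline can be made concrete with a first-pass load estimate. The sketch below is one possible heuristic, not a prescribed method: the function names, event rates, cycle counts, and the 50% load ceiling are all hypothetical values chosen for illustration. It keeps functions in software until the estimated CPU load exceeds the ceiling, then moves the heaviest functions into the MP Block.

```python
def partition(functions, cpu_hz, max_cpu_load=0.5):
    """Split functions between software and hardware by estimated CPU load.

    functions: list of (name, events_per_s, cycles_per_event) tuples, where
               cycles_per_event is the estimated cost of a software version.
    Returns (software, hardware) as lists of function names.
    """
    loads = {name: rate * cycles / cpu_hz for name, rate, cycles in functions}
    software = dict(loads)
    hardware = []
    # Move the heaviest function to hardware until the rest fits the budget.
    while software and sum(software.values()) > max_cpu_load:
        heaviest = max(software, key=software.get)
        hardware.append(heaviest)
        del software[heaviest]
    return sorted(software), hardware

# Hypothetical workload on a 200 MHz core.
functions = [
    ("gps_correlator", 1_000_000, 400),  # 200% of the CPU: cannot run in software
    ("protocol_stack", 10_000, 2_000),   # 10%
    ("user_interface", 60, 200_000),     # 6%
]
software, hardware = partition(functions, cpu_hz=200_000_000)
print("software:", software)
print("hardware:", hardware)
```

With these made-up numbers the correlator lands in the MP Block while the protocol stack and user interface stay in software. A real partitioning decision would also weigh gate count, power, and how likely each function is to change.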
One of the major benefits of the design flow of a customizable MCU is that the hardware/software partitioning can be validated and, if necessary, corrected at the emulation phase, before committing the hardware to silicon. This can save the time and expense of a silicon re-spin.
The task of customizing the MP Block is generally shared between the customer and a qualified third-party design house. The first phase is to develop application-specific hardware blocks and associated software drivers. In most cases the hardware blocks are coded in Verilog RTL and the software in C, C++ or ARM assembly language.
The task of integrating the application-specific blocks into the MP Block is facilitated by the placeholder instantiations of functional blocks already written into a template for the MP Block RTL code, supplied by the MCU vendor.
Separate templates are provided for AHB master/slave devices and for APB slaves. DMA or PDC connectivity is pre-programmed in some blocks. For example, the HDL for an APB-connected function with PDC connectivity is shown in Figure 8 below.
|Figure 8. RTL Code for Placeholder Instantiation of APB-connected Function in MP Block|
The RTL code for the MP Block is validated for compatibility with the fixed portion of the microcontroller. The RTL code is then synthesized using process-specific target libraries supplied by the vendor and functional simulations are performed on the entire device.
The developer can work with a third-party design house to integrate the software suite that corresponds to the hardware. As shown in Figure 9 below, the low-level device drivers for the platform are supplied by the MCU vendor, and those for the MP Block originate from the customer or third-party design house.
These are integrated with the application modules that program the MCU and peripherals/interfaces. If an operating system is required, a pre-ported version is obtained from a qualified third party and integrated into the software suite.
|Figure 9. MCU-MP Block Software|
The software suite is tested using industry-standard development tools. Optionally, hardware/software co-simulation may be carried out at this stage.
A key step in the design flow is the emulation of the hardware and at least the low-level software. The MPCF emulation board (Figure 10 below) includes a full complement of memories, standard interfaces and network connections together with additional connections that can be configured for the requirements of the application.
The customizable ARM9 platform is implemented as a single chip with a bonded-out FPGA interface in the MP Block. The high-density FPGA emulates the MP Block including embedded memories and external I/Os. An FPGA configuration memory contains the compiled HDL code for the MP Block.
|Figure 10. AT91CAP Emulation Board Architecture|
The EBI and the external connection from the FPGA are connected to a wide selection of memories on an extension board: SDRAM, Mobile DDRAM, Burst Cellular RAM, NOR Flash, NAND Flash, etc. These are loaded with the software suite and reference data for the application.
All standard interfaces (CAN, USB, Ethernet, I2S, AC97, ADC, MCI, etc.) are routed through transceivers/PHYs/codecs to external connections. This enables full test/debug of the external interfaces and networking/communication links of the device.
All elements of the Graphical User Interface (GUI) are connected to on-board devices or interfaces: LCD, keyboard, touch screen interface, etc. This enables the basic elements of the GUI to be tested on-board.
External PIO and FPGA input/outputs are provided for connection to application-specific external devices, and the implementation of non-standard interfaces. The FPGA I/Os include a three-port USB device. A serial debug I/O connects to the PC that runs a set of industry-standard application development/debug tools.
The SoC/FPGA emulation platform runs at close to the operating frequency of the final device. This enables at-speed testing of the device, both the MCU and standard interfaces in the platform and the functions implemented in the MP Block, together with all the software that has been developed up to this point.
At a minimum this includes the device drivers, operating system port and the application code modules that control the functions implemented in the MP Block. Corrections can be made to the hardware or software elements of the device at no cost penalty.
The emulation steps are as follows:
Step #1: Connect the emulation board to a PC running industry-standard development/debug tools.
Step #2: Re-synthesize the MP Block RTL code including the application-specific modules for the FPGA.
Step #3: Program the FPGA using this synthesized MP Block code.
Step #4: Compile the software suite for the MCU and peripherals/interfaces as implemented on the emulation board.
Step #5: Load the application software and operating system onto an appropriate subset of the on-board memories.
Step #6: Run the application software on the emulation board.
Step #7: Debug and correct errors as required.
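Because Step #7 feeds back into the earlier steps, it helps to know where to re-enter the sequence after a fix. The sketch below encodes the seven steps and one plausible set of re-entry points. The re-entry rule is an assumption, not something the flow prescribes: a hardware (RTL) fix restarts at re-synthesis, while a software-only fix restarts at compilation.

```python
EMULATION_STEPS = [
    "connect emulation board to PC",           # Step 1
    "re-synthesize MP Block RTL for FPGA",     # Step 2
    "program FPGA",                            # Step 3
    "compile software suite",                  # Step 4
    "load software into on-board memories",    # Step 5
    "run application",                         # Step 6
    "debug and correct errors",                # Step 7
]

# Assumed re-entry points (1-based step numbers) after a correction.
RESTART_STEP = {"hardware": 2, "software": 4}

def steps_to_rerun(fix_kind):
    """Return the steps to repeat after a 'hardware' or 'software' fix."""
    first = RESTART_STEP[fix_kind]
    return EMULATION_STEPS[first - 1:]

for kind in ("hardware", "software"):
    print(f"{kind} fix: rerun {len(steps_to_rerun(kind))} steps, "
          f"starting with '{steps_to_rerun(kind)[0]}'")
```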
Experience indicates that the last emulation step almost always highlights errors in the hardware and/or software, or the hardware/software interface of the device.
The ability to correct and re-test the complete design of the device at this stage is a major factor in reducing design time and cost, and it increases the probability of right-first-time silicon and software.
An additional benefit is that the emulated version of the final design can be used as the starting point for future design iterations, at a substantial saving of design effort.
The placement and routing step is carried out by a dedicated team at the vendor, using the established floorplan for the fixed portion of the device and the MP Block. Only the metal layers of the MP Block are placed and routed. A post-layout simulation ensures that no timing constraints have been violated.
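The post-layout timing check amounts to confirming that every path still has non-negative slack at the target clock. The sketch below shows that calculation in its simplest form. The path names and delays are hypothetical, the 200 MHz clock is used only because it is the core frequency quoted earlier, and a real sign-off flow also accounts for clock skew, hold checks and process corners.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz

def timing_violations(path_delays_ns, freq_mhz, setup_ns=0.0):
    """Return {path: slack_ns} for every path with negative setup slack.

    slack = clock period - setup time - path delay
    """
    period = clock_period_ns(freq_mhz)
    slacks = {path: period - setup_ns - delay
              for path, delay in path_delays_ns.items()}
    return {path: slack for path, slack in slacks.items() if slack < 0}

# Hypothetical post-layout delays for two MP Block paths.
paths = {"ahb_slave_decode": 4.2, "correlator_accumulate": 5.6}
for path, slack in timing_violations(paths, freq_mhz=200).items():
    print(f"VIOLATION {path}: slack {slack:.2f} ns")
```

At 200 MHz the period is 5 ns, so the 5.6 ns path in this example fails by 0.6 ns and the 4.2 ns path passes.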
Prototype fabrication is likewise limited to the metal layers, drawing on a stock of pre-fabricated base wafers. The fabrication cycle is much more rapid than that for an all-layer full-custom ASIC. Exhaustive pre- and post-packaging tests ensure that the fabricated devices conform to their simulated behavior.
One advantage of the MPCF approach is that the design team does not have to wait for a prototype to complete software development. Application software development and test can be carried out in parallel with the place and route as well as prototype fabrication.
Once the device and the software have been validated in the target application, the customer formally approves the product for volume fabrication, based on a rolling forecast. Since an inventory of blank wafers is kept on hand, volume production can be adjusted easily to market demand.
If the volume requirements for the device justify the investment, the netlist can be re-mapped onto a full standard cell design, bringing the advantages of reduced die size and enhanced performance at lower power consumption.
To read Part 1, go to "Metal programmable cell fabrics versus ASICs, SoCs and FPGAs - the tradeoffs."
Tim Kubitschek is the marketing manager for the CAP customizable microcontroller products at Atmel Corp. He received his B.S. degree in Electrical Engineering at the University of Colorado and his MBA from St. Mary's University, San Antonio, Texas. He has worked for Advanced Micro Devices, NCR Microelectronics and Symbios Logic.