Using a PCIe over Cabling-based platform to create hybrid FPGA/virtual platform prototypes

(Editor’s Note: In this Product How-To design article, Troy Scott of Synopsys describes how to use the PCIe-over-Cabling interface in the company's HAPS-60-based system to create a new class of hybrid prototypes. These prototypes use a transaction-level interface to a SystemC/TLM virtual platform, combining hardware-based (FPGA) and software-based (virtual) prototyping methods.)

FPGA-based prototypes deliver high value to a SoC development organization. They provide multi-megahertz processing performance, real-world I/O connectivity, and portability for distribution to software developers or for field testing. The prototypes deliver operational systems running fast enough to make embedded software development and hardware/software validation feasible. Teams that have adopted FPGA-based prototyping realize schedule reductions of months and a more efficient, parallel hardware/software engineering methodology.

To fully realize the potential of these systems and maximize the return on investment from prototyping systems, development teams are taking advantage of advanced data exchange links beyond traditional JTAG. High-bandwidth physical links like PCI Express (PCIe) over Cabling allow the prototype to communicate with custom user applications for system control and monitoring. With a transaction-level interface to a SystemC/TLM virtual prototype, a new class of hybrid prototype is possible that leverages the strengths of both hardware and software-based prototyping methods.

Providing connectivity for a prototyping system
The conventional usage of the FPGA-based prototype has been as a stand-alone, isolated system. Data links to a workstation have been limited to programming the FPGA devices via JTAG. When the design includes an embedded CPU, a JTAG debug port also lets the embedded software debug environment monitor memory and debug source code on the prototype. JTAG is an excellent vehicle for occasional data access, but it was not designed for high-bandwidth communication.

PCI Express expansion using PCIe over Cabling products (Figure 1) has emerged as a de facto standard for higher-bandwidth data exchange with FPGA-based prototypes. Most commercial FPGA-based prototyping systems provide some form of PCIe access. For example, the UMRBus incorporated into the Synopsys HAPS-60 FPGA-based prototyping system provides the hardware infrastructure, OS device drivers, and various APIs for configuration of, and data exchange with, a Synopsys FPGA-based prototype.
To maximize versatility, the UMRBus was designed as a multi-point interface. The platform supports 27 independent UMRBus interfaces per hardware motherboard, and each UMRBus chain can address up to 63 independent client interfaces. This deep hardware capacity lets the communication system reach various regions of the ASIC/SoC design and dedicate a communication channel to a particular application of the bus.

The OS-specific APIs provide functions through which a host application accesses client applications in the prototype. Depending on the application and its bandwidth demand, the UMRBus can be implemented with a data width of 1, 2, 4, 8, 16, or 32 bits. With a 100 MHz global system clock and an 8-bit configuration, throughput of 800 Mbit/s is possible.
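
The 800 Mbit/s figure is clock rate times bus width (100 MHz x 8 bits). The sketch below makes that arithmetic explicit for the other width options. It assumes one data word per clock cycle with no protocol or PCIe overhead. The article does not say whether the wider configurations reach their raw ceiling over PCIe, so treat the results as upper bounds. The function names are illustrative, not part of any Synopsys API.

```c
#include <stdint.h>

/* Theoretical peak UMRBus payload rate in Mbit/s, assuming one data word
 * of width_bits moves on every cycle of the global system clock and
 * ignoring protocol and PCIe overhead. */
static uint32_t umrbus_peak_mbit_per_s(uint32_t clock_mhz, uint32_t width_bits)
{
    return clock_mhz * width_bits;
}

/* The same ceiling expressed in MByte/s (8 bits per byte). */
static double umrbus_peak_mbyte_per_s(uint32_t clock_mhz, uint32_t width_bits)
{
    return umrbus_peak_mbit_per_s(clock_mhz, width_bits) / 8.0;
}
```

For example, umrbus_peak_mbit_per_s(100, 8) returns 800 (100 MByte/s), and the 32-bit option gives a 3200 Mbit/s ceiling.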

Figure 1: Example physical connectivity for FPGA-based prototype, Synopsys UMRBus PCIe over Cable

With a high-performance, low-latency channel between the workstation and the prototype in place, new hardware/software validation scenarios become possible.

A rapid update process for boot ROM firmware
Consider a scheme for applying updates to the embedded boot ROM firmware of an FPGA-based prototype. In an embedded system, the CPU is uninitialized at power-on. System-specific configuration is required before proceeding to complex tasks like loading the embedded operating system. A piece of code must run at power-on to perform basic system setup. It then hands control to the boot loader, which usually resides in external NOR/NAND Flash memory, or to a download tool for programming the Flash.

Typical boot ROM firmware functions include:

  • Processor initialization
  • Stack setup
  • Interrupt setup
  • Watchdog timer configuration
  • UART interface programming
  • SDRAM and Flash memory test
  • Boot media detection
  • Boot loader execution from Flash memory
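
A fault in this sequence is hard to localize, which the following sketch illustrates by modeling the boot ROM as an ordered table of steps that stops at the first failure. Every step function here is a hypothetical placeholder. In real boot ROMs, processor initialization and stack setup are typically written in assembly before any C code can run. A failing step would normally be reported over the UART rather than returned to a caller.

```c
#include <stddef.h>

/* A boot step returns 0 on success and nonzero on failure. */
typedef int (*boot_step_fn)(void);

typedef struct {
    const char  *name; /* step name, e.g. for a UART progress message */
    boot_step_fn run;  /* function implementing the step */
} boot_step;

/* Hypothetical placeholders standing in for real hardware setup code. */
static int placeholder_ok(void) { return 0; }
static int placeholder_memtest_fails(void) { return 1; }

/* Example table in the order of the list above. The memory test is wired
 * to fail so the early-exit behavior is visible. On success a real ROM
 * would then jump to the boot loader, which never returns. */
static const boot_step example_sequence[] = {
    {"processor initialization",     placeholder_ok},
    {"stack setup",                  placeholder_ok},
    {"interrupt setup",              placeholder_ok},
    {"watchdog timer configuration", placeholder_ok},
    {"UART interface programming",   placeholder_ok},
    {"SDRAM and Flash memory test",  placeholder_memtest_fails},
    {"boot media detection",         placeholder_ok},
};

/* Run steps in order, stopping at the first failure. Returns the index of
 * the failing step, or -1 when all steps succeed. */
static int run_boot_sequence(const boot_step *steps, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (steps[i].run() != 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With the full seven-entry table, run_boot_sequence returns 5, identifying the memory test. Running only the first five steps returns -1.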

The read-only nature of a boot ROM demands thorough debugging. ROM-based code is difficult to debug with traditional techniques such as setting breakpoints and watching the system state.

Figure 2 illustrates an example implementation of a UMRBus client application interface module (CAPIM) that adjusts the boot ROM contents of a SoC Design Under Test (DUT).

Figure 2: In-system boot ROM update via Synopsys UMRBus for HAPS

By leveraging the flexible hardware of the FPGA-based prototype and the UMRBus CAPIM, FPGA prototyping engineers can add a programming interface to the SoC DUT by connecting a CAPIM to the boot ROM. Unlike ASIC silicon, the FPGA implementation can then be changed as flaws are detected. Firmware revisions are made over a PCIe over Cabling connection between a PC and the FPGA-based prototype. An application programming interface (API) addresses locations within the SoC prototype, which makes boot ROM updates fast.

The host application interface (HAPI) for UMRBus on the PC lets host applications written in C or C++ access the CAPIMs of the prototype. Functions are provided for open, close, read, write, and interrupt actions. For example, the synopsis of the umrbus_write() function is:

   int umrbus_write(
      UMR_HANDLE handle,
      UMRBUS_LONG *data,
      UMRBUS_LONG size,
      char **errormessage
   );
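
The host side of such an update can be sketched in two steps. First, pack the firmware image into bus words. Second, hand them to a write call shaped like the umrbus_write() synopsis. The sketch rests on several assumptions the article does not state:

- UMR_HANDLE is treated as an opaque pointer and UMRBUS_LONG as a 32-bit unsigned integer.
- size counts UMRBUS_LONG words rather than bytes.
- A return value of 0 means success.
- Bytes are packed little-endian.

How the CAPIM maps incoming words to boot ROM addresses is user logic and is not shown. A stub writer that copies into a simulated ROM array stands in for the real HAPI, so the logic can be exercised without hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed type mappings, for illustration only; the real definitions come
 * from the UMRBus HAPI header. */
typedef void *UMR_HANDLE;
typedef uint32_t UMRBUS_LONG;

/* Function-pointer type with the same shape as the umrbus_write() synopsis,
 * so either the real HAPI call or a test stub can be plugged in. */
typedef int (*umr_write_fn)(UMR_HANDLE, UMRBUS_LONG *, UMRBUS_LONG, char **);

/* Pack a firmware image into 32-bit words (little-endian byte order
 * assumed), zero-padding the final word. Returns the number of words. */
static size_t pack_firmware(const uint8_t *img, size_t len, UMRBUS_LONG *out)
{
    size_t words = (len + 3) / 4;
    for (size_t w = 0; w < words; w++) {
        UMRBUS_LONG v = 0;
        for (size_t b = 0; b < 4; b++) {
            size_t i = w * 4 + b;
            if (i < len) {
                v |= (UMRBUS_LONG)img[i] << (8 * b);
            }
        }
        out[w] = v;
    }
    return words;
}

/* Send the packed image through the supplied write function. Assumes size
 * is a word count and 0 means success. */
static int update_boot_rom(umr_write_fn write, UMR_HANDLE h,
                           UMRBUS_LONG *words, size_t count, char **err)
{
    return write(h, words, (UMRBUS_LONG)count, err);
}

/* Test stub standing in for the CAPIM: copies words into a simulated ROM. */
#define SIM_ROM_WORDS 256
static UMRBUS_LONG sim_rom[SIM_ROM_WORDS];

static int stub_write(UMR_HANDLE h, UMRBUS_LONG *data, UMRBUS_LONG size,
                      char **err)
{
    (void)h;
    (void)err;
    if (size > SIM_ROM_WORDS) {
        return -1;
    }
    memcpy(sim_rom, data, size * sizeof *data);
    return 0;
}
```

In a real host application, the stub would be replaced by the HAPI's umrbus_write(), using a handle obtained from the library's open function.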

CAPIMs attached to key status and control registers provide visibility into the DUT that a traditional debug environment cannot. The same idea of memory access and control via UMRBus can also accelerate the upload of software images to SDRAM by bypassing the slower JTAG interface. With 27 independent channels and support for multiple PC hosts, UMRBus and PCIe over Cabling connectivity increase visibility into and control over the prototype, with minimal investment by the prototyping engineers.
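
A quick estimate shows why bypassing JTAG matters for image uploads. The UMRBus rate below is the 800 Mbit/s peak quoted earlier. The JTAG rate is an assumption for comparison only: a 10 MHz TCK moving one bit per clock. Both figures ignore protocol overhead, so the ratio is more meaningful than the absolute times.

```c
/* Peak UMRBus rate from the text (100 MHz x 8 bits); real throughput is lower. */
#define UMRBUS_PEAK_BITS_PER_S 800e6
/* Assumption for comparison only: a 10 MHz JTAG TCK moving one bit per clock. */
#define JTAG_ASSUMED_BITS_PER_S 10e6

/* Idealized transfer time in seconds for an image of the given size,
 * ignoring protocol overhead, setup time, and host-side latency. */
static double transfer_seconds(double image_bytes, double bits_per_second)
{
    return image_bytes * 8.0 / bits_per_second;
}
```

Under these assumptions, a 100 MB software image takes about 1 s at the UMRBus ceiling versus 80 s over JTAG.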

On many prototyping and verification teams, software specialists take an active role in hardware/software validation by creating test pattern stimulus and monitoring interfaces to the ASIC/SoC prototype.

Software-driven steering and control of prototype
Software-driven steering and control over FPGA-based prototypes is a method to dramatically increase the depth and variety of validation scenarios available to the emulation/prototyping team. Software APIs with high-speed physical layer links make these new scenarios practical to implement even given the tight validation schedules teams face.

To illustrate a steering and control scenario using a host workstation (Figure 3) connected to an FPGA-based prototype, we implemented a JPEG baseline compression algorithm in the FPGAs of a HAPS system. A C application on the host handles image capture, access to the compression core, and display.

The software application reads an uncompressed 24-bit BMP image on the host PC. It transfers the image binary data to FPGA internal block RAM via the UMRBus write function (umrbus_write). After receiving the start and control signals, the JPEG core compresses the image and writes the JPEG binary data into another FPGA internal block RAM.
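
Before transferring a 24-bit BMP into block RAM, the host application has to know how many bytes of pixel data to send. In the BMP format, each row of 3-byte pixels is padded to a multiple of 4 bytes, so the payload is not simply width x height x 3. The helper names below are illustrative; the article does not describe how its application parses the file. BMP also stores pixels in blue, green, red order and, when the header height is positive, lists rows bottom-up. Either the host or the JPEG core must account for that ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per pixel row in an uncompressed 24-bit BMP: 3 bytes per pixel,
 * rounded up to the next multiple of 4 (BMP rows are 4-byte aligned). */
static size_t bmp24_row_stride(uint32_t width)
{
    return ((size_t)width * 3 + 3) & ~(size_t)3;
}

/* Size in bytes of the padded pixel array (headers excluded). */
static size_t bmp24_pixel_bytes(uint32_t width, uint32_t height)
{
    return bmp24_row_stride(width) * height;
}
```

For a 5-pixel-wide image, each row occupies 16 bytes: 15 bytes of pixels plus 1 byte of padding. A 640 x 480 image needs no padding and occupies 921,600 bytes.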

As implemented in live hardware, the JPEG compression algorithm executes in just a few seconds. It consists of RGB2YCbCr color space transformation, RAM compression and quantization tables for JPEG processing. When compression is complete, the UMRBus interrupt function (umrbus_interrupt) sends the 'ready' signal to the software application. The application then reads the compressed image via the UMRBus read function (umrbus_read) and saves the result to a file.
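
For reference, the RGB2YCbCr stage named above can be written as the standard JFIF conversion in the integer fixed-point form common in hardware. The coefficients are the JFIF ones scaled by 2^16 and rounded, chosen so that each chroma row sums to zero and gray inputs map exactly to 128. This is a generic sketch of the well-known transform, not a description of how the HAPS JPEG core implements it.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit range 0..255. */
static uint8_t clamp_u8(int32_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* JFIF RGB -> YCbCr in 16.16 fixed point. The +32768 term rounds to
 * nearest. The chroma offset (128 << 16) is added before the shift, which
 * keeps every shifted operand non-negative; saturated blue or red can
 * reach 256, so results are clamped. */
static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    int32_t R = r, G = g, B = b;
    *y  = clamp_u8((19595 * R + 38470 * G + 7471 * B + 32768) >> 16);
    *cb = clamp_u8(((128 << 16) + 32768 - 11058 * R - 21710 * G + 32768 * B) >> 16);
    *cr = clamp_u8(((128 << 16) + 32768 + 32768 * R - 27439 * G - 5329 * B) >> 16);
}
```

White maps to (255, 128, 128), black to (0, 128, 128), and pure red to (76, 85, 255).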

Figure 3: Example of software-driven steering and control of FPGA-based prototype, PC connectivity for image data load, and readback

Tapping the compute resources of a host PC or workstation allows more realistic hardware/software validation scenarios. This increases test coverage and enables qualitative review of media processing algorithms.

Robust APIs and low-latency physical connections to FPGA-based prototyping hardware have opened the door to co-simulation or co-emulation scenarios that link FPGA-based prototypes to SystemC-based virtual prototypes.

In-context IP validation
FPGA-based prototypes are considered 'high-fidelity' because of their internal processing performance, on the order of 50-75 MHz (typical), coupled with the ability to be immersed in real-world environments with test equipment, radio interfaces, memory ICs, electro-mechanical transducers, etc.

This fidelity and ability to run a cycle-accurate DUT at high performance make the FPGA prototype an attractive companion to virtual prototype simulations. Those simulations incorporate embedded CPUs running at very high MIPS performance while executing a firmware stack, embedded OS, and an application software layer. Linking a virtual and an FPGA-based prototype together creates a new prototyping category, referred to as 'hybrid'.

The motivation for hybrid prototypes is largely driven by engineering and program management as a way to accelerate the availability of prototypes. This is accomplished by mixing the SoC block prototypes (virtual or FPGA) that are most readily available to the development team.

Hybrid prototypes are employed for a variety of hardware/software validation scenarios:

  • Rapid bring-up by blending pre-existing IP from legacy ASIC/SoC projects or commercial IP with a virtual model or user application, which are typically far faster to develop
  • Use a virtual model for a complex processor subsystem while peripherals are allocated to FPGAs, combining the high MIPS throughput of loosely timed models with the cycle accuracy and high fidelity of real-world hardware I/O
  • Start a new SoC entirely modeled in the virtual domain, then incrementally substitute subsystems as RTL HDL becomes available

Consider the case where an IP provider delivers a virtual model of its latest embedded processor and subsystem. This model is functionally equivalent to the release version. However, RTL and physical IP are not available to SoC developers (perhaps due to schedule or licensing restrictions), so a virtual platform is the only feasible way to start software development immediately.

Another important SoC block of this design is an HD video codec for which no SystemC transaction-level simulation model is available. The codec may not be tremendously important for initial software development. However, higher-order functions of the software stack do address a new display panel, and the quality team would like to assess refresh rates and landscape-to-portrait orientation speed. The team decides to take the RTL from the previous codec design and implement it in FPGAs to obtain accuracy and performance.

The team responsible for the prototype identifies the following project goals:

  • Deliver an operational prototype in under one month
  • Achieve prototype performance fast enough to make software development feasible
  • Validate new software drivers with actual SoC IP
  • Attach to real world streaming I/O to stress test SoC IP using benchmarking tools

These objectives reflect the reality of prototype development projects and the realism and sophistication of validation tests.

As a case study, we measured the preparation effort and performance characteristics of a realistic system (Figure 4) partitioned between a SystemC model and FPGAs. We developed a hybrid prototype with two parts:
  • A virtual ARM Versatile Express uATX board utilizing ARM Cortex A9x2 CPUs, implemented as SystemC/TLM
  • A USB 3.0 device mode controller, implemented from RTL source in a HAPS-60 series FPGA-based prototyping system

The UMRBus and PCIe plug-in board physically link the virtual model host workstation to the HAPS hardware motherboard and provide a low-latency communication layer. A Synopsys Transactor Library for AMBA is provided. It includes a software API for SystemC/TLM and synthesizable Verilog HDL for implementation in the Xilinx Virtex-6 FPGAs of the HAPS-60 series system. A HapsTrak daughterboard with a USB PHY device completes the physical system elements.

The virtual Versatile Express board provides a software stack composed of USB device drivers, the Linux OS, and a file storage application. The virtual and FPGA prototype domains are connected by a single AXI3 master transactor and a single AXI3 slave transactor for bus connections, plus a GPIO transactor for interrupts.

We began the virtual prototype project with the Versatile Express uATX board reference design, part of our Virtualizer Development Kit (VDK) family for ARM Cortex processors. In the Virtualizer platform editor, we replaced the TLM 2.0 model of the USB 3.0 block with AMBA AXI transactors. These transactors form a bus protocol-specific interface between the virtual and FPGA-based prototypes.

Since the USB 3.0 block had two interfaces (an initiator and a target), both an AXI master and an AXI slave transactor were inserted. The interrupt output from the USB 3.0 device controller was replaced by a general purpose I/O (GPIO) transactor. The platform’s memory map and transactor parameters were configured to reflect the protocol details. The development required one day for platform assembly and one day for troubleshooting the configuration.
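
The memory-map configuration step decides which accesses from the virtual CPUs leave the SystemC/TLM model and cross the UMRBus to the FPGA. The real configuration is done in the Virtualizer platform editor and is not shown in the article. The C sketch below models only that address decode. Its address windows are hypothetical placeholders, not the actual Versatile Express memory map. It also leaves out the reverse path (the USB controller initiating transfers into virtual memory) and the GPIO interrupt transactor.

```c
#include <stddef.h>
#include <stdint.h>

/* Where a bus transaction issued by the virtual CPUs is serviced. */
typedef enum { ROUTE_UNMAPPED, ROUTE_TLM_MODEL, ROUTE_FPGA_AXI } route_t;

typedef struct {
    uint64_t base;  /* first address of the window */
    uint64_t size;  /* window length in bytes */
    route_t  route; /* who services accesses in this window */
} addr_window;

/* Hypothetical memory map: only the USB 3.0 register window is forwarded
 * through an AXI transactor to the FPGA; everything else stays virtual. */
static const addr_window example_map[] = {
    {0x40000000u, 0x00010000u, ROUTE_FPGA_AXI},  /* USB 3.0 controller in FPGA */
    {0x80000000u, 0x40000000u, ROUTE_TLM_MODEL}, /* e.g. virtual DRAM */
};

/* Return the route for addr, or ROUTE_UNMAPPED if no window contains it
 * (a real platform would raise a bus error in that case). */
static route_t route_for(const addr_window *map, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= map[i].base && addr - map[i].base < map[i].size) {
            return map[i].route;
        }
    }
    return ROUTE_UNMAPPED;
}
```

A register access at 0x40000010 is routed to the FPGA. The first address past the window, 0x40010000, is unmapped.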

The FPGA-based prototype used three hardware elements:
  • A HAPS-62 system with a capacity of approximately 18 million ASIC gates, hosting the USB 3.0 device controller block
  • A HAPS daughterboard with a USB PHY interface IC and physical connectors
  • A UMRBus interface kit to connect to the host workstation running the virtual prototype
The hardware flow required 3-5 days to assemble the platform and one day to troubleshoot the configuration. With expert users of the Virtualizer and HAPS systems, the team brought up the system in less than 2 weeks.

Figure 4: SoC design partition between virtual and FPGA-based prototypes

To demonstrate a real-world validation scenario, the hybrid prototype is attached to a Windows 7 laptop with a USB 3.0 host port. Windows auto-detects the new USB device, which appears as a new volume in Windows Explorer. USB3 read/write benchmarks using the DiskBench application on the Windows 7 PC showed reads of 0.515 MByte/sec and writes of 0.500 MByte/sec. The Linux OS file mount of the USB 3.0 mass storage device took about 1 second. To illustrate the performance, a 60 MB MP4 movie was played at real-time speed on the Windows laptop.
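
The calculation below puts the DiskBench numbers in perspective by converting them to bits per second and to the time needed to move the whole movie file. It assumes the article's MByte and MB denote the same unit; the ratio is unaffected either way. At 0.515 MByte/s (roughly 4.1 Mbit/s), reading all 60 MB takes about 117 seconds. Real-time playback therefore only requires the movie's average bitrate to stay below that read rate. The article does not give the clip's duration or bitrate.

```c
/* Convert a throughput in MByte/s to Mbit/s (8 bits per byte). */
static double mbyte_to_mbit(double mbyte_per_s)
{
    return mbyte_per_s * 8.0;
}

/* Seconds to move size_mbyte at rate_mbyte_per_s (same MB unit in both). */
static double seconds_to_move(double size_mbyte, double rate_mbyte_per_s)
{
    return size_mbyte / rate_mbyte_per_s;
}
```

For example, mbyte_to_mbit(0.515) is about 4.12, and seconds_to_move(60.0, 0.515) is about 116.5.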

ASIC/SoC development teams are maximizing the return on their investment in prototyping systems by taking advantage of FPGA-based prototypes with features for remote access from a host workstation. These teams increase the variety and realism of hardware/software validation scenarios using three things:
  • Reliable, high-performance PCIe-over-Cabling connectivity
  • APIs and logic for client/host communication
  • Bus protocol-specific transactors that allow the design team to segment the prototype between software-based and hardware-based hosts

Data exchange links allow prototypes to communicate with custom user applications for system control and monitoring. Given a transaction-level interface, a new class of hybrid prototypes could potentially leverage the strengths of both FPGA and virtual prototyping abstractions.



Troy Scott, product marketing manager, is responsible for FPGA-based prototyping software tools at Synopsys. He has 20 years of experience in the EDA and semiconductor industry. Before joining Synopsys he was a product manager at Lattice Semiconductor, where he worked to design and market FPGA design tools. His background includes HDL synthesis and simulation, SoC prototyping, and IP evaluation and marketing. He holds a BSCE from Oregon Institute of Technology and a Graduate Certificate in Computer Architecture and Design from Portland State University.
