Making hardware more like software

Here's a way to partially or fully reconfigure an FPGA without rebooting the operating system.

One of the biggest advantages of field-programmable gate arrays (FPGAs) is the ability to change the functionality of the silicon by loading a new configuration file into the device. The configuration of the FPGA is usually controlled by an on-board processor that communicates with a flash-based configuration storage device.

The configuration mechanisms are usually custom to the specific FPGA and require specialized on-board connections and rules. The user usually embeds the flash device on the board, which forces the designer to estimate the configuration size before storing all possible configuration streams of the FPGA on that device. In this article, we propose a device architecture and software method that alleviates this problem and also provides many advanced features to the processor.

Architecture description
Figure 1 shows an FPGA architecture with an embedded hardened processor. The processor has multiple hardened ports (Ethernet and DDR) and communicates with the FPGA through a high-speed bus. In addition, the processor has an on-chip connection to the control (Config) block of the FPGA. The control block enables all configuration schemes for the FPGA and controls the startup sequences of the FPGA. The chip is divided into two regions such that all hardened peripherals connected to the FPGA, including the processor, can operate independently of the FPGA (the processor is live while the FPGA is configuring, reconfiguring, or running the application in user mode).

Additionally the FPGA is assumed to have two key features:

  • The ability to be configured (and reconfigured) without powering down the processor.
  • The ability to be partially reconfigured while part of the FPGA is running.

Typically this system is built as an embedded system with a two-chip solution. The benefits of a single-chip system over a dual-chip system are well known especially if the two chips can run independently. The processor usually runs an embedded operating system (such as VxWorks). Each of the hardened peripherals has its own device driver running on the embedded operating system. We propose two additional device drivers that will:

  • Control the port connecting the processor to the control block of the FPGA and control the function of full configuration of the FPGA.
  • Control the port connecting the processor to the control block and control the function of partial reconfiguration of the FPGA.

The purpose is to map the configuration of the FPGA into software application programming interfaces (APIs) on the processor. The actual bitstream for configuration could be stored remotely off-board if the processor has access to any of the hardened peripherals as needed. Figure 1 shows that the processor can access the data stream from any device connected on the Internet through the Ethernet port.
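
The two proposed drivers can be pictured with a minimal sketch. Everything named here (`ConfigBlock`, `FullConfigDriver`, `PartialConfigDriver`, and their methods) is a hypothetical illustration rather than an existing API, and the control block is simulated in memory instead of being reached through a real port. Fetching the bitstream, for example over Ethernet, is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConfigBlock:
    """Hypothetical in-memory stand-in for the FPGA control (Config) block."""
    full_image: Optional[str] = None
    regions: Dict[str, Optional[str]] = field(default_factory=dict)


class FullConfigDriver:
    """Driver 1: loads a complete FPGA image through the Config port."""

    def __init__(self, block: ConfigBlock) -> None:
        self.block = block

    def configure(self, image: str, bitstream: bytes, regions: List[str]) -> None:
        if not bitstream:
            raise ValueError("empty bitstream")
        # A full image replaces everything and defines the regions that
        # later partial reconfigurations may target.
        self.block.full_image = image
        self.block.regions = {name: None for name in regions}


class PartialConfigDriver:
    """Driver 2: reloads one predefined region; the rest is left untouched."""

    def __init__(self, block: ConfigBlock) -> None:
        self.block = block

    def reconfigure(self, region: str, persona: str, bitstream: bytes) -> None:
        if self.block.full_image is None:
            raise RuntimeError("FPGA has no full image loaded")
        if region not in self.block.regions:
            raise KeyError(f"region {region!r} is not defined by the full image")
        if not bitstream:
            raise ValueError("empty bitstream")
        self.block.regions[region] = persona


block = ConfigBlock()
FullConfigDriver(block).configure("Top 1", b"\x01", regions=["B", "D"])
PartialConfigDriver(block).reconfigure("B", "B2", b"\x02")
print(block)
```

Keeping the two functions in separate drivers mirrors the split proposed above: a full configuration defines which regions exist, and a partial reconfiguration is rejected unless it targets one of them.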

Use of FPGA
The FPGA can be fully or partially reconfigured while the processor is running, as shown in Figure 2. Using full reconfiguration, the user can change the complete image of the FPGA and set up new regions to be used for subsequent partial reconfigurations.

The FPGA can be used for many specific purposes in embedded systems. Two popular uses are hardware acceleration of compute-intensive algorithms, shown in Figure 3, and processor peripheral expansion, shown in Figure 4.

To maximize usability of the FPGA, partial reconfiguration can be applied in both of the above cases. For example, in the I/O peripheral case shown in Figure 4, a protocol can be switched from eSATA to Fibre Channel while the SDI channel is running. This can be achieved by partially reconfiguring the soft implementation of the protocol layer. In the co-processor example, the algorithm being accelerated in the FPGA is switched depending on the algorithm flow in the processor. In addition, in a multithreaded or multiprocess system, one thread can reconfigure one section of the FPGA while another thread is using a different section for acceleration.
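
The multithreaded case can be sketched as follows, assuming one lock per reconfigurable region so that reconfiguring one region never blocks users of another. `RegionManager`, the region names, and the persona names are hypothetical, and the assignment inside `reconfigure` stands in for streaming a partial bitstream.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple


class RegionManager:
    """Serializes access per region so regions stay independent of each other."""

    def __init__(self, regions: List[str]) -> None:
        self._locks: Dict[str, threading.Lock] = {r: threading.Lock() for r in regions}
        self.persona: Dict[str, Optional[str]] = {r: None for r in regions}

    def reconfigure(self, region: str, persona: str) -> None:
        # Holding only this region's lock leaves every other region usable.
        with self._locks[region]:
            self.persona[region] = persona  # stand-in for loading a partial image

    def use(self, region: str, work: Callable[[Optional[str]], str]) -> str:
        # A user of the region never observes it in the middle of a reload.
        with self._locks[region]:
            return work(self.persona[region])


def demo() -> Tuple[Optional[str], List[str]]:
    manager = RegionManager(["protocol", "accel"])
    manager.reconfigure("protocol", "eSATA")
    manager.reconfigure("accel", "algorithm_1")
    results: List[str] = []

    # One thread switches the protocol region while another uses the accelerator.
    switcher = threading.Thread(
        target=manager.reconfigure, args=("protocol", "Fibre Channel")
    )
    worker = threading.Thread(
        target=lambda: results.append(manager.use("accel", lambda p: f"ran {p}"))
    )
    switcher.start()
    worker.start()
    switcher.join()
    worker.join()
    return manager.persona["protocol"], results


print(demo())
```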

FPGA configuration (partial or full) can be mapped as device drivers in the embedded OS. An additional API can be written in software in the embedded OS to control the dynamic switching capability described. This isolates the hardware implementation. The advantages of this overall platform and the software implementation are:

  • The application developer is isolated from hardware implementation. The software code does not need to know that the FPGA is being reconfigured dynamically to expand the peripheral set. The hardware can be changed without changes to the user application code but with minor changes to the device drivers.
  • The configuration files for the FPGA can be stored off-board for maximum security. Special encryption techniques can be used within the processor to decrypt the configuration file if required.

One key point of the platform is preselecting the types of peripherals that can be placed on the FPGA or preselecting the algorithms being accelerated. This means that no dynamic compilation of Verilog/VHDL code is done on the system. All FPGA images (full and partial) are assumed to be compiled beforehand. Although the latency of loading and unloading configuration files into the FPGA is significant, the system assumes that it is done infrequently during system operation. The main purpose is two-fold:

  1. Saving area on the FPGA by using partial reconfiguration of known protocols or acceleration algorithms into predefined FPGA regions.
  2. Mapping the configuration function and the transformed regions into device drivers and APIs on the embedded OS.

The entire FPGA can then be viewed as dynamic hardware: a specialized OS can select from a large library of hardware functions, which can then be loaded in a manner similar to software DLLs.
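
A minimal sketch of such a library is shown here, assuming the precompiled images are held in a simple catalog keyed by function name. `HardwareLibrary`, `acquire`, and the `load_region` callback are hypothetical names. Because loading a configuration file is slow, the sketch skips the load when the requested function is already resident in the region.

```python
from typing import Callable, Dict


class HardwareLibrary:
    """Loads precompiled hardware functions on demand, much like a DLL loader."""

    def __init__(
        self,
        catalog: Dict[str, bytes],
        load_region: Callable[[str, bytes], None],
    ) -> None:
        self.catalog = catalog          # function name -> precompiled partial image
        self.load_region = load_region  # stand-in for the partial-reconfiguration driver
        self.resident: Dict[str, str] = {}  # region -> function currently loaded

    def acquire(self, region: str, function: str) -> bool:
        """Make `function` available in `region`; return True if a load happened."""
        if self.resident.get(region) == function:
            return False  # already loaded, so avoid the reconfiguration latency
        # Only precompiled functions can be loaded; anything else is a KeyError.
        bitstream = self.catalog[function]
        self.load_region(region, bitstream)
        self.resident[region] = function
        return True


loads = []
library = HardwareLibrary(
    {"aes": b"\x01", "fft": b"\x02"},
    lambda region, bits: loads.append((region, bits)),
)
library.acquire("region0", "aes")
library.acquire("region0", "aes")  # no second load
library.acquire("region0", "fft")
print(loads)
```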

One disadvantage of FPGA implementations is the relatively long compile time through the design software. Each of these implementations in the FPGA would have to be precompiled through the FPGA software. In addition, a system that can manage the complexity of the hardware design process for the architecture described is needed.

Software flow
To take full advantage of the proposed architecture, the design software must have two critical components: a method to describe the joint software/hardware system, and the ability to incrementally recompile the FPGA regions for partial reconfiguration.

Broken down by need, two tools are required to create such a system. The first allows the user to carve out different regions for partial reconfiguration and to manage compile time. The second describes the entire system (SoC) on the device, connecting the required peripherals to the processor. With these two tools at their disposal, designers can relatively easily create different personas, or flavors, of the device so that it performs different functions, while also meeting time-to-market demands.

Using partial reconfiguration on the device requires a strict methodology: the partial blocks and static blocks in the design are first identified, and the user must then adhere to the following guidelines:

  • Follow synthesis and hierarchy guidelines when designing the blocks.
  • Identify the partial blocks and assign them to a fixed location in the device floorplan using the incremental compilation methodology and floorplan region constraints.
  • Ensure the port definitions and hierarchy boundaries of all the partial blocks and their variants do not change, or the entire device may have to be reconfigured.
  • Ensure that reset signals between partial blocks and static blocks are not shared.
  • Add handshaking signals to deal with the availability and non-availability of the logic in the partial blocks, if there is logic within the static blocks that depends on the state of the partial block.
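
The port-definition guideline above lends itself to an automated check. The sketch below assumes each persona's ports are available as (name, direction, width) tuples. That representation, the function name, and the example port lists are all hypothetical. It reports any persona whose port list differs from the first persona's.

```python
from typing import Dict, List, Tuple

Port = Tuple[str, str, int]  # (name, direction, width in bits)


def mismatched_personas(personas: Dict[str, List[Port]]) -> List[str]:
    """Return personas whose ports differ from the first persona (by name order)."""
    names = sorted(personas)
    if not names:
        return []
    # Port order is irrelevant to the boundary, so compare sorted lists.
    reference = sorted(personas[names[0]])
    return [name for name in names[1:] if sorted(personas[name]) != reference]


ports_of_b = {
    "B1": [("clk", "in", 1), ("data_in", "in", 32), ("data_out", "out", 32)],
    "B2": [("data_in", "in", 32), ("clk", "in", 1), ("data_out", "out", 32)],
    # Hypothetical faulty variant: the output bus was widened.
    "B_wide": [("clk", "in", 1), ("data_in", "in", 32), ("data_out", "out", 64)],
}
print(mismatched_personas(ports_of_b))
```

A check like this could run before compilation, so that a boundary change is caught before it forces a full reconfiguration of the device.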

The proposed flow for the FPGA is called the design partition-based flow: once a region is reserved on the device, that region can be reconfigured with new logic while the remainder of the design is left running. Figure 5 shows a high-level description of a system designed for the proposed architecture.

FPGA design portion
There are two top-level FPGA images shown in Figure 5. Image 1, “Top 1,” has a static region composed of hierarchy modules A, C, and E, plus two partial modules, B and D. B has two different implementations, called B1 and B2; each of these is called a persona of B. Similarly, D has three personas. Image 2, “Top 2,” has a static region composed of hierarchies G and H, while F and I have two personas each.

Each FPGA image communicates with the processor through a well-defined bus architecture. Similarly, all personas have a common interface to the static region. Each persona maps to a specific partial image loaded into the FPGA. For the high-level example shown, the compilation process will generate two full images of the FPGA. Full image 1 will have five partial images (B1, B2, D1, D2, D3), while full image 2 will have four partial images (F1, F2, I1, I2).
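
The image count for this example can be reproduced with a small data model. The dictionary layout and the function name are illustrative assumptions. The module names and persona counts come from the description of Figure 5.

```python
from typing import Dict, List

# Each top-level image: its static hierarchies and the number of personas
# for each partially reconfigurable module.
DESIGN: Dict[str, Dict] = {
    "Top 1": {"static": ["A", "C", "E"], "partial": {"B": 2, "D": 3}},
    "Top 2": {"static": ["G", "H"], "partial": {"F": 2, "I": 2}},
}


def partial_images(top: str) -> List[str]:
    """List the partial images (one per persona) belonging to a full image."""
    return [
        f"{module}{index}"
        for module, count in DESIGN[top]["partial"].items()
        for index in range(1, count + 1)
    ]


for top in DESIGN:
    print(top, partial_images(top))
```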

To compile all the FPGA images efficiently, the FPGA design software needs two concepts: a design partition and a locked FPGA region. A “design partition” is a hierarchy of the design that is marked as a partition. All changes to the design are limited to that partition for compilation purposes. A “locked FPGA region” is a physical portion of the FPGA that is reserved for a specific persona. A persona is assigned to a “locked FPGA region” by floorplanning the design. The FPGA design software compiles the static region and then successively compiles each persona incrementally to generate all full and partial images. Figure 6 shows a floorplan of the FPGA for the design Top 1 above. B and D are “locked FPGA regions.”

Figure 7 shows the compilation flow for the Top 1 portion of the design described above. Essentially all personas can be compiled in parallel incrementally and across multiple machines.
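
The shape of this flow can be sketched as follows: compile the static region once, then compile every persona against it in parallel. The two compile functions are hypothetical stand-ins that only build strings, and a thread pool stands in for distribution across multiple machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def compile_static(top: str) -> str:
    """Stand-in for compiling and locking down the static region."""
    return f"{top} static region"


def compile_persona(static_result: str, persona: str) -> str:
    """Stand-in for an incremental compile of one persona into its locked region."""
    return f"{persona} partial image built against {static_result}"


def compile_all(top: str, personas: List[str], workers: int = 4) -> Dict[str, str]:
    static_result = compile_static(top)  # done once, shared by every persona
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Personas do not depend on each other, so they can be built concurrently.
        images = list(pool.map(lambda p: compile_persona(static_result, p), personas))
    return dict(zip(personas, images))


for persona, image in compile_all("Top 1", ["B1", "B2", "D1", "D2", "D3"]).items():
    print(persona, "->", image)
```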

Such a flow allows the static regions to be locked down so that the user can focus on the partial regions and their respective personas. Focusing on a smaller portion of the design lets the user take advantage of the incremental nature of the tool to reduce compile time and preserve performance. In the case of partial reconfiguration, it also reduces configuration time, since only a small area of the chip is being programmed.

Combined system and FPGA design portion
A tool that allows planning the proposed system along with the processor can connect the peripherals to the processor, map the address, control, and data buses, and create a description of the system in HDL, which is then mapped to the device by the place and route tool. This tool also allows planning for the custom components, or personas, of each partial block. Figure 8 shows an example of a system with two top-level implementations of the FPGA.

One of the key requirements for this tool is support for design hierarchy and incremental capabilities (not to be confused with the incremental flow described earlier). The primary reason to have hierarchy is to map the design efficiently onto the processor system and the FPGA portion. A tool that supports design hierarchy will be a natural fit for such a device and will work well with the FPGA design flow, in which the hierarchies are marked as design partitions and assigned to locations on the FPGA die. The incremental nature of this tool helps it generate the HDL based on what has changed. This allows the regenerated components to be tracked while lowering the compile time for generating systems.

Users always design the complete system with the division between soft logic (FPGA) and the hardened SoC in mind. A single tool that models the hard and soft (FPGA) portions of the design simultaneously is therefore recommended. The FPGA portion and SoC portion should share the same memory map, interrupts, and address space. Bus arbitration and bus alignment logic is inserted automatically by the system tool, providing a single software development platform.
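
One check that a shared memory map implies is that no two peripherals, hard or soft, claim overlapping address ranges. The sketch below assumes each peripheral is described by a name, base address, and size. The function name and the addresses in the example are made up for illustration.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (name, base address, size in bytes)


def find_overlaps(regions: List[Region]) -> List[Tuple[str, str]]:
    """Return pairs of regions whose address ranges intersect."""
    ordered = sorted(regions, key=lambda r: r[1])
    overlaps = []
    for i, (name, base, size) in enumerate(ordered):
        end = base + size  # first address past this region
        for other_name, other_base, _ in ordered[i + 1:]:
            if other_base >= end:
                break  # sorted by base, so nothing later can overlap either
            overlaps.append((name, other_name))
    return overlaps


memory_map = [
    ("ddr_ctrl", 0x0000, 0x1000),   # hardened peripheral
    ("ethernet", 0x1000, 0x0100),   # hardened peripheral
    ("periph_d", 0x1080, 0x0100),   # soft peripheral placed too low on purpose
]
print(find_overlaps(memory_map))
```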

Figure 9 shows a tool flow chain. Once the system has been captured in the system modelling tool, software development for both the hard and soft processors is done using the software tool chain. The FPGA portion is then compiled through the system generator to generate the HDL for the bus and merge the HDL for the peripheral IP. The FPGA design software then compiles the HDL into multiple bitstreams, one for each partial reconfiguration of the system.

Once the HDL is generated, it is taken into the place and route tool for compilation. Since the system design flow has incremental capabilities for generating HDL, personas of Top 1 and Top 2 can always be created and new flavors of the system generated. Within Top 1 and Top 2, partial personas (Periph D1, D2 and Periph E1, E2, as shown in Figure 9) can be created. The place and route tool will detect that either Top 1 or Top 2 has changed, compile the changed portion, and generate a partial programming file. Similarly, it can detect changes for D1, D2 and E1, E2.

The following are necessary steps for generating a programming file:

A.    To generate the base programming file:

  1. Connect the system within the system tool.
  2. Generate the HDL for the system in the system tool.
  3. Instantiate the top-level HDL for the entire design in the FPGA tool.
  4. Designate the partial blocks as partitions and assign them to locations on the FPGA.
  5. Compile the entire design. This is the base configuration of the FPGA.
  6. Generate the programming file for the base configuration.
  7. Program the device.

B.    To generate the partial images:

  1. Regenerate the new persona for Top 1 or Top 2 in the system tool (and for D1, D2, E1, E2).
  2. Generate the HDL for the new components.
  3. Recompile the design within the place and route tool. Because these blocks were originally designated as partitions, this happens incrementally.
  4. The rest of the system uses the last compilation of the design.
  5. Generate the partial programming files for the personas.
  6. Program the device with the partial reconfiguration file.
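
The incremental recompile in step B.3 depends on knowing which partitions have changed. The source does not say how the place and route tool detects this. The sketch below assumes one plausible approach: hash each partition's HDL source and compare it with the hash recorded at the last compile. The function name and the HDL strings are placeholders.

```python
import hashlib
from typing import Dict, List, Tuple


def changed_partitions(
    sources: Dict[str, str], previous: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Return the partitions needing a recompile and the updated hash record."""
    current = {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in sources.items()
    }
    # A partition is stale if it is new or its source hash has changed.
    changed = sorted(name for name, digest in current.items() if previous.get(name) != digest)
    return changed, current


hdl = {"D1": "module d1; endmodule", "D2": "module d2; endmodule"}
first_pass, record = changed_partitions(hdl, {})       # base compile: everything is new
hdl["D1"] = "module d1; /* revised */ endmodule"
second_pass, record = changed_partitions(hdl, record)  # only the edited persona
print(first_pass, second_pass)
```

Partitions that do not appear in the returned list would reuse their last compilation result, as in step B.4.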

Summary of benefits

A single-chip architecture that combines the benefits of hard processors and FPGAs has been presented thus far. Although presented as a single-chip solution, it's important to note that the FPGA and the processor mimic some of the advantages of a multichip solution. In addition, the ability to reconfigure the FPGA through the processor partially and fully provides additional benefits especially when combined with a software device driver solution and a design tool chain that can model the entire system. Some benefits are:

  • Partial reconfiguration saves area for peripherals (or cores) that do not operate simultaneously.
  • FPGA bitstreams can be loaded remotely, saving local flash space and providing security, since no FPGA information is found on-board.
  • A lower-power core can be loaded into the FPGA for different operating modes without rebooting the embedded OS.
  • A debug core can be added to the FPGA to monitor traffic while other peripherals are running.
  • Different acceleration algorithms can be loaded dynamically into the FPGA when it is used as a coprocessor.
  • The system is upgradable, since all FPGA operation, including configuration, is modelled in terms of software device drivers.
  • A single tool chain manages the entire system.

Chain of tools
A hardware architecture and a software tool chain that combine the benefits of a hard processor and an FPGA in an embedded system have been presented. Many embedded systems today use multiple chips in the same scenario. The addition of reconfiguration (full and partial) to the FPGA directly from the processor, without the OS rebooting and without suspending the processor, offers added versatility to the system. In addition, a single tool chain using a system modelling tool that simplifies the design of the complex embedded system has been proposed.

Mario Khalaf is a principal investigator, Office of the CTO at Altera Corporation. Mario has over 15 years of experience in the FPGA industry. Mario is also one of the original software architects of the Quartus II system.

Ajay Jagtiani is a software technical marketing manager at Altera. He holds an MBA in financial analysis from the University of San Francisco, an MSEE from Stevens Institute of Technology and a BSEE from Bombay University.
