The MCU guy’s introduction to FPGAs: Configuration Techniques & Technologies


This is the third column in our quest to introduce MCU guys and gals to the wonderful world of FPGAs (see also The MCU guy's introduction to FPGAs: The Hardware and The MCU guy's introduction to FPGAs: The Software).

As usual, before we plunge headfirst into the fray with gusto and abandon, it's probably a good idea to remind ourselves of some fundamental factoids so as to ensure that we're all tap-dancing to the same drum beat with regard to basic FPGA concepts (and I know whereof I speak, because my dear old dad used to be a tap-dancer on the variety hall stage as a young man).

As we discussed in the “Hardware” column, the primary programmable fabric inside an FPGA comprises “islands” of programmable logic blocks basking in a “sea” of programmable interconnect. The device will also include general-purpose input/output (GPIO) pins and pads, which are not shown in the illustration below.

The FPGA also contains a bunch of configuration cells — millions of the little rascals in the case of the larger devices. These configuration cells are used to perform a variety of tasks. Some will be used to define the contents of the lookup tables (LUTs). Others will be used to make or break connections between different sections of interconnect, thereby connecting the various entities within the device, including connecting the primary general-purpose input/output (GPIO) pins to the internal functional elements. Furthermore, the GPIO pins can be configured to support a wide variety of I/O standards, including voltages, termination impedances, slew rates, and so forth.
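
To make the idea of configuration cells defining a LUT's contents a little more concrete, here is a minimal Python sketch. It is only a software model, not anything a vendor tool produces; the names `make_lut`, `and4`, and `xor4` are invented for illustration. The point is that the same "hardware" behaves as a different logic function depending on the 16 bits loaded into it.

```python
def make_lut(config_bits):
    """Return a function that behaves like a LUT loaded with config_bits.

    config_bits[i] is the output for the input combination whose binary
    value is i, so a 4-input LUT needs 2**4 = 16 configuration cells.
    """
    n_inputs = (len(config_bits) - 1).bit_length()
    if len(config_bits) != 1 << n_inputs:
        raise ValueError("number of config bits must be a power of two")

    def lut(*inputs):
        if len(inputs) != n_inputs:
            raise ValueError(f"expected {n_inputs} inputs")
        index = 0
        for bit in inputs:          # first input is the most significant bit
            index = (index << 1) | (bit & 1)
        return config_bits[index]

    return lut


# Same "hardware", two different configurations.
and4 = make_lut([0] * 15 + [1])                              # 1 only for 1111
xor4 = make_lut([bin(i).count("1") & 1 for i in range(16)])  # odd parity

print(and4(1, 1, 1, 1), and4(1, 0, 1, 1))  # prints "1 0"
print(xor4(1, 0, 0, 0), xor4(1, 1, 0, 0))  # prints "1 0"
```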

Alternative configuration cell technologies
There are three main technologies that are commonly used to implement the configuration cells inside an FPGA: antifuse, Flash, and SRAM-based.

Let's start with antifuses, which have historically been of interest for high-radiation environments like aerospace applications. Antifuse-based devices are programmed off-line using a special device programmer. These devices are non-volatile (their configuration data remains when the system is powered down) and their function is immediately available to the system when it first powers up. However, this is a one-time programmable (OTP) technology, which means that once you've programmed a device there's no going back.

Flash-based FPGAs may be programmed off-line or while resident on the circuit board. Like antifuse-based devices, the Flash configuration cells are non-volatile; unlike antifuse-based devices, the Flash configuration cells are multi-time programmable (MTP) and can be re-programmed with a new configuration if required. In some cases, the Flash is large enough to hold two or more configurations. This means that the device can be running using one configuration while a second configuration is loaded into another area of the Flash. Once the new configuration has been successfully loaded and verified, the device can be instructed to switch over. This is very useful with regard to tasks like performing secure remote upgrades.
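
The load, verify, then switch sequence can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `DualImageFlash` class and its two-slot layout are hypothetical, and `zlib.crc32` merely stands in for whatever verification a real device performs.

```python
import zlib


class DualImageFlash:
    """Hypothetical model of a Flash area holding two configuration slots."""

    def __init__(self, initial_image: bytes):
        self.slots = [initial_image, b""]
        self.active = 0                      # slot the device is running from

    def upgrade(self, new_image: bytes, expected_crc: int) -> bool:
        spare = 1 - self.active
        self.slots[spare] = new_image        # device keeps running meanwhile
        if zlib.crc32(self.slots[spare]) != expected_crc:
            return False                     # verification failed: stay put
        self.active = spare                  # switch over only after verify
        return True


flash = DualImageFlash(b"config-v1")
ok = flash.upgrade(b"config-v2", zlib.crc32(b"config-v2"))
print(ok, flash.active)
```

If verification fails, `active` is left untouched, so the device carries on with the old configuration.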

Both antifuse and Flash-based FPGAs have the advantage of low power consumption. However, both technologies require additional processing steps on top of the basic CMOS process used to create silicon chips, which means they are typically one or two generations behind the leading-edge fabrication technology.

In the case of SRAM-based FPGAs, each configuration bit has an associated SRAM cell. One advantage is that these devices can be created using the latest-and-greatest CMOS technology without requiring any additional process steps. One disadvantage is that they are volatile, which means their configuration is lost whenever power is removed from the system. This also means that SRAM-based FPGAs have to have their configuration reloaded every time the system is powered up.

Historically, SRAM-based FPGAs were not considered to be suitable for high-radiation environments. More recently, however, new techniques have come into play, like using triple modular redundancy (TMR), which involves replicating the design three times inside the device. When combined with having the FPGA constantly perform CRC checks on its configuration bits and reloading any portion of the device that becomes corrupted, this approach has proved to be so successful that multiple SRAM-based FPGAs are on the Curiosity Rover, which is currently trundling around Mars (the rover also contains a bunch of antifuse-based FPGAs).
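
As a rough illustration of the two ideas in play here, the sketch below shows a 2-of-3 majority voter (the heart of TMR) and a simple "scrubber" that reloads any configuration frame whose CRC no longer matches. Everything in it is an assumption made for illustration: the frame layout is invented, and `zlib.crc32` stands in for the device's own checking hardware.

```python
import zlib


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote across three copies of the same signal."""
    return (a & b) | (a & c) | (b & c)


def scrub(frames, golden_crcs, golden_frames):
    """Reload any configuration frame whose CRC no longer matches.

    Returns the indices of the frames that were repaired.
    """
    repaired = []
    for i, frame in enumerate(frames):
        if zlib.crc32(frame) != golden_crcs[i]:
            frames[i] = golden_frames[i]     # reload just the damaged portion
            repaired.append(i)
    return repaired


golden_frames = [bytes([i] * 8) for i in range(4)]
golden_crcs = [zlib.crc32(f) for f in golden_frames]

live = list(golden_frames)
live[2] = bytes([live[2][0] ^ 0x01]) + live[2][1:]   # simulate one flipped bit

print(scrub(live, golden_crcs, golden_frames))       # prints "[2]"
print(bin(majority(0b1010, 0b1010, 0b0110)))         # prints "0b1010"
```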

Last, but not least, some FPGAs use a hybrid approach that involves a mix of Flash and SRAM configuration cells. On power-up, the contents of the Flash are copied over into the SRAM-based configuration cells in a massively parallel fashion. Later, a new configuration can be loaded into the Flash while the FPGA keeps running using the old configuration stored in its SRAM.

For the remainder of this column, we will focus on SRAM-based FPGAs, because (a) these account for the vast majority of devices and (b) they offer some very interesting capabilities like dynamic partial reconfiguration. Furthermore, we will use Xilinx FPGAs as the basis for our discussions (Hey, I had to pick someone LOL).


SRAM-based FPGA configuration modes
SRAM-based FPGAs come equipped with a small group of “configuration mode” pins. As illustrated in the following image, these pins are typically hard-wired to logic 0 and 1 values, which are used to inform the FPGA which configuration mode it is to use.

Basic configuration modes.
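
For a feel of how a handful of mode pins select a configuration mode, here is a small decoding sketch. The codes are modeled on the M[2:0] table for Xilinx 7 series devices as best I recall it; treat the exact values as an assumption and check the configuration user guide for your particular device.

```python
# Mode-pin codes modeled on Xilinx 7 series devices (pins M[2:0]).
# The exact values are an assumption for illustration; always confirm
# them against the device's configuration user guide.
CONFIG_MODES = {
    0b000: "Master Serial",
    0b001: "Master SPI",
    0b010: "Master BPI",
    0b100: "Master SelectMAP",
    0b101: "JTAG only",
    0b110: "Slave SelectMAP",
    0b111: "Slave Serial",
}


def decode_mode(m2: int, m1: int, m0: int) -> str:
    """Combine the three hard-wired pin levels into a mode name."""
    code = (m2 << 2) | (m1 << 1) | m0
    return CONFIG_MODES.get(code, "reserved")


print(decode_mode(0, 0, 1))
print(decode_mode(1, 1, 1))
```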

The simplest technique is to perform a serial load with the FPGA as the “master” device. In this case, the configuration file is typically stored in an external serial Flash memory device. When the board is powered up, the FPGA initiates and controls the loading of the configuration bitstream from the Flash memory.

Serial load with the FPGA as the master.

Observe the “configuration data out” signal coming out of the FPGA. One use for this signal is to read the configuration data back out of the device. Another possibility is for multiple FPGAs to be daisy-chained together and to share a single external configuration memory device as illustrated below.

Serial load with daisy-chained FPGAs.

The advantages of the serial-load mode with the FPGA as the master are that the Flash memory device is inexpensive, the mode uses very few pins on the FPGA, and it requires very few tracks on the circuit board. The disadvantage is that it's a relatively slow technique in the scheme of things.

Instead of a serial load with the FPGA as the master, we can use a parallel version. Using a multi-bit bus dramatically speeds the configuration process, but it does consume more pins on the FPGA and it requires more tracks on the circuit board.
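
A quick back-of-the-envelope calculation shows why bus width matters. The sketch below assumes an idealized load in which one bus-width word is transferred per configuration clock; the 30 Mbit bitstream and the 50 MHz clock are made-up numbers chosen purely for illustration.

```python
def config_time_s(bitstream_bits: int, clock_hz: float, bus_width: int = 1) -> float:
    """Ideal load time: one bus-width word per configuration clock."""
    return bitstream_bits / (clock_hz * bus_width)


BITS = 30_000_000          # hypothetical 30 Mbit bitstream
CLOCK = 50e6               # hypothetical 50 MHz configuration clock

for width in (1, 8, 16):
    print(f"x{width:<2}: {config_time_s(BITS, CLOCK, width) * 1e3:.1f} ms")
```

With these made-up numbers, the serial load takes 600 ms, while the 8-bit and 16-bit versions take 75 ms and 37.5 ms respectively. Real devices add overhead for startup and housekeeping, so treat these as lower bounds.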

Thus far we've only considered the FPGA to be the “master,” but it's also possible for it to act as the “slave.” Consider a circuit board containing both a microcontroller and an FPGA, for example. In this case, the designers may decide to use the microcontroller to load the configuration bitstream into the FPGA. This scenario conveys a number of advantages, not the least being that the microcontroller might be used to query the environment in which the system resides, and to then select alternative configuration bitstreams to be loaded into the FPGA accordingly.
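
Since this is the scenario closest to an MCU person's heart, here is a minimal sketch of what the microcontroller side might look like, written in Python so that it can be run on a desktop. The `FakeSlaveSerialPort` class, the file names, and the temperature threshold are all hypothetical, and the most-significant-bit-first ordering is an assumption; a real device's documentation defines the required bit order and handshake signals.

```python
class FakeSlaveSerialPort:
    """Stand-in for the pins an MCU would drive; records what it receives."""

    def __init__(self):
        self.bits = []
        self._din = 0

    def set_din(self, value: int):
        self._din = value & 1

    def pulse_cclk(self):
        self.bits.append(self._din)      # FPGA samples DIN on the clock edge


def send_bitstream(port, data: bytes):
    """Shift every byte out one bit per clock, most significant bit first.

    MSB-first ordering is an assumption here; real devices document the
    required bit order for each configuration mode.
    """
    for byte in data:
        for shift in range(7, -1, -1):
            port.set_din((byte >> shift) & 1)
            port.pulse_cclk()


def choose_bitstream(temperature_c: float) -> str:
    """Hypothetical policy: pick a configuration based on the environment."""
    return "low_power.bin" if temperature_c > 70.0 else "full_speed.bin"


port = FakeSlaveSerialPort()
send_bitstream(port, b"\xA5")
print(port.bits)                 # prints "[1, 0, 1, 0, 0, 1, 0, 1]"
print(choose_bitstream(85.0))    # prints "low_power.bin"
```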

Note that, in the case of an SoC FPGA like a Zynq device from Xilinx that contains a hard multi-core microcontroller subsystem, this processor normally controls the loading of the configuration bitstream that is used to program the traditional FPGA fabric.

The last of the traditional configuration options is to use the FPGA's JTAG port. One advantage of this approach is that it doesn’t consume any of the FPGA's general-purpose input/output (GPIO) pins; one disadvantage is that it's not the fastest way to load an FPGA. As usual, I could waffle on about different aspects of the JTAG approach for ages, but that's probably a topic for another day.

Introducing the Configuration Engine
One more topic I want to mention here is that of the Configuration Engine. This is a small function that is implemented as a hard core on the FPGA as illustrated below.

The Configuration Engine is a hard core function in the FPGA.

Irrespective of the configuration mode used (serial, parallel, FPGA as master or slave, JTAG), the configuration bitstream ends up being fed into the Configuration Engine, which is in charge of loading the configuration data into the FPGA's configuration cells.


Dynamic partial reconfiguration
We often talk about how one of the major advantages of SRAM-based FPGAs is the fact that they can be configured and reconfigured to perform whatever tasks we wish, unlike fixed-function devices like ASICs/ASSPs/SoCs, whose algorithms are effectively “frozen in silicon.” Thus far, however, we've really not explored just how powerful SRAM-based reconfiguration technology can be. As usual, of course, there's more to this than initially meets the eye, and it's easy to become confused, so I hope you won't mind if I take this step-by-step as follows…

Full-chip configuration
Let's start right at the beginning. The typical design flow is that we use some technique to capture our design intent. We then simulate the design to check that it does what we want it to do. Next, we run the design through logic synthesis, then place-and-route, and eventually we end up with a configuration file. Later, when we power up the circuit board, we load the configuration bitstream from the configuration file into the FPGA.

For the moment, let's assume that our FPGA does not contain a hard core processor. In this case, as we discussed earlier, there are a number of ways in which we can load the configuration bitstream into the FPGA. Furthermore, as you may recall, the configuration engine is a small function that is implemented as a hard core on the FPGA as illustrated below.

Irrespective of the configuration mode used (serial or parallel, FPGA as master or slave, or JTAG), the configuration bitstream ends up being fed into the configuration engine, which is in charge of loading the configuration data into the FPGA's configuration cells. In particular, observe that — in the case of full-chip configuration — the FPGA can be instructed to act as the “master” or the “slave” (the significance of this point will become apparent shortly).

Full-chip reconfiguration
In many cases, once we've powered up the board and loaded a configuration into the FPGA, that's all we have to do — we simply leave the FPGA running and “doing its thing.” In some cases, however, we might decide to re-load the FPGA with a completely different configuration. As one example of this, upon power up we might first load the FPGA with a configuration that performs self-test and perhaps some amount of board-level test. Once we are satisfied that everything is as it should be, we can reload the FPGA with a completely different configuration.

To a large extent, we can treat this type of situation as comprising a number of completely separate designs that are developed and verified in isolation — it just so happens that these multiple designs end up running in the same physical device. The point I'm making here is that full-chip reconfiguration does not really require any extra features or capabilities in the design tools.

Static partial reconfiguration
To be fair, I really have to note that the term “static partial reconfiguration” is not in common usage — in fact, to be completely honest, it's pretty much a term I made up just a few moments ago. What I'm trying to convey here is that it is possible to place the FPGA into a reset state, hold it there (this would be the “static” part), swap out a portion of the design, and then allow the FPGA to continue on its way.

Dynamic partial reconfiguration
This is where things start to become very exciting. The idea is that we leave the FPGA running while we are performing the partial reconfiguration. We don't clear the device or put it into a reset mode, and we don't disrupt any of its contents (apart from the portion we're reconfiguring, of course).

However, this does mean that it's up to the designer to ensure that we stop using the portion of the FPGA that's being reconfigured while it's being reconfigured. We can visualize this as being the FPGA equivalent of a “hot swap” capability.

Partial reconfiguration using traditional techniques
Now, I'm not saying that this is a good idea, but it is certainly possible to perform partial reconfiguration using traditional configuration techniques, as illustrated in the image below. The important point to note here is that, in the case of partial reconfiguration, the FPGA can act only as a “slave” (not as a “master”). The reason for this is that when undergoing configuration in its “master” mode, the FPGA cannot stop itself from performing certain operations that would not be friendly to a partial reconfiguration scenario.

Thus far, we haven't really talked about the inner workings of a configuration file, and we certainly don't want to make things overly complicated here. Suffice it to say that — in addition to the configuration bitstream itself — a configuration file contains a number of things, including some header information, some instructions that do things like clearing and resetting the device, and a start address. In the case of a standard configuration file that is used for full-chip configuration and/or reconfiguration, we can think of the start address as being zero accompanied by a single large configuration bitstream sufficient to load the entire device.

By comparison, in the case of a partial reconfiguration, we can think of the start address as being some value 'n' accompanied by a small configuration bitstream that targets only a portion of the device.
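
The difference between the two kinds of file can be modeled as nothing more than a start address plus a payload. The following sketch is a deliberately simplified picture: the `ConfigImage` structure and the eight-frame "device" are invented for illustration and ignore the header and command information that real configuration files carry.

```python
from dataclasses import dataclass


@dataclass
class ConfigImage:
    start_address: int        # first configuration frame to overwrite
    frames: list              # payload, one entry per frame


def load(config_memory: list, image: ConfigImage) -> None:
    """Overwrite only the frames covered by the image; leave the rest alone."""
    end = image.start_address + len(image.frames)
    if end > len(config_memory):
        raise ValueError("image does not fit in the device")
    config_memory[image.start_address:end] = image.frames


device = [None] * 8                                            # tiny 8-frame "device"
load(device, ConfigImage(0, [f"full{i}" for i in range(8)]))   # full-chip, address 0
load(device, ConfigImage(5, ["partA", "partB"]))               # partial, n = 5
print(device)
```

After the second load, frames 5 and 6 hold the new contents while every other frame is exactly as the full-chip load left it.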

The Internal Configuration Access Port (ICAP)
As we noted earlier, dynamic partial reconfiguration is where things start to become very exciting. Of course, one thing about the term “dynamic” is that it implies high speed. If we were to use the traditional configuration techniques to perform partial reconfiguration, they would typically not offer the levels of performance that we require. This leads us to the Internal Configuration Access Port (ICAP), which is the predominant mechanism used to implement partial reconfiguration.

The ICAP is a hard core that is present in all Xilinx FPGAs and SoC FPGAs. The traditional configuration techniques are external to the FPGA. By comparison, the ICAP — which is instantiated as a component as part of the design's RTL — provides a gateway to the configuration that is accessible from inside the device.

It's up to the designer to decide how the ICAP is to be hooked up to the rest of the system and how it is to be controlled. For example, the designer may choose to connect the ICAP to a PCIe interface or to an on-chip DDR memory controller block, or… the possibilities are endless. Also, the designer may decide to create a small state machine to act as the controller. Alternatively, the designer may opt to use a soft core processor, such as the 32-bit MicroBlaze.

Consider the (very simplistic) example shown below, in which we have a “data processing” module in the design. It may be that different architectural implementations of this module work better for different types of data.

Thus, a typical scenario is to load the FPGA with an initial full-chip configuration, set it running, and employ the user's control logic to monitor what's going on (this is the dashed line in the image above). When certain conditions are met, the control logic may decide to pause any data processing activities and to perform a dynamic partial reconfiguration to load a new variant of the data processing module.

It's important to note that the ICAP can be used to monitor internal activity, such as the state of the load. This allows the user's control logic to determine when the partial reconfiguration has been completed, at which time it can hand control back over to the rest of the system.
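
The pause, load, poll, resume sequence that the user's control logic follows can be sketched as shown below. To be clear, `FakeIcap` is a hypothetical software stand-in and not the interface of the real ICAP primitive; the method names are invented so that the control flow can be run and inspected.

```python
class FakeIcap:
    """Hypothetical stand-in for a wrapper around the ICAP primitive."""

    def __init__(self):
        self.words_written = []
        self.expected = 0

    def start(self, n_words: int):
        self.expected = n_words
        self.words_written = []

    def write_word(self, word: int):
        self.words_written.append(word & 0xFFFFFFFF)

    def done(self) -> bool:
        return len(self.words_written) == self.expected


def swap_module(icap, partial_words, log):
    log.append("pause data processing")     # stop using the region first
    icap.start(len(partial_words))
    for word in partial_words:
        icap.write_word(word)
    while not icap.done():                   # poll the state of the load
        pass
    log.append("resume data processing")    # hand control back


icap, log = FakeIcap(), []
swap_module(icap, [0x11111111, 0x22222222, 0x33333333], log)
print(log)
```

In real hardware the controller would be a state machine or a soft core processor, but the ordering of the steps is the part that matters.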

Additional points to ponder
In the simplistic example presented above, we showed only a single function — our “data processing” module — that could be dynamically reconfigured. In reality, we could create a large number of such modules. Having said this, one constraint is that it is possible to be reconfiguring only one module at any particular time.

Another consideration is that it may be necessary to always have some form of data processing occurring, even if that processing does not provide the optimal solution in terms of bandwidth or latency or… whatever. In this case, one option would be to create two dynamically reconfigurable modules and to multiplex between them. This way, one module could be performing the data processing while the other module was being dynamically reconfigured.
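
Here is a minimal sketch of the two-module arrangement, with Python functions standing in for the reconfigurable modules. The `PingPong` class is purely illustrative; the idea it demonstrates is that the new function is loaded into the idle slot and the multiplexer is flipped only once loading has finished.

```python
class PingPong:
    """Two reconfigurable slots behind a multiplexer (hypothetical model)."""

    def __init__(self, initial_fn):
        self.slots = [initial_fn, None]
        self.select = 0                      # mux setting: which slot is live

    def process(self, sample):
        return self.slots[self.select](sample)

    def reconfigure(self, new_fn):
        idle = 1 - self.select
        self.slots[idle] = new_fn            # live slot keeps processing
        self.select = idle                   # flip the mux once loading is done


pp = PingPong(lambda x: x + 1)
print(pp.process(1))                         # prints "2"
pp.reconfigure(lambda x: x * 10)
print(pp.process(1))                         # prints "10"
```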

Here's another possible scenario. It's very common to have some sort of data processing function that reads in data, does something to it, and stores the result in memory. It's also very common to have a chain of such functions as illustrated below.

In some cases, it may be applicable to have all of the stages present and operating all of the time in a pipeline-type implementation. In other applications, however, it is only necessary to be performing one data processing task at any particular time. In such a case, it would be possible to create the design using a single data processing module and a single block of RAM, as shown below.

The idea would be to use the first data processing function to perform some action on the data and store the result in the RAM. Then we would use partial dynamic reconfiguration to swap in a new data processing function. This new function would read the data out of the RAM, do whatever it has to do, and store the results back in the RAM. And so on and so forth…
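
In software terms, this is rather like running a list of functions one after another over the same buffer. The sketch below models that behavior; the three stages are arbitrary placeholders, and the "swap in" step is where the partial reconfiguration would take place in a real design.

```python
def run_time_multiplexed(ram, stages):
    """Run each stage in turn through one module slot and one block of RAM."""
    for stage in stages:
        module = stage                       # "swap in" the next function
        ram[:] = [module(x) for x in ram]    # read RAM, process, write back
    return ram


ram = [1, 2, 3]
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_time_multiplexed(ram, stages))     # prints "[1, 3, 5]"
```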

One very important consideration is that today's design tools are not as geared up to supporting partial dynamic reconfiguration as you might hope. In particular, it can be more than a tad tricky to simulate and verify the actual reconfiguration process itself.

Partial dynamic reconfiguration and the Zynq SoC FPGA
Finally, for this column, let's turn our attention to the Zynq SoC FPGA. As you will recall, this little beauty boasts a full hard core implementation of a dual ARM Cortex-A9 microcontroller subsystem augmented with a large quantity of traditional programmable fabric and a substantial number of general-purpose input/output (GPIO) pins.

When the circuit board is powered up, the Zynq's hard ARM Cortex-A9 microcontroller subsystem is automatically ready to rock-and-roll. Furthermore, in most usage scenarios, it is this processor subsystem that is in charge of loading the traditional FPGA fabric, and it performs this using its own dedicated PCAP (Processor Configuration Access Port) as illustrated below.

Now, the main ARM Cortex-A9 processor and its PCAP could also be used to perform dynamic partial reconfiguration. Alternatively, although I've not shown it here, there is also a standard ICAP as part of the programmable fabric. This means that we could use the ARM Cortex-A9 microcontroller subsystem and its PCAP to perform the initial load, and then use user control logic (maybe even a soft core processor) and the ICAP to perform any subsequent partial dynamic reconfiguration functions, while leaving the main processor free to perform other tasks.

Well, that's it for the moment. I know I've only scratched the surface of this subject, but this is quite a lot to wrap one's brain around. There are all sorts of related topics we can consider, such as the use of dynamic partial reconfiguration in creating radiation-tolerant designs, but that's a discussion for another day. In the meantime, do you have any questions with regard to the points presented here?


11 thoughts on “The MCU guy’s introduction to FPGAs: Configuration Techniques & Technologies”

  1. “Max, at the risk of subverting the train of discussion here, the Cypress PSoC1 has the ability to dynamically reconfigure and the functionality is built into the UI, PSoC Designer. I have used it quite effectively when I have needed additional resources

  2. “Max, “When the circuit board is powered up, the Zynq's hard ARM Cortex-A9 microcontroller subsystem is automatically ready to rock-and-roll.” 1. Is there flash for the processor already on the Zynq? 2. I am not sure if I am reading this correctly-

  3. “The Zynq is different to other FPGAs in that it is a SoC and, as such, the processor needs to be booted and configured from NVRAM; this is the first thing that has to happen to get the system up and running. Once the processor has been configured it will con

  4. “There is boot code on the Zynq to enable you to create a first stage bootloader and get the system up and running. The FSBL is created in Xilinx SDK along with your Board Support Package for your particular hardware configuration. It is very easy and simp

  5. “I would expect that the Zynq is very much like the Altera SoCFPGA. The SoCFPGA is conceptually two separate devices: FPGA and HPS (hard cores) that are joined by a bridge. The SoCFPGA HPS can be booted just like any other ARM CPU and can run without

  6. “If the CPU is loading the FPGA bitstream then it can load from any source that the CPU can access. eg. SD card, network,… It does not have to be from nv memory.”

