FPGA configuration using high-speed NOR flash

October 22, 2018

claz-October 22, 2018

NOR Flash memories are widely deployed as configuration devices for FPGAs. FPGA usage in industrial, communications and automotive ADAS applications depends on the low latency and high data throughput of NOR Flash. A good example of a fast boot time requirement is the camera system in an automotive environment. The speed at which the rear-view image appears on the dashboard display upon ignition is a first-order design challenge.

Immediately after power-up, the FPGA loads the configuration bit stream that has been stored in the NOR device. When the transfer has completed, the FPGA transitions to an active (configured) state. FPGAs offer a number of configuration interface options, which often include a parallel NOR bus and a Serial Peripheral Interface (SPI) bus. Memories supporting these buses have always had minor incompatibilities between offerings from different manufacturers, and these incompatibilities have made multiple sourcing of memory devices more difficult.

The newly released JEDEC xSPI specification was jointly developed by all the major NOR Flash memory manufacturers. The new standard ends decades of NOR Flash manufacturers developing products independently without adhering to a common definition. While minor differences still exist, the core JEDEC xSPI functionality is now identical in offerings from all manufacturers. The JEDEC xSPI specification standardizes bus transactions, commands, and a wide swath of internal functionality. Combined with high throughput, these next-generation Flash devices enable a whole new range of applications and capabilities. For example, the Semper NOR Flash family from Cypress conforms to the JEDEC xSPI specification and provides a sustained 400MB/s read transfer rate that is well suited for use as an FPGA configuration memory. To put this in context, a 400MB/s data rate enables the contents of a 128MB (1Gb) device to be transferred in 320ms.
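The 320ms figure follows directly from dividing the device size by the sustained read rate. The short sketch below reproduces that arithmetic; the function name and the use of decimal megabytes (1MB = 1,000,000 bytes) are assumptions made here to match the article's rounding, not something taken from a vendor tool.

```python
MB = 1_000_000  # decimal megabyte (assumption chosen to match 128MB / 400MB/s = 320ms)


def transfer_time_ms(size_bytes: int, rate_bytes_per_s: int) -> float:
    """Time in milliseconds to stream size_bytes at a sustained read rate."""
    return size_bytes * 1000 / rate_bytes_per_s


if __name__ == "__main__":
    # 128MB device read at a sustained 400MB/s
    print(transfer_time_ms(128 * MB, 400 * MB), "ms")

    # A 1Gb array counted in binary units holds 2**30 / 8 bytes,
    # which takes slightly longer at the same rate.
    print(round(transfer_time_ms(2**30 // 8, 400 * MB), 1), "ms")
```

The calculation ignores command, address and dummy-cycle overhead, which is small compared with a single long sequential read.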

History of FPGA Configuration

When FPGAs first became available, the configuration memory of choice was either a parallel EPROM or a parallel EEPROM. Over time, NOR Flash technology appeared and was widely adopted for its in-system reprogrammability and cost-effectiveness. A second evolutionary transition has been the displacement of the parallel NOR interface by the SPI memory interface in most applications. Today’s SPI memories offer high densities, small package sizes, high read throughput and, perhaps most importantly, an efficient low pin count interface.

Figure 1 – The Gigabit Quad SPI (6 pin) and Parallel NOR (45 pin) interfaces (Source: Cypress Semiconductor)

Figure 1 shows the pinout of a one gigabit SPI device compared with a one gigabit Parallel NOR product.  For a one gigabit memory, the Quad Serial Peripheral Interface (QSPI) device has a six-pin interface and the Parallel NOR device requires 45 pins.  This dramatic difference in pin count has led to QSPI devices being widely adopted as the preferred configuration interface.  The QSPI interface allows for changes to densities without changing the device footprint.

FPGA Configuration Speed

As process nodes shrink, FPGA devices continue to increase the amount of programmable logic available. In turn, this leads to a requirement for higher density – and faster – configuration memory.  Modern FPGAs require as much as 128MB of data to be loaded during the configuration period.  These high-density configuration bit streams require a longer period to transfer from the NOR Flash device into the FPGA.  The configuration interface is not only optimized for read throughput but is also focused on facilitating interoperability between different NOR Flash manufacturers.

SPI Read Throughput

SPI read throughput has increased dramatically over the last several years, starting with the original SPI interface running in x1 mode all the way to modern QSPI offerings running x4 DDR.  As can be seen from Table 1, next-generation Flash devices are able to provide another increase in SPI bus performance.


Table 1 – SPI read throughput options for Flash memory devices. (Source: Cypress Semiconductor)
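The way bus width and transfer type combine into read throughput can be sketched with a simple formula: bytes per second = clock frequency × bus width × (2 for DDR, 1 for SDR) / 8. The clock frequencies in the example below are illustrative assumptions and are not taken from Table 1; the last row is chosen only because it reproduces the 400MB/s figure mentioned earlier in the article.

```python
def spi_read_throughput_mb_s(clock_mhz: float, bus_width_bits: int, ddr: bool) -> float:
    """Peak data-phase throughput in MB/s (decimal), ignoring command overhead."""
    transfers_per_clock = 2 if ddr else 1  # DDR moves data on both clock edges
    bits_per_second = clock_mhz * 1e6 * bus_width_bits * transfers_per_clock
    return bits_per_second / 8 / 1e6


if __name__ == "__main__":
    # (label, clock in MHz, bus width, DDR?) -- clocks are hypothetical examples
    examples = [
        ("x1 SDR", 50, 1, False),
        ("x4 SDR", 100, 4, False),
        ("x4 DDR", 100, 4, True),
        ("x8 DDR", 200, 8, True),
    ]
    for label, clock, width, ddr in examples:
        rate = spi_read_throughput_mb_s(clock, width, ddr)
        print(f"{label} at {clock} MHz: {rate:g} MB/s")
```

These are peak numbers for the data phase only; a real read also spends clocks on the command, address and dummy cycles before data appears.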

Modern SPI devices can be permanently configured for a fixed bus width and transfer type that is immediately operational upon power-up. This permanent configuration must also be supported by the FPGA to allow the configuration process to begin immediately after power-up.

Alternatively, SPI memories can exit power up in a x1 mode that allows the host system (FPGA) to query the memory for characteristics located in the Serial Flash Discoverable Parameters (SFDP) table.  This x1 mode has become a standard feature supported by multiple memory vendors and allows the FPGA to retrieve critical information regarding device functionality.  Once the device characteristics have been retrieved, the FPGA memory controller and SPI memory device can be quickly reconfigured for maximum read performance.

Figure 2 – The Serial Flash Discoverable Parameters (SFDP) table is used to configure SPI bus functionality upon power on. (Source: Cypress Semiconductor)
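As a rough illustration of what the host does with the SFDP data once it has been read in x1 mode, the sketch below parses the SFDP header and the parameter headers that follow it. The byte layout used here (a 4-byte "SFDP" signature, revision bytes, a zero-based parameter header count, then 8-byte parameter headers) is an assumption based on the JEDEC JESD216 layout and should be checked against the standard and the device datasheet. The example bytes are made up and do not come from any real device.

```python
from typing import List, NamedTuple, Tuple


class SfdpHeader(NamedTuple):
    minor_rev: int
    major_rev: int
    num_param_headers: int


class ParamHeader(NamedTuple):
    param_id: int        # 16-bit ID built from the MSB and LSB bytes
    minor_rev: int
    major_rev: int
    length_dwords: int   # parameter table length in 32-bit words
    table_pointer: int   # 24-bit byte address of the parameter table


def parse_sfdp(raw: bytes) -> Tuple[SfdpHeader, List[ParamHeader]]:
    """Parse the SFDP header and parameter headers (layout assumed per JESD216)."""
    if len(raw) < 8 or raw[0:4] != b"SFDP":
        raise ValueError("SFDP signature not found")
    count = raw[6] + 1  # the stored count is zero-based
    header = SfdpHeader(minor_rev=raw[4], major_rev=raw[5], num_param_headers=count)

    params = []
    for i in range(count):
        entry = raw[8 + 8 * i: 16 + 8 * i]
        if len(entry) < 8:
            raise ValueError("truncated parameter header")
        params.append(ParamHeader(
            param_id=(entry[7] << 8) | entry[0],
            minor_rev=entry[1],
            major_rev=entry[2],
            length_dwords=entry[3],
            table_pointer=int.from_bytes(entry[4:7], "little"),
        ))
    return header, params


if __name__ == "__main__":
    # Hypothetical SFDP bytes: one parameter header pointing at address 0x30
    raw = b"SFDP" + bytes([0x06, 0x01, 0x00, 0xFF]) \
        + bytes([0x00, 0x06, 0x01, 0x10, 0x30, 0x00, 0x00, 0xFF])
    header, params = parse_sfdp(raw)
    print(header)
    print(params[0])
```

In a real design this logic would sit in the FPGA's configuration controller or boot firmware; the Python form is used here only to make the table structure easy to follow.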

The retrieval of key device information using the integrated SFDP table will be critical when using next-generation Flash memory devices that can run with x1, x4 or x8 bus widths and with either SDR or DDR transfer types. The chosen bus width and transfer type must match what the bus interface implemented on the FPGA supports.
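One way to picture this alignment step is as a small negotiation: take the modes the Flash advertises, intersect them with the modes the FPGA supports, and pick the fastest common one. The sketch below is a simplified illustration under stated assumptions: every mode is assumed to run at the same clock, the mode lists are hypothetical, and ties are broken in favor of the narrower bus, which is an arbitrary choice made here.

```python
from typing import Iterable, Optional, Tuple

Mode = Tuple[int, str]  # (bus width in bits, "SDR" or "DDR")


def relative_rate(mode: Mode) -> int:
    """Bits moved per clock cycle; assumes the same clock for every mode."""
    width, transfer_type = mode
    return width * (2 if transfer_type == "DDR" else 1)


def pick_mode(flash_modes: Iterable[Mode], fpga_modes: Iterable[Mode]) -> Optional[Mode]:
    """Return the fastest mode supported by both sides, or None if none match."""
    common = set(flash_modes) & set(fpga_modes)
    if not common:
        return None
    # Highest rate wins; on a tie prefer the narrower bus (fewer pins).
    return max(common, key=lambda m: (relative_rate(m), -m[0]))


if __name__ == "__main__":
    # Hypothetical capability lists, not taken from any datasheet
    flash = [(1, "SDR"), (4, "SDR"), (4, "DDR"), (8, "SDR"), (8, "DDR")]
    fpga = [(1, "SDR"), (4, "SDR"), (4, "DDR")]
    print(pick_mode(flash, fpga))
```

With the lists shown, the FPGA cannot drive an x8 bus, so the selection falls back to x4 DDR even though the Flash could go faster.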

Dual QSPI Configuration Interface

To reduce FPGA configuration time, many modern FPGAs allow the configuration bit stream to be partitioned across two QSPI devices (Figure 3).  These two QSPI devices are wired in a parallel fashion where the lower nibble of the bit stream is stored in a “primary” QSPI device (QSPI_P) and the upper nibble of the bit stream is stored in the “secondary” QSPI device (QSPI_S).  These two devices are run in parallel while loading the bit stream, effectively doubling the read data transfer rate. 

Note that the interface is largely independent across both devices, with the exception of the shared SCK line.  The shared SCK line is implemented to minimize timing skews when reading the devices in a parallel (i.e., simultaneous) manner.  Accesses to the devices can occur one at a time or to both devices simultaneously when performing the same operation with an identical target address.
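The nibble partitioning described above can be sketched as a pair of helper functions: one that splits a bit stream into the images programmed into QSPI_P and QSPI_S, and one that rebuilds the original bytes from a parallel read. The function names are hypothetical, and the order in which two nibbles are packed into each stored byte (earlier byte in the upper half) is an assumption made for illustration; the actual ordering is defined by the FPGA vendor's tools.

```python
from typing import Tuple


def split_bitstream(bitstream: bytes) -> Tuple[bytes, bytes]:
    """Split into (primary, secondary) images: lower nibbles go to QSPI_P,
    upper nibbles go to QSPI_S. Packing order within a byte is assumed."""
    if len(bitstream) % 2:
        raise ValueError("bit stream length must be even to pack nibbles into bytes")
    primary, secondary = bytearray(), bytearray()
    for i in range(0, len(bitstream), 2):
        b0, b1 = bitstream[i], bitstream[i + 1]
        primary.append(((b0 & 0x0F) << 4) | (b1 & 0x0F))  # two lower nibbles
        secondary.append((b0 & 0xF0) | (b1 >> 4))          # two upper nibbles
    return bytes(primary), bytes(secondary)


def merge_bitstream(primary: bytes, secondary: bytes) -> bytes:
    """Rebuild the original bit stream from the two device images."""
    if len(primary) != len(secondary):
        raise ValueError("device images must be the same length")
    out = bytearray()
    for p, s in zip(primary, secondary):
        out.append((s & 0xF0) | (p >> 4))
        out.append(((s & 0x0F) << 4) | (p & 0x0F))
    return bytes(out)


if __name__ == "__main__":
    data = bytes([0xA5, 0x3C])
    qspi_p, qspi_s = split_bitstream(data)
    print(qspi_p.hex(), qspi_s.hex())
    assert merge_bitstream(qspi_p, qspi_s) == data
```

Because each device holds half of every byte, both devices must be read at the same address at the same time, which is why the shared clock and identical target address matter.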

Figure 3 – The Dual QSPI Configuration interface (11 pins) allows the configuration bit stream to be partitioned across two QSPI devices to effectively double the read data transfer rate. (Source: Cypress Semiconductor)

This 11-pin dual QSPI configuration is attractive when large FPGA devices require large (i.e., high density) configuration bit streams to be transferred in the fastest possible manner.
