FPGA configuration using high-speed NOR flash - Embedded.com

NOR Flash memories are widely deployed as configuration devices for FPGAs. FPGA usage in industrial, communications and automotive ADAS applications depends on the low latency and high data throughput of NOR Flash. A good example of a fast boot time requirement is the camera system in an automotive environment. The speed at which the rear-view image appears on the dashboard display upon ignition is a first-order design challenge.

Immediately after power-up, the FPGA loads the configuration bit stream that has been stored in the NOR device. When the transfer has completed, the FPGA transitions to an active (configured) state. FPGAs offer a number of configuration interface options that often include a parallel NOR bus and a Serial Peripheral Interface (SPI) bus. Memories supporting these buses have always had minor incompatibilities between offerings from different manufacturers, which has made multiple sourcing of memory devices more difficult.

The newly released JEDEC xSPI specification was jointly developed by all the major NOR Flash memory manufacturers. The new standard ends decades of NOR Flash manufacturers developing products independently without adhering to a common definition. While minor differences still exist, the core JEDEC xSPI functionality is now identical in offerings from all manufacturers. The JEDEC xSPI specification standardizes bus transactions, commands, and a wide swath of internal functionality. Combined with high throughput, these next-generation Flash devices enable a whole new range of applications and capabilities. For example, the Semper NOR Flash family from Cypress conforms to the JEDEC xSPI specification and provides a sustained 400MB/s read transfer rate that is well suited for use as an FPGA configuration memory. To put this in context, a 400MB/s data rate enables the contents of a 128MB (1Gb) device to be transferred in 320ms.
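
The arithmetic behind that figure is easy to check. The short Python sketch below assumes decimal megabytes (1MB = 10^6 bytes), as the 320ms figure implies, and ignores command, address and latency overhead. The function name is illustrative only.

```python
def transfer_time_ms(image_mb: float, rate_mb_per_s: float) -> float:
    """Ideal time in milliseconds to stream an image at a sustained read rate.

    Assumption: command, address and latency cycles are negligible
    compared with the data phase.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return image_mb * 1000.0 / rate_mb_per_s


# A 128MB (1Gb) device read at a sustained 400MB/s
print(transfer_time_ms(128, 400))  # 320.0
```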

History of FPGA Configuration

When FPGAs first became available, the configuration memory of choice was either a parallel EPROM or a parallel EEPROM. Over time, NOR Flash technology appeared and was widely adopted for its in-system reprogrammability and cost-effectiveness. In a second evolutionary transition, the SPI memory interface has displaced the parallel NOR interface in most applications. Today’s SPI memories offer high densities, small package sizes, high read throughput and, perhaps most importantly, an efficient low pin count interface.

Figure 1 – The Gigabit Quad SPI (6 pin) and Parallel NOR (45 pin) interfaces (Source: Cypress Semiconductor)

Figure 1 shows the pinout of a one gigabit SPI device compared with a one gigabit Parallel NOR product.  For a one gigabit memory, the Quad Serial Peripheral Interface (QSPI) device has a six-pin interface and the Parallel NOR device requires 45 pins.  This dramatic difference in pin count has led to QSPI devices being widely adopted as the preferred configuration interface.  The QSPI interface allows for changes to densities without changing the device footprint.

FPGA Configuration Speed

As process nodes shrink, FPGA devices continue to increase the amount of programmable logic available. In turn, this leads to a requirement for higher density – and faster – configuration memory.  Modern FPGAs require as much as 128MB of data to be loaded during the configuration period.  These high-density configuration bit streams require a longer period to transfer from the NOR Flash device into the FPGA.  The configuration interface is not only optimized for read throughput but is also focused on facilitating interoperability between different NOR Flash manufacturers.

SPI Read Throughput

SPI read throughput has increased dramatically over the last several years, starting with the original SPI interface running in x1 mode all the way to modern QSPI offerings running x4 DDR.  As can be seen from Table 1, next-generation Flash devices are able to provide another increase in SPI bus performance.

Table 1 – SPI read throughput options for Flash memory devices. (Source: Cypress Semiconductor)
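
The peak throughput of each option follows from the clock rate, the bus width and the transfer type. The sketch below is a minimal illustration of that relationship, assuming no command or latency overhead. The clock frequencies are hypothetical values chosen for illustration and are not taken from Table 1; the 200MHz x8 DDR case is picked because it yields the 400MB/s figure quoted earlier.

```python
def peak_read_mb_per_s(clock_mhz: float, bus_width: int, ddr: bool) -> float:
    """Peak read throughput in MB/s.

    Each clock edge used for data moves `bus_width` bits; DDR uses
    both edges of the clock, SDR uses one.
    """
    if bus_width not in (1, 2, 4, 8):
        raise ValueError("unsupported bus width")
    edges_per_clock = 2 if ddr else 1
    return clock_mhz * bus_width * edges_per_clock / 8


# Hypothetical clock rates, for illustration only.
for name, mhz, width, ddr in [
    ("x1 SDR", 50, 1, False),
    ("x4 SDR", 100, 4, False),
    ("x4 DDR", 100, 4, True),
    ("x8 DDR", 200, 8, True),
]:
    print(f"{name}: {peak_read_mb_per_s(mhz, width, ddr):.2f} MB/s")
```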

Modern SPI devices can be permanently configured for a fixed bus width and transfer type that is operational immediately upon power-up. This permanent configuration must also be supported by the FPGA to allow the configuration process to begin immediately after power-up.

Alternatively, SPI memories can exit power up in a x1 mode that allows the host system (FPGA) to query the memory for characteristics located in the Serial Flash Discoverable Parameters (SFDP) table.  This x1 mode has become a standard feature supported by multiple memory vendors and allows the FPGA to retrieve critical information regarding device functionality.  Once the device characteristics have been retrieved, the FPGA memory controller and SPI memory device can be quickly reconfigured for maximum read performance.

Figure 2 – The Serial Flash Discoverable Parameters (SFDP) table is used to configure SPI bus functionality upon power on. (Source: Cypress Semiconductor)
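
As a rough illustration of the first step in that query, the Python sketch below parses the 8-byte SFDP header defined by JEDEC JESD216: a four-byte "SFDP" signature followed by the minor revision, major revision and a zero-based count of parameter headers. It assumes the raw bytes have already been fetched in x1 mode with the Read SFDP command (0x5A); the class and function names, and the sample bytes, are made up for the example.

```python
import struct
from typing import NamedTuple

SFDP_READ_OPCODE = 0x5A    # Read SFDP command (JEDEC JESD216)
SFDP_SIGNATURE = b"SFDP"   # first four bytes of the table


class SfdpHeader(NamedTuple):
    major: int
    minor: int
    num_param_headers: int


def parse_sfdp_header(raw: bytes) -> SfdpHeader:
    """Decode the 8-byte SFDP header from bytes already read from the device."""
    if len(raw) < 8:
        raise ValueError("SFDP header is 8 bytes")
    sig, minor, major, nph, _unused = struct.unpack_from("<4sBBBB", raw)
    if sig != SFDP_SIGNATURE:
        raise ValueError("SFDP signature not found")
    # The NPH field is zero-based: 0 means one parameter header.
    return SfdpHeader(major, minor, nph + 1)


# Made-up header bytes for illustration.
print(parse_sfdp_header(b"SFDP\x08\x01\x01\xff"))
```

The parameter headers that follow point to the tables describing supported bus widths and transfer types, which the controller would read next before switching modes.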

The retrieval of key device information using the integrated SFDP table will be critical when using next-generation Flash memory devices that can run with x1, x4 or x8 bus widths and with either SDR or DDR transfer types. The chosen bus width and transfer type must align with the bus interface infrastructure implemented on the FPGA.

Dual QSPI Configuration Interface

To reduce FPGA configuration time, many modern FPGAs allow the configuration bit stream to be partitioned across two QSPI devices (Figure 3).  These two QSPI devices are wired in a parallel fashion where the lower nibble of the bit stream is stored in a “primary” QSPI device (QSPI_P) and the upper nibble of the bit stream is stored in the “secondary” QSPI device (QSPI_S).  These two devices are run in parallel while loading the bit stream, effectively doubling the read data transfer rate. 

Note that the interface is largely independent across both devices, with the exception of the shared SCK line.  The shared SCK line is implemented to minimize timing skews when reading the devices in a parallel (i.e., simultaneous) manner.  Accesses to the devices can occur one at a time or to both devices simultaneously when performing the same operation with an identical target address.

Figure 3 – The Dual QSPI Configuration interface (11 pins) allows the configuration bit stream to be partitioned across two QSPI devices to effectively double the read data transfer rate. (Source: Cypress Semiconductor)

This 11-pin dual QSPI configuration is attractive when large FPGA devices require large (i.e., high-density) configuration bit streams to be transferred in the fastest possible manner.
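
The nibble partitioning described above can be sketched in a few lines of Python. This is an illustration only: it assumes each device packs two consecutive nibbles into one stored byte with the earlier nibble in the upper half, and that an odd-length bit stream is padded with 0xFF. The actual packing is defined by the FPGA vendor's tools, and the function names are hypothetical.

```python
from typing import Tuple


def split_dual_qspi(bitstream: bytes) -> Tuple[bytes, bytes]:
    """Return (primary, secondary) images: low nibbles and high nibbles."""
    if len(bitstream) % 2:
        bitstream += b"\xff"  # assumption: pad to an even number of bytes
    primary = bytearray()
    secondary = bytearray()
    for i in range(0, len(bitstream), 2):
        b0, b1 = bitstream[i], bitstream[i + 1]
        primary.append(((b0 & 0x0F) << 4) | (b1 & 0x0F))  # low nibbles
        secondary.append((b0 & 0xF0) | (b1 >> 4))         # high nibbles
    return bytes(primary), bytes(secondary)


def merge_dual_qspi(primary: bytes, secondary: bytes) -> bytes:
    """Rebuild the byte stream the FPGA sees when both devices are read together."""
    out = bytearray()
    for p, s in zip(primary, secondary):
        out.append((s & 0xF0) | (p >> 4))
        out.append(((s & 0x0F) << 4) | (p & 0x0F))
    return bytes(out)


p, s = split_dual_qspi(bytes([0xAB, 0xCD]))
print(p.hex(), s.hex())             # bd ac
print(merge_dual_qspi(p, s).hex())  # abcd
```

Each image is half the size of the original bit stream, which is why reading both devices simultaneously halves the transfer time.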

Flash Configuration

Next-generation Flash memories operate with a x1 (primarily for SFDP access), x4 or x8 IO bus width.  Data can be transferred in either an SDR or DDR format and high-speed transfers are facilitated by using a new Data Strobe signal.  For example, the octal configuration of the Semper NOR Flash device from Cypress uses an 11-pin interface (see Figure 4).

Figure 4 – Data can be transferred using a x1, x4, or x8 IO bus width in either SDR or DDR format using a low pin count interface. Shown here is the octal configuration of the Semper NOR Flash from Cypress using an 11-pin interface. (Source: Cypress Semiconductor)

The new Data Strobe must be incorporated into the FPGA configuration interface to take advantage of the high throughput read capabilities of next-generation Flash devices.  The Data Strobe is edge-aligned with the output read data in a manner identical to the way the strobe is used on Low Power DDR DRAM devices (Figure 5).  The Data Strobe “paints” the data eye and allows the FPGA to effectively capture the data at high clock rates. 

Figure 5 – A x8 DDR read transaction with the Data Strobe edge-aligned with the output read data to enable the FPGA to effectively capture the data at high clock rates. (Source: Cypress Semiconductor)

One Flash feature that is well suited for FPGA configuration is support for a continuous read operation. A continuous read begins with the host (MCU or FPGA) asserting CS#, then issuing the read command followed by the target address. After a number of latency cycles, the memory device outputs data from the target address. If the host continues to toggle the clock, the memory responds by outputting data from the next sequential address. The memory continues to output data from sequential addresses as long as the clock continues to toggle. This sequential read function can allow an FPGA to be configured with a single read transaction.
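
The sequence can be modeled with a toy Python example. This is a behavioral sketch, not a description of any real device: it delivers one byte per clock (actual devices deliver a nibble, a byte or two bytes per clock depending on bus width and transfer type), the latency count is arbitrary, and the model simply stops at the end of the array. The class and function names are hypothetical.

```python
class FlashModel:
    """Toy model of a NOR Flash array serving one continuous read."""

    def __init__(self, contents: bytes, latency_cycles: int = 8):
        self.contents = contents
        self.latency_cycles = latency_cycles

    def continuous_read(self, address: int):
        """Yield one item per clock: None during latency, then sequential bytes."""
        for _ in range(self.latency_cycles):
            yield None
        while address < len(self.contents):
            yield self.contents[address]
            address += 1


def load_bitstream(flash: FlashModel, start: int, length: int) -> bytes:
    """Host side: one command and address, then keep clocking until done."""
    data = bytearray()
    for beat in flash.continuous_read(start):  # CS# asserted, command + address sent
        if beat is None:
            continue                           # latency cycle, no data yet
        data.append(beat)
        if len(data) == length:
            break                              # host stops the clock, deasserts CS#
    return bytes(data)


flash = FlashModel(bytes(range(16)), latency_cycles=3)
print(load_bitstream(flash, 4, 5).hex())  # 0405060708
```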

Another feature that facilitates FPGA configuration is the AutoBoot function.  AutoBoot performs an automatic read from a preconfigured target address during Power on Reset and then immediately outputs data upon the first assertion of CS# (Figure 6).  This function is also useful for ASIC devices that need a simple configuration mechanism.  Once CS# is deasserted, the memory returns to its standby state and subsequent operations are processed in the normal manner.

Figure 6 – The AutoBoot Read function (with 3 warmup cycles) in action. (Source: Cypress Semiconductor)

The write transaction for NOR Flash devices (see Figure 7) is virtually identical to standard SPI operations with two exceptions.  First, the new Data Strobe signal must be driven LOW during the entire transaction.  Second, when configured for DDR operation, the data is written as words (16b) instead of the byte write programming granularity found on legacy SPI products.

Figure 7 – The write transaction for NOR Flash requires that the Data Strobe signal be driven LOW during the entire transaction and that data is written as 16-bit words when configured for DDR operation. (Source: Cypress Semiconductor)
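
One practical consequence of the 16-bit granularity is that a DDR write payload needs an even number of bytes. The sketch below assumes the host pads odd-length payloads with 0xFF, which leaves NOR cells in their erased state because programming only changes bits from 1 to 0. The helper name is hypothetical, and start-address alignment is not handled.

```python
def pad_to_words(payload: bytes, word_bytes: int = 2, fill: int = 0xFF) -> bytes:
    """Pad a write payload up to a whole number of words.

    Assumption: 0xFF is a safe filler because it does not program any bits.
    """
    remainder = len(payload) % word_bytes
    if remainder:
        payload += bytes([fill]) * (word_bytes - remainder)
    return payload


print(pad_to_words(b"\x01\x02\x03").hex())  # 010203ff
```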

Next-generation NOR Flash devices provide the high throughput needed to meet the increased density and instant-on requirements of large-scale FPGA-based applications. All major NOR Flash manufacturers participated in the development of the JEDEC xSPI specification, assuring a wide range of sourcing options for OEMs. The JEDEC xSPI specification covers the octal SPI interface described above, as well as the HyperBus interface, both offering 400MB/s read throughput. This read throughput is dramatically higher than that of legacy SPI offerings. Modifications to the FPGA SPI controller are required to take advantage of the high-speed infrastructure. New functionality needing consideration includes DDR data rates, a new Data Strobe pin used for data capture, and a widened x8 bus interface. In addition, some NOR Flash devices, such as the Semper NOR family from Cypress, allow for the elimination of one of the QSPI devices when a dual QSPI configuration architecture has been implemented. The performance offered by next-generation Flash memories will be attractive in situations that need fast FPGA configuration times and also for FPGA applications that perform real-time reconfiguration.

Cliff Zitlaw has been involved in the development of semiconductor memories for 36 years. Cliff’s primary focus has been on bus interfaces that optimize memory performance within different application constraints. Cliff was the inventor of Xicor’s Microprocessor Serial Memory interface (EEPROM), Micron’s CellularRAM interface (PSRAM) and Cypress’s HyperBus interface (NOR and PSRAM). Cliff is the author or coauthor of 49 patents related to memory functionality and usage. In his spare time Cliff likes to eat barbeque, watch television and take naps on Saturday.
