FPGA Device Reliability and the Sunspot Cycle - Embedded.com

FPGA Device Reliability and the Sunspot Cycle


The potential effects of background radiation on electronics systems are well known. Radiation hardening of devices is required in space systems, and design-for-radiation is a paramount consideration in avionics, where aircraft are more susceptible to odd instrumentation behavior as they fly higher in the atmosphere. A flipped bit on a calm flight has much more serious consequences than a flipped cup of coffee in turbulence.

Radiation issues exist at sea level, but the effect there has been relatively benign—until recently. Scientific observations over the past two years suggest that certain types of sea-level radiation have increased to historic highs. As a result, attention is focusing on semiconductor selection as an increasingly important way to minimize and avoid radiation-based anomalies on the ground.

First, some background on background radiation, which has a number of sources. These include cosmic rays originating outside our solar system, charged particles streaming from our sun in the solar wind and terrestrial radioactive decay of materials relatively abundant in our earthbound environment.

Cosmic rays and heavy ions in the solar wind are mitigated to a great extent once they reach the Earth's atmosphere. At sea level this atmospheric blanket is at its thickest, and the neutron environment is the most benign.

So what's to worry about? Plenty.

Neutron flux density is still significant at sea level on the equator. Since neutrons can penetrate many feet of concrete, shielding ICs against neutrons is not practical. Any increase in neutron flux densities in the operating environment results in real reliability concerns for system designers using FPGAs.

This concern has been heightened recently, thanks to observations taken in snowy Scandinavia, where one of a worldwide network of neutron monitoring stations is located: University of Oulu in Finland.

Figure 1

A look at variations in activity going back to 1964 (Figure 1 above) provides one correlation. Sunspot count data (a sunspot is a relatively dark area on the Sun's surface resulting from a strong magnetic field) has been collected for centuries.

Figure 2

These sunspot counts show variation on a cycle of roughly 11 years. By overlaying Oulu neutron measurements with sunspot count data since 1964, we see that the neutron count measured at Oulu varies inversely with the sunspot count (Figure 3 below).
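One way to put a number on an inverse relationship like this is a correlation coefficient. The following is a minimal sketch, assuming two equal-length yearly series. The values are synthetic placeholders invented purely for illustration; they are not Oulu measurements or real sunspot counts, and the function name is hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# SYNTHETIC placeholder values, one per year over a hypothetical 11-year cycle.
sunspots = [10, 40, 90, 140, 150, 120, 80, 50, 30, 15, 8]
neutrons = [6700, 6500, 6200, 5900, 5800, 6000, 6300, 6450, 6600, 6680, 6720]

r = pearson(sunspots, neutrons)
print(f"correlation: {r:.2f}")  # a negative value indicates an inverse relationship
```

A coefficient close to -1 would mean the neutron count falls almost in lockstep as the sunspot count rises, which is the pattern the overlay suggests.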

Figure 3

By 2008 the sun was entering an unusually quiet period for sunspots, and by the beginning of 2008 the neutron count readings were consistently exceeding the previous 1965 maximum. By October 2008 the daily neutron count readings rarely fell below the 1965 maximum (Figure 4 below).

Figure 4

The Impact of Neutron Flux
So what does this enhanced neutron flux environment mean for system reliability? When an incoming charged particle collides with a semiconductor, the results are significant.

Charged particles leave a trail of ionization through the substrate, causing a momentary current pulse in nearby transistors that is proportional to the particle's incoming energy.

When a neutron strikes a device, it may collide with a silicon atom in the substrate, and that impact ejects a spray of heavy ions that induce a current pulse in CMOS ICs. This current pulse, which is proportional to the energy of the incoming neutron, can cause data in memory cells and flip-flops to change.

Both alpha- and neutron-induced errors can destroy the integrity of data stored in SRAM cells. This is of particular concern to the increasing numbers of system designers deploying FPGAs in mission-critical or other high reliability applications on the ground, such as telecommunications systems. SRAM-based FPGA architectures have to grapple with this issue in ways that aren't necessarily palatable.

If the memory or flip-flop data change does not change the fit, form or function of the device, it is a “soft error.” Mitigation techniques for soft errors are relatively straightforward and include error detection and correction (EDAC) and error-correcting code (ECC) techniques. But these techniques consume FPGA circuitry. A larger device is then required to perform the same functionality while also correcting these errors.
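To make the resource cost concrete, here is a minimal sketch of single-error correction using a Hamming(7,4) code. The article does not name a specific code; Hamming(7,4) is chosen here only because it is small enough to read at a glance, and the function names are illustrative. Every 4 data bits carry 3 extra parity bits plus the XOR logic to compute and check them, which is the overhead that pushes a design toward a larger device.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword layout (positions 1..7): p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Return (data_bits, error_position).

    error_position is the 1-based position of the corrected bit,
    or 0 if the syndrome indicates no error.
    """
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # syndrome read as a binary position
    if pos:
        c[pos - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], pos

word = [1, 0, 1, 1]
codeword = hamming74_encode(word)
codeword[4] ^= 1                 # simulate a single-bit upset at position 5
recovered, position = hamming74_correct(codeword)
print(recovered, position)
```

This code corrects any single flipped bit per codeword. Two upsets in the same codeword would be miscorrected, which is why practical designs often add an overall parity bit or use wider codes.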

If the cell that changes due to the neutron or charged particle event is an SRAM configuration memory cell that controls logic functions or a routing matrix, the data error may change circuit functionality, internal cell connectivity or signal routing between cells within the SRAM FPGA.

This is a “firm error,” and it persists until detected and cleared by rebooting or power cycling the volatile SRAM FPGA and reloading the FPGA with the correct configuration programming stream. Mitigation of firm errors is virtually impossible in SRAM FPGAs.
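The detect-and-reload cycle described above can be sketched in a few lines. This is an illustration only: the article does not say how a firm error is detected, so the checksum comparison below is an assumption, and the byte strings merely stand in for a configuration bitstream and for the SRAM configuration memory. No vendor API is modeled.

```python
import zlib

def config_is_intact(readback: bytes, golden_crc: int) -> bool:
    """Compare a CRC-32 of the read-back configuration with the known-good value."""
    return zlib.crc32(readback) == golden_crc

# Stand-in for the correct configuration programming stream (hypothetical contents).
golden = bytes(range(256)) * 4
golden_crc = zlib.crc32(golden)

# Simulated SRAM configuration memory, initially loaded correctly.
live = bytearray(golden)
live[100] ^= 0x08                      # simulated upset: one configuration bit flips

upset_detected = not config_is_intact(bytes(live), golden_crc)
if upset_detected:
    live = bytearray(golden)           # "reload" the configuration from the good image

print(upset_detected, config_is_intact(bytes(live), golden_crc))
```

Note what the sketch does not do: between the upset and the next check, the device runs with the wrong logic or routing, and the reload interrupts operation. Detection shortens the exposure but does not prevent the error.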

SRAM FPGA vendors have developed hybrid SRAM plus NVM devices in an attempt to address firm errors, offering persistent data retention with SRAM FPGA speeds. While these are an improvement over pure SRAM FPGAs—they do not completely lose all configuration data when they lose power—there is no gain in reliability. These hybrid devices exhibit the same firm-error susceptibility and failure rates as pure SRAM FPGA devices.

Geometry Matters
The situation worsens the longer we march to Moore's Law. Smaller process geometries open devices to more, not fewer, radiation issues. Smaller SRAM cells are more easily upset by random low-energy particles, and a greater proportion of neutron and alpha events can transfer enough energy to cause an error.

Supply voltage scaling also increases firm-error susceptibility, as lower voltages lower the upset threshold and make single-event errors more common. The net effect is that smaller process geometries dramatically increase the probability of single-event upset (SEU) soft or firm errors in FPGAs based on SRAM technologies.

The observations from Finland and the theoretical impact on system design would be an intellectually stimulating academic paper were it not for the fact that real-world effects are being recorded in systems.

Throwing FITs
iRoC Technologies, an independent organization with expertise in soft error testing and regular testing slots at Los Alamos National Laboratory, conducted both neutron and alpha testing on FPGAs using three different programming technologies in five different architectures from three major FPGA vendors.

Table 1

The FPGAs were tested until a significant number of failures were observed. Based on these results, failures-in-time (FIT) rates were calculated (one FIT represents a single failure in 1 billion hours of operation) and are shown in Table 1 above for neutron test results and Table 2 below for alpha test results.
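The FIT definition translates directly into arithmetic. The sketch below is a minimal illustration with hypothetical numbers that are not taken from the iRoC results; the function names are likewise illustrative. It also does not model the scaling from beam flux to field flux that accelerated testing would typically involve.

```python
BILLION_HOURS = 1e9

def fit_rate(failures, device_hours):
    """Failures per 1 billion device-hours of operation."""
    return failures * BILLION_HOURS / device_hours

def mtbf_hours(fit):
    """Mean time between failures, in hours, implied by a FIT rate."""
    return BILLION_HOURS / fit

def system_fit(fit_per_device, device_count):
    """FIT rates add across devices, assuming independent failures."""
    return fit_per_device * device_count

# Hypothetical example: 5 failures observed over 2,000,000 device-hours.
device_fit = fit_rate(failures=5, device_hours=2_000_000)
print(device_fit)                                   # 2500.0 FIT per device
print(mtbf_hours(device_fit))                       # 400000.0 hours for one device
print(mtbf_hours(system_fit(device_fit, 1000)))     # 400.0 hours across 1,000 devices
```

The last line is the reason a seemingly small per-device rate matters: in a deployment of many devices, the expected time between failures somewhere in the system shrinks in proportion to the device count.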

Table 2

The third-party neutron and alpha particle testing performed to date shows that SRAM FPGA devices in a variety of process geometries and cell technologies from a number of leading vendors all exhibit significant failure rates.

Reprogrammable FPGA devices based on flash technology, however, experienced zero failures in alpha or neutron testing. Antifuse FPGA devices also experienced zero failures, illustrating why these one-time-programmable devices dominate high-reliability spaceflight applications that do not require reprogrammability.

Immunity Is a Good Thing
System designers can no longer afford to ignore radiation effects on the ground. Systems in terrestrial Internet infrastructure applications are logging errors that cause system uptime issues and which have been traced back to SEU radiation events.

This concern is not limited to sea-level applications. No terrestrial system designer can ignore the fact that Denver, Colorado is more than 5,000 feet (1,600 meters) above mean sea level, and cell towers with their associated electronics are installed on mountaintops at significantly higher altitudes than that.

Data from around the world shows that sea-level neutron flux rates are continuing to increase, which means that problems once confined to the heavens have come home to roost for system designers here on Earth.

Mike Brogley, a 20-year veteran of the semiconductor industry, is a product marketing manager at Actel Corp.
