
Infusing Speed and Visibility Into ASIC Verification

High-performance, high-capacity FPGAs continue to experience exponential growth in usage, both in their role as prototypes for ASIC/SoC designs and as systems in their own right. These designs typically involve complex combinations of hardware and embedded software (and also, possibly, application software).

This is resulting in a verification crisis because detecting, isolating, debugging, and correcting bugs now consumes significantly more time, money, and engineering resources than creating the design in the first place.

The problem is that bugs in this class of design can be buried deep in the system and can manifest themselves in non-deterministic ways based on complex and unexpected interactions between the hardware and the software. Simply detecting these bugs can require extremely long test sequences.

Once a problem is detected, actually debugging the design requires a significant amount of time and effort. Furthermore, when verification tests are performed using real-world data, such as a live video stream from a digital camera, an intermittent bug may be difficult, if not impossible, to replicate.

There are a variety of verification options available to engineers, including software simulation, hardware simulation acceleration, hardware emulation, and FPGA-based prototypes. Each approach has its advantages and disadvantages (Table 1, below).

RTL simulators, for example, are relatively inexpensive, but full-system verification performed using this approach is extremely slow. One major advantage of software simulation is visibility into the design. Having said this, as more signals are monitored and their values captured, simulation slows even further.

Table 1. Comparison of Conventional Verification Technologies

At the other end of the performance curve are FPGAs, which offer a significant advantage in their ability to run at real-time hardware speeds. In the case of ASIC/SoC designs, FPGA-based prototypes are also relatively inexpensive compared to hardware acceleration and emulation solutions. Until now, however, FPGAs have suffered from limited visibility into their internal state and signals.

This article provides an overview of the various conventional verification options available to designers and summarizes the advantages and disadvantages of these different techniques.

Overview of Conventional Verification Options
As an introductory example, consider the performance of a variety of software simulation techniques as compared to an FPGA-based prototype. This particular example involves the booting of a real-world cell phone design.

Figure 1. FPGA-based prototypes offer an extreme performance advantage over various software simulation techniques.

As shown in Figure 1, above, in addition to requiring a testbench, even a high-capacity, high-performance RTL simulator took 30 days to boot the system. Similarly, a traditional hardware/software co-verification environment, using an instruction set simulator (ISS), which also required a testbench, took 10 days to boot the system.

Meanwhile, a C/C++ simulation of the system brought the boot time down to 24 hours, but this form of verification provided only limited visibility into the internal workings of the system. By comparison, an in-system FPGA booted the system in only three seconds.

This means that the FPGA-based environment can be used to verify the system running under real-time workloads; it also means this environment can serve as a platform for embedded and application software developers to integrate and verify their code in the context of the real system. The main problem with the FPGA, when used in a traditional verification environment, is lack of visibility into its internal signals and state, including the contents of any memories.

As previously noted, software simulators are relatively inexpensive, but full-system verification performed using this approach is extremely slow. At the other end of the performance curve are FPGA-based prototypes, which are also relatively inexpensive and are very fast.

In between these two approaches are hardware-accelerated simulation and emulation, which are much faster than software simulation, but much slower than FPGA-based verification and far more expensive than both. The end result is that FPGA-based prototypes give the best price/performance by far (see Figure 2, below).

Figure 2. FPGA-based prototype price/performance.

The only significant drawback with traditional FPGA-based prototyping systems is limited visibility into the inner workings of the system. There are a number of conventional techniques that can be used to improve visibility into the FPGA, but each has its own set of limitations.

One common technique, for example, is to time-division-multiplex internal signals onto the FPGA's primary input/output (I/O) pins. This approach provides increased visibility, but it severely degrades the performance of the system.
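To make the cost concrete, here is a minimal behavioral sketch in Python (not any vendor's tooling; the probe counts and names are assumptions) of 64 internal probes sharing 8 debug pins. Each probe is visible only once per scan, so the effective observation rate, and with it the usable system speed, drops by the multiplexing factor.

# Minimal behavioral sketch of time-division-multiplexed debug pins.
# PINS, PROBES, and sample_probes are illustrative assumptions, not a real API.

PINS = 8                    # physical I/O pins dedicated to debug
PROBES = 64                 # internal signals to be observed
SLOTS = PROBES // PINS      # clock slots needed to scan every probe once

def tdm_scan(sample_probes):
    """Yield (slot, pin_values) for one full scan of the probes.

    sample_probes(i) returns the current value of internal probe i.
    Each probe is seen only once every SLOTS cycles, which is why the
    technique trades away so much performance for visibility.
    """
    for slot in range(SLOTS):
        yield slot, [sample_probes(slot * PINS + p) for p in range(PINS)]

# Example: pretend every probe currently reads its own index modulo 2.
for slot, pins in tdm_scan(lambda i: i % 2):
    print(f"slot {slot}: pins = {pins}")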

Another common technique is to embed “logic analyzer” macros into the fabric of the FPGA. These macros can be used in various ways. In one scenario, the firing of a user-defined trigger condition can be used to instruct the macro to start gathering data from a set of signals.

Using macros to gather information
A more common usage model is for the macro to be continuously gathering data from selected signals into a block of RAM (once the RAM has filled, earlier data starts to be overwritten). When a user-defined trigger condition occurs, such as a breakpoint being reached, the macro stops collecting data and the stored signal values are passed to the outside world via the device's JTAG port.
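The following Python sketch models this capture scheme under stated assumptions (the CaptureBuffer class, the signal names, and the trigger are illustrative, not a vendor macro's interface): a fixed-depth circular buffer continuously records selected signals, silently overwrites the oldest samples, and freezes when a user-defined trigger fires, at which point the stored window would be read out over JTAG.

from collections import deque

class CaptureBuffer:
    """Illustrative model of an embedded logic analyzer trace buffer."""

    def __init__(self, depth=1024):
        # Oldest samples are overwritten once the buffer fills, just as with
        # a block-RAM trace buffer in the FPGA fabric.
        self.samples = deque(maxlen=depth)
        self.frozen = False

    def capture(self, signal_values, trigger):
        if self.frozen:
            return
        self.samples.append(dict(signal_values))
        if trigger(signal_values):
            # Trigger hit (e.g. a breakpoint condition): stop collecting.
            self.frozen = True

buf = CaptureBuffer(depth=8)
for cycle in range(100):
    buf.capture({"cycle": cycle, "error": cycle == 42},
                trigger=lambda s: s["error"])
print([s["cycle"] for s in buf.samples])   # the last 8 cycles up to the trigger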

The advantage of the embedded logic analyzer technique is that it allows the FPGA to continue to run at real-time hardware speeds. The disadvantage is the limited number of signals that can be monitored and the limited “depth” of data (the number of cycles) that can be gathered from those signals.

This is because each test vector applied to the inputs “explodes” by orders of magnitude with regard to changes on signals internal to the device. Furthermore, this approach provides only limited visibility into the contents of memory blocks in the design.
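A back-of-envelope calculation makes the depth limitation concrete. Assuming, purely for illustration, that 2 Mbit of block RAM is set aside for capture, the trace window shrinks in direct proportion to the number of signal bits recorded per cycle:

# Illustrative arithmetic only; the 2 Mbit capture budget is an assumption,
# not a figure for any particular device.
CAPTURE_RAM_BITS = 2 * 1024 * 1024

for monitored_bits in (32, 128, 512):
    depth_cycles = CAPTURE_RAM_BITS // monitored_bits
    print(f"{monitored_bits:4d} signal bits/cycle -> {depth_cycles:6d} cycles of history")

# 32 bits/cycle  -> 65536 cycles
# 128 bits/cycle -> 16384 cycles
# 512 bits/cycle ->  4096 cycles
# Even the deepest of these windows is tiny compared with the billions of
# cycles a real-time run can execute before an intermittent bug appears.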

Comparing the various verification solutions across a “visibility range”, from no visibility whatsoever all the way to full visibility, one will see that software simulation comes out on top, with hardware-accelerated simulation and hardware emulation close behind.

Meanwhile, conventional FPGA-based prototypes trail significantly in terms of visibility when using the embedded logic analyzer technique to maintain their speed advantage (see Figure 3, below).

Figure 3. Conventional FPGA-based prototypes suffer from limited visibility, which makes them difficult and time-consuming to debug.

When visibility enhancement is insufficient
Although the embedded logic analyzers introduced in the previous topic do provide the ability to observe signals internal to the FPGA, they do not in and of themselves provide full-signal visibility. In order to address this, the embedded logic analyzer technique has recently been augmented by the concept of “visibility enhancement.”

In this case, the data associated with only a subset of the internal signals is captured, and the visibility enhancement application then extrapolates the data on the unobserved signals. This extends the capabilities of the embedded logic analyzers, but the massive amounts of internal data associated with relatively few input test vectors still leave this technique limited with regard to the depth of data that can be collected.

If users want to increase the depth (in terms of the number of input vectors), they have to make tradeoffs with regard to the number of signals that are to be monitored. In turn, this may necessitate a number of verification runs to track down a problem condition, where each run involves monitoring a different set of signals. This means that some non-deterministic and/or intermittent bugs can “slip through the net,” because they may simply not manifest themselves on subsequent runs.

Even more problematic is the fact that visibility enhancement operates on gate-level signal values. The resulting visibility-enhanced signals cannot be brought back into the RTL world unless the tools understand all of the synthesis optimizations that took place.

Similarly, if an assertion is being observed, the poor correlation from the gate-level representation in the FPGA to the assertion's originating RTL (which may be partitioned across multiple RTL entities) can make the process of debugging the design “interesting,” to say the least.

The solution is total visibility
The solution to the visibility problem for both FPGA-based prototypes and FPGA-based systems is to provide technology that enables 100 percent visibility into the FPGA while still allowing the FPGA to run at real-time hardware speeds.

The way to achieve this is to replicate logic inside the FPGA, and then to store the stimulus and delay its application to this replicated logic (see Figure 4, below). As an initial example, consider the case where the entire design has been implemented using this technique.

Figure 4. The key concept of the TotalRecall approach is the replication of logic inside the device, which makes mapping much simpler.

As seen, the stimuli applied to the primary inputs are fed directly to the design's real logic; the live responses seen at the device's primary outputs are generated by this logic. Meanwhile, the original test vectors are also fed into a block of memory acting as a FIFO buffer.

For the purposes of this example, assume this FIFO is 1,000 words deep. In this case, the stimulus being applied to the replicated logic will trail the stimulus being applied to the design's real logic by 1,000 clock cycles.

When a problem is detected, such as an assertion triggering or an incorrect response at the primary outputs, the replicated logic and its corresponding memory FIFOs are paused. At this time, the contents of the stimulus FIFO and the current state of the replicated logic are extracted from the FPGA via the device's JTAG port.

The reason this type of approach provides such a tremendous test vector “depth”, as compared to conventional logic analyzer macro-based techniques, is that only the primary stimulus is being stored; that is, it is not necessary to store huge amounts of internal data.
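The following Python sketch models the overall scheme under simplifying assumptions (design_step is a stand-in for the real synthesized logic, and the trigger condition is arbitrary): the live logic consumes stimulus immediately, a copy of each stimulus word enters a fixed-depth FIFO that feeds the replicated logic 1,000 cycles later, and when a failure is detected the replica's state plus the buffered stimulus are enough to replay the window leading up to the bug.

from collections import deque

FIFO_DEPTH = 1000                  # the replica trails the live logic by 1,000 cycles

def design_step(state, stimulus):
    # Placeholder for the real logic: next state is some function of the
    # current state and the applied stimulus.
    return (state * 31 + stimulus) & 0xFFFF

def run(stimulus_stream, problem_detected):
    live_state = 0
    replica_state = 0
    fifo = deque()                 # stores only primary-input stimulus
    for cycle, stim in enumerate(stimulus_stream):
        live_state = design_step(live_state, stim)
        fifo.append(stim)
        if len(fifo) > FIFO_DEPTH:
            # The replica consumes stimulus FIFO_DEPTH cycles late.
            replica_state = design_step(replica_state, fifo.popleft())
        if problem_detected(live_state):
            # Pause: the replica state plus the buffered stimulus reproduce
            # the last FIFO_DEPTH cycles leading up to the failure.
            return cycle, replica_state, list(fifo)
    return None

# Arbitrary trigger condition, purely for illustration.
result = run(range(100_000), problem_detected=lambda state: state == 0xBEEF)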

Of course, the current state of the replicated logic pertains to its synthesized gate-level representation. And, because design and verification engineers prefer to work with their original register transfer level (RTL) representations of the design, this technology allows users to map the current state of the gate-level representation into an equivalent state for the original RTL representation.

This is non-trivial, because performance-related optimizations almost invariably mean that there is no one-to-one correspondence between the two representations. The TotalRecall technology (Figure 4, above) can perform this type of mapping because it has access to the synthesis technology that was used to generate the gate-level representation in the first place.

Thus, once a bug has been detected while running at real-time hardware speeds, users are immediately taken into their familiar software simulation environment with an initialized design and a testbench that will guide them directly to the bug.

Mapped internal state information is used to initialize the internal state of an industry-standard RTL simulator. Meanwhile, the contents of the stimulus memory are used to generate a testbench that will drive the software simulator.
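As a purely illustrative sketch of what that hand-off could look like (the signal names, hierarchy, clock period, and file layout below are invented for this example, not the actual tool output), the mapped register values become procedural deposits and the buffered stimulus becomes a cycle-by-cycle driver:

# Illustrative only: emit a tiny testbench fragment from captured data.
# Everything here (the dut hierarchy, data_in, the 10 ns period) is assumed.

def write_testbench(path, rtl_state, stimulus_words, clk_period_ns=10):
    with open(path, "w") as tb:
        tb.write("initial begin\n")
        # Deposit the mapped RTL state so simulation starts where the FPGA paused.
        for reg_name, value in rtl_state.items():
            tb.write(f"  dut.{reg_name} = {value};\n")
        # Replay the buffered primary-input stimulus, one word per clock.
        for word in stimulus_words:
            tb.write(f"  data_in = {word}; #{clk_period_ns};\n")
        tb.write("  $finish;\nend\n")

write_testbench("replay_tb.sv",
                rtl_state={"ctrl.state_q": "3'b101", "fifo.wr_ptr_q": 7},
                stimulus_words=[0x12, 0x34, 0x56])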

It is important to note that it is not necessary to replicate the entire design in order to use a technology such as this. It may be that the verification engineers wish to focus on only one of the functional blocks forming the design. In this case, the technology can be used to replicate just that block. For example, consider this type of technology being applied to a single functional block, as illustrated in Figure 5, below.

Figure 5. The TotalRecall approach can be applied to a subset of the design.

In this instance, when a bug is detected, users can generate a localized testbench that is specifically targeted at the suspect block. This testbench, along with the initial state information for the suspect block, can then be passed to the appropriate design and/or verification engineer(s) for detailed analysis.

There are many advantages to this approach. For example, in addition to providing total access to all of the design's internal signals, users also have total access to the contents of the design's internal memory blocks. And even an intermittent bug that occurs deep into a verification run can be easily trapped, isolated, and quickly evaluated.

Technology that provides the visibility associated with software simulation combined with the extreme real-time hardware speeds of conventional FPGA-based prototypes will be an indispensable part of the verification of complex ASICs/SoCs. The ultimate goal is to create an environment where designers can get to work quickly, debug rapidly, and make changes without delay.

In this new era, FPGA-based prototyping will hold a key position alongside other verification methodologies. The ability to run “at-speed,” i.e., running tests as fast as the hardware will go, will be an indispensable part of SoC verification. Only then will designers be equipped to deal with the emerging challenges associated with increased device and software complexity.

Mario Larouche joined Synplicity as a Developer in 2000 and has served as the Director of Engineering since 2005. In his current role, Mr. Larouche is responsible for overseeing the creation of Synplicity's FPGA verification & debug solutions, including the patented TotalRecall Verification Technology. He can be reached at mario@synplicity.com.
