SoC designers are learning the benefits of applying high-capacity formal verification techniques at every stage of the design. Our formal tools are powerful and versatile enough for a wide variety of tasks such as architectural exploration and RTL verification, all the way through post-silicon debug.
A good rule of thumb to consider how costly a missed bug can be: Finding bugs in model testing is the least expensive option; if it’s found in component test, add 10X to the cost; 10X more in system test; and another 10X if it makes it into production. You do the math. If you thought watching your team go down in the World Cup was depressing, try explaining to your boss how you let a bug loose in the field. Taking the analogy a bit further, using formal in the post-silicon lab is like having a really good goalie, because it’s the only method for finding, fixing, and verifying the fix to shave untold engineering-hours from the design cycle, and maybe even save a job or two along the way.
Post-silicon debug means trying to reproduce bugs seen in the lab using directed-random-simulation and emulation, but often these traditional approaches, unlike formal, are unable to root cause the bug fast enough. Let’s look at a fairly typical scenario.
When a problem is encountered at this late stage it can be hard to debug due to lack of visibility into the silicon in the post-silicon lab. You can find the problem but it is often abstract and not well understood: the chip hangs, not responding, dropping packets, sending out wrong output, etc. The first step is to try to determine exactly what is happening. Many chips have on-chip trace extraction capability, e.g., controls to freeze the chip when certain events are identified or on-chip logic analyzers that allow a selected group of signals to be muxed to external pins. This lets you extract a failure trace, capturing a limited number of signals for a number of cycles before (and maybe after) a problem is detected. The next step is to isolate and root cause the post-silicon bug. At this point it is known that the chip is exhibiting illegal behavior as it can be seen in the trace, but the trace represents the last N cycles of the run and it is not known how this state was reached. Typically, there will be a limited number of signals in the trace and it is difficult to choose the right set of signals to show the problem.
Figure 1 represents this dilemma: The last few cycles of the failing scenario can be observed, but how can the root cause of the problem be found? How can the designer know in which block the bug is located?
Typically, directed-random-simulation is used to isolate the bug, but can it reveal how this state was reached using simulation? It is not clear where the bug is happening as there is weak controllability in simulation and existing methods, but what is known is that it causes block D to act incorrectly. The bug happens after a 3-4 hour run in the lab when a certain kind of traffic is injected (e.g., only for read transactions on bus X). Finding the root cause with directed-random-simulation can be extremely difficult. If it takes four hours of real time with random traffic to hit the bug, how long will it take to reproduce it when simulation time is dramatically slower?
One of its most valuable features is the ability to find bugs fast. Usually, finding counter-examples (a stimulus sequence from a legal design state that violates an assertion) with formal is much faster than reaching full proofs on the same property, and this technique lends itself very well to bug hunting. Using formal verification the user can “freeze” a specific state in a specific cycle, and then continue the analysis from just that point, making it unnecessary to analyze the entire design. Additional insight is gained through the use of the Visualize feature in Jasper formal tools to automatically generate and manipulate waveforms without a testbench, answering “what-if” design questions and providing visual confirmation of design functionality.
Finding the bug with formal follows almost the same process as in a pre-silicon formal verification flow. There are, however, some significant differences:
· Search is for one specific bug, one specific scenario
· Not looking for full proof or coverage completeness
· Just need to find the scenario that leads to the illegal behavior
· Can allow over-constraints to simplify the process (e.g., don’t allow Write transactions because the bug happens with Read transactions only)
A couple of powerful examples of formal use in the post-silicon lab illustrate its value and what we like to call formal’s ROI – or when your technology investment pays huge dividends. One of our newer customers was using simulation and emulation only, but even after weeks of exhaustive testing, a corner case and cycle-dependent bug was found, something easy to miss with traditional methods. After modifying the design they checked their results using formal and in a short amount of time (mere hours rather than weeks) it was learned that the fix was broken, resulting in yet another scenario that would cause the bug. Another ECO followed and formal verification showed the same bug in a different part of the logic which led to a third and final fix that passed the formal check with no problem.
An even more serious instance involved a customer’s large SoC hanging up in the field, resulting in a silicon recall. The post-silicon debug team used directed random-simulation starting with the little information available from the lab. They knew the chip was hanging and the problem was coming from the memory controller when it performed a read transaction. Without formal they worked more than three months and still did not isolate the bug’s root cause. Imaging sitting in those progress meetings! It was identified in the memory controller, sending its data to the wrapper block in a very specific pattern that activated a bug in the wrapper and caused a violation of the bus protocol. This timing alignment of different events in the system was very hard to hit with random simulation, hence it escaped as a silicon bug and was extremely difficult to detect in post-silicon simulations.
Formal was then used to root cause the bug and the engineer was given exactly the same information that the simulation team started with. In this case, the bug was found after just 2.5 weeks, and much of that time was spent ramping up on the design and protocols involved. Actually, once the setup was finished, and properties written, the runtime to find the counter-example was less than a minute – compared to weeks of simulations trying to hit the bug randomly. Afterward, formal was re-run on the fixed RTL code and uncovered two more bugs that the post-silicon simulations missed, saving the chip manufacturer another re-spin.
Just like the goalie, protecting your net by keeping those little bugs from sliding past in the post-silicon lab is essential since mistakes there can have devastating consequences. Given the exhaustive analysis performed by advanced formal verification techniques conclusive answers can be generated – often in minutes vs. days – as to whether or not a particular verification component is the source of the undesired behavior, enabling quick isolation of the root cause of the problem, and then verifying that resulting design modifications are valid.
About the author:
Lawrence Loh is Vice President of Worldwide Applications Engineering for Jasper Design Automation.
He holds overall management responsibility for the company’s applications engineering and methodology development. Loh has been with the company since 2002, and was formerly Jasper’s Director of Application Engineering. He holds four U.S. patents on formal technologies.
His prior experience includes verification and emulation engineering for MIPS, and verification manager for Infineon’s successful LAN Business Unit. Loh holds a BSEE from California Polytechnic State University and an MSEE from San Diego State.