Bug hunting SoC designs to achieve full functional coverage closure - Embedded.com

Bug hunting SoC designs to achieve full functional coverage closure

The intent of verifying a system on chip (SoC) is to ensure that the design is an accurate representation of the specification. Achieving a fully verified SoC is an arduous task, yet combining directed verification with constrained random verification (CRV) can bring the design to full functional coverage closure.

Constraints help to reach coverage goals by shaping the random stimulus to push the design under test into interesting corner cases and avoid hitting invalid/illegal scenarios.

Using CRV, corner cases in the design can be exercised and system behavior under stress can be observed. Stress can be induced by generating random traffic in the SoC. 'Random traffic' means that randomly selected masters initiate transactions to randomly selected slaves, with any transaction type, data size, and latency.

This paper describes how to use constrained random verification to uncover bugs that are difficult to find using traditional directed verification.

Differences between scope of directed and random verification
In directed verification, the verification environment has a mechanism to send stimulus to the DUT, collect the responses, and check them. Each stimulus is written to verify a specific feature of the design.

This becomes tedious as design complexity increases: it gets harder to create stimuli that fully exercise the design, and maintaining them becomes harder and more time-consuming.

In directed verification, the verification engineer has to list out each scenario, so there is a real chance of missing potential bug scenarios. Bugs can escape and lurk in corners, hidden until late in the development cycle or not found at all until the product is taped out.

The solution to the above problems is constrained random verification (CRV), in which constraints shape the random stimulus so that it pushes the design under test into interesting corner cases and reaches the coverage goals. To constrain data items, two things are necessary:

  1. Identifying the data item classes and their generated fields, and,
  2. Creating a derivation of the data item class that adds or overrides default constraints.
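The two steps above can be illustrated with a minimal sketch. It is written in Python purely as a modelling language (a real testbench would normally use a hardware verification language), and every name in it (`BusTransaction`, `BoundaryWriteTransaction`, the field ranges, the alignment rule) is an assumption made for illustration, not something taken from the authors' environment. Rejection sampling stands in for a real constraint solver.

```python
import random


class BusTransaction:
    """Generic data item: generated fields plus default constraints."""

    kinds = ("read", "write")
    sizes = (1, 2, 4, 8)             # bytes per access (hypothetical)
    addr_range = (0x0000, 0x1000)    # half-open range [low, high)

    def __init__(self, rng):
        self.rng = rng
        self.kind = None
        self.size = None
        self.addr = None

    def constraints_ok(self):
        # Default constraint: the address is aligned to the access size.
        return self.addr % self.size == 0

    def randomize(self, max_tries=1000):
        # Draw fields until the constraints hold (toy constraint solving).
        for _ in range(max_tries):
            self.kind = self.rng.choice(self.kinds)
            self.size = self.rng.choice(self.sizes)
            self.addr = self.rng.randrange(*self.addr_range)
            if self.constraints_ok():
                return self
        raise RuntimeError("constraints could not be satisfied")


class BoundaryWriteTransaction(BusTransaction):
    """Derived item that overrides defaults to steer stimulus into a corner:
    only writes, and only to the last 16 bytes of the region."""

    kinds = ("write",)
    addr_range = (0x1000 - 16, 0x1000)


rng = random.Random(1)
items = [BoundaryWriteTransaction(rng).randomize() for _ in range(20)]
```

The base class is never edited to reach the corner case; the derived class narrows the solution space, which is the property that makes constraint layering maintainable.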

Using constrained random verification, the stimuli required to verify test features are generated automatically. The verification owner specifies the constraints, and the testbench automatically creates a solution space and picks scenarios from it.

Constrained random verification is necessary for these reasons:

  • It stresses the system bus and connected gaskets with multiple masters working simultaneously.
  • It provides a random transaction based environment which involves all masters/slaves.
  • It allows the SoC designer to focus on system level issues rather than unit level ones.
  • It shortens functional verification cycles by rapidly achieving coverage goals once the random testbench environment becomes stable.

Setting up your design for CRV
CRV requires a self-checking testbench (Figure 1) with built-in capabilities for stimulus generation, driving, monitoring, scoreboarding, and checking. Once that is in place, the following methodology should be followed:

  • Identify the masters and slaves (for example, memories) in the system.
  • Replace IO masters (unit level) with stubs, assuming the hard IPs are already verified. The rest of the system remains unchanged, with all the physical buses and gaskets in their respective places.
  • Set up a weight-based control for random selection of slaves.
  • Set up a weight-based control for the number and size of transactions from different masters.
  • Set up a weight-based control for the type of transaction issued: read, write, snoopable, etc.
  • Set up a randomized delay between transactions from each master.
  • Set up standard monitors and system-level scenarios that specifically target functional coverage, and make sure coverage sampling is available.
  • Set up coverage dumping with an integrated exclude list.



Figure 1: Test environment for random stimulus generation and coverage
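The weight-based controls in the list above can be sketched as follows. This is a Python model under stated assumptions: the slave names, the weights, the master names, and the maximum delay are all hypothetical, and a weight of zero is used as a simple way to switch a choice off.

```python
import random

# Hypothetical weights; in practice these would come from the test setup.
SLAVE_WEIGHTS = {"ddr": 6, "sram": 3, "flash": 1}
TYPE_WEIGHTS = {"read": 5, "write": 4, "snoopable_read": 1}
MAX_DELAY_CYCLES = 20


def weighted_pick(rng, weights):
    """Pick one key with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def next_transaction(rng, master):
    """Build one random transaction description for the given master."""
    return {
        "master": master,
        "slave": weighted_pick(rng, SLAVE_WEIGHTS),
        "type": weighted_pick(rng, TYPE_WEIGHTS),
        "size": rng.choice((1, 2, 4, 8)),
        # Randomized idle time before this transaction is issued.
        "delay": rng.randint(0, MAX_DELAY_CYCLES),
    }


rng = random.Random(2024)
traffic = [
    next_transaction(rng, master)
    for master in ("core0", "dma", "eth")
    for _ in range(100)
]
```

Seeding the generator explicitly keeps a failing run reproducible, which matters once regressions are run with many seeds.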

Performing CRV in your SoC design
Once you have set up your test environment as shown in Figure 1, first write a generic stimulus base class. Protocol-specific classes derived from it provide the methods and attributes that convert generic commands into commands for a specific protocol.

Suppose the SoC uses N protocols, one of which is AXI. We can derive a random_axi stimulus class from the stimulus base class to randomize the generic commands and convert them into AXI commands. The same can be done for every other protocol in the design.
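A minimal Python sketch of this base-class and derived-class split is shown here. The class and field names (`StimulusBase`, `RandomAxiStimulus`, `GenericCommand`) and the 32-bit data bus are assumptions for illustration. The AXI encodings used are the standard ones: the burst length field holds the number of beats minus one, and the size field holds log2 of the bytes per beat.

```python
import random
from dataclasses import dataclass


@dataclass
class GenericCommand:
    kind: str      # "read" or "write"
    addr: int
    nbytes: int    # total bytes to transfer


class StimulusBase:
    """Protocol-independent random command generator."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def random_command(self):
        nbytes = self.rng.choice((4, 8, 16, 32, 64))
        # Aligning the start address to the transfer length keeps the
        # burst from crossing a 4 KB boundary.
        addr = self.rng.randrange(0, 0x10000, nbytes)
        return GenericCommand(self.rng.choice(("read", "write")), addr, nbytes)

    def convert(self, cmd):
        raise NotImplementedError("protocol-specific subclasses override this")


class RandomAxiStimulus(StimulusBase):
    """Converts generic commands into AXI address-channel fields."""

    BYTES_PER_BEAT = 4   # assumes a 32-bit data bus

    def convert(self, cmd):
        beats = cmd.nbytes // self.BYTES_PER_BEAT
        return {
            "channel": "AR" if cmd.kind == "read" else "AW",
            "addr": cmd.addr,
            "len": beats - 1,     # AxLEN encodes beats minus one
            "size": 2,            # AxSIZE = log2(4 bytes per beat)
            "burst": "INCR",
            "id": self.rng.randrange(16),   # hypothetical 4-bit ID width
        }


stim = RandomAxiStimulus(seed=7)
axi_commands = [stim.convert(stim.random_command()) for _ in range(10)]
```

Supporting another protocol then means adding one more subclass with its own `convert`, while the random command generation stays shared.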

It is important to identify the protocol used by each master in the design. As specified, each of the masters will map to driver_Master_N using its own protocol-specific stimulus. These drivers are directly interfaced with the DUT to generate traffic in the system.

The file parameters.txt (Figure 2) is picked up at simulation time and contains a list of parameters such as slave weightage, read/write weightage, and the maximum size of transaction issued.

Automating test pattern generation. Implementing such a methodology in your SoC design flow should yield immediate benefits. To improve coverage, it is best to automate the random scenarios. This can be done by changing the transaction parameters dynamically, so that the stimulus for every master used in the test is generated automatically. The parameters are passed through the parameters.txt file as shown in Figure 2.

Figure 2: Method for passing random parameters through file parameters.txt
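As a rough illustration of reading such a file, the Python sketch below parses a small parameter listing. The actual layout of parameters.txt appears only in Figure 2, so the `name = value` syntax, the `#` comments, and every parameter name here are assumptions, not the authors' format.

```python
# Hypothetical contents; the real file layout is shown only in Figure 2.
EXAMPLE_PARAMETERS = """\
# slave selection weights
slave_weight.ddr     = 6
slave_weight.sram    = 3
# transaction type weights
read_weight          = 5
write_weight         = 4
max_transaction_size = 64
"""


def parse_parameters(text):
    """Parse 'name = value' lines into a dict of integer parameters."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected 'name = value'")
        name, value = (part.strip() for part in line.split("=", 1))
        params[name] = int(value)
    return params


params = parse_parameters(EXAMPLE_PARAMETERS)
```

Keeping the parameters in a file read at simulation time means a regression script can generate a new file per run without recompiling the testbench.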

Typical SoC bug scenarios and how to deal with them
With our SoC design flow set up as described earlier, we found we were able to deal with a wide range of interesting bug hunting scenarios in the following ways:

Scenario #1: Multiple masters trying to access a single slave simultaneously. Check if the system bus is able to arbitrate properly rather than breaking down. Also check that the system bus can sequence all these transactions onto that slave in a proper fashion. All these transactions should be followed by valid read/write responses.

Scenario #2: Single Master issuing transactions to multiple slaves. The AXI protocol includes AXI ID transaction identifiers. All transactions with a given AXI ID value must remain ordered, but there is no restriction on the ordering of transactions with different ID values.

Out-of-order transactions: A single physical port can support out-of-order transactions by acting as a number of logical ports, each of which handles its transactions in order. By using AXI IDs, a master can issue transactions without waiting for earlier transactions to complete. This can improve system performance because it enables parallel processing of transactions. A slave is required to reflect the AXI ID it received from the master on the corresponding BID or RID response. Randomize the IDs used by each master so that it issues out-of-order transactions to different slaves concurrently, and check that the slave returns the appropriate BID or RID.

Ordered transactions: There is no mandatory requirement for slaves or masters to use AXI transaction IDs. Masters and slaves can process one transaction at a time, which means transactions are processed in the order they are issued.
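The ordering rule described in this scenario lends itself to a small scoreboard check. The Python sketch below is one possible way to model it, not the authors' checker: `check_id_ordering` and the per-transaction `tag` (a testbench-side label used to tell transactions apart) are hypothetical. Responses with the same ID must come back in issue order, responses with different IDs may interleave freely, and a response carrying an ID that was never issued is flagged.

```python
from collections import defaultdict, deque


def check_id_ordering(issued, responses):
    """issued: (axi_id, tag) pairs in issue order.
    responses: (resp_id, tag) pairs in completion order, where resp_id
    is the BID or RID seen on the response channel.
    Returns a list of error strings; an empty list means the check passed."""
    pending = defaultdict(deque)
    for axi_id, tag in issued:
        pending[axi_id].append(tag)

    errors = []
    for resp_id, tag in responses:
        queue = pending.get(resp_id)
        if not queue:
            errors.append(f"unexpected response id={resp_id} tag={tag}")
        elif queue[0] != tag:
            errors.append(f"id={resp_id}: got {tag}, expected {queue[0]}")
            if tag in queue:
                queue.remove(tag)   # still retire it to avoid cascading errors
        else:
            queue.popleft()

    for axi_id, queue in pending.items():
        for tag in queue:
            errors.append(f"missing response id={axi_id} tag={tag}")
    return errors


# Different IDs may complete in any order relative to each other.
ok = check_id_ordering([(1, "a"), (2, "b"), (1, "c")],
                       [(2, "b"), (1, "a"), (1, "c")])
# Same ID completing out of order is a violation.
bad = check_id_ordering([(1, "a"), (1, "b")], [(1, "b"), (1, "a")])
```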

Scenario #3: Issuing accesses to unaligned addresses and boundary addresses in the slaves (memories).
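One way to build the address list for this scenario is sketched below in Python. The function name, the region base and size, and the choice of which boundary addresses to include are assumptions for illustration; the region base is assumed to be aligned to the access size.

```python
import random


def corner_addresses(base, size, access_bytes, rng, n_unaligned=4):
    """Return (boundary, unaligned) address lists for one memory region.

    boundary:  first and last legal aligned accesses and their neighbours.
    unaligned: random addresses that are not a multiple of access_bytes
               but still fit entirely inside the region.
    """
    if access_bytes < 2:
        raise ValueError("a 1-byte access cannot be unaligned")
    top = base + size
    boundary = [
        base,
        base + access_bytes,
        top - 2 * access_bytes,
        top - access_bytes,
    ]
    unaligned = []
    while len(unaligned) < n_unaligned:
        addr = rng.randrange(base, top - access_bytes + 1)
        if addr % access_bytes != 0:
            unaligned.append(addr)
    return boundary, unaligned


boundary, unaligned = corner_addresses(
    base=0x2000_0000, size=0x1000, access_bytes=4, rng=random.Random(3)
)
```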

Scenario #4: Issuing back-to-back transactions from the IP blocks. These are sent to the slaves with randomized delays between transactions from every master, not only to check that the transactions complete successfully but also to verify the minimum and maximum latency in the system against the numbers expected by the architecture team.
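The latency check in this scenario can be modelled with a short Python sketch. The event format, the master names, and the expected bounds are hypothetical; in a real environment the issue and completion cycles would come from the bus monitors.

```python
def latency_report(events, min_expected, max_expected):
    """events: (master, issue_cycle, complete_cycle) tuples.

    Returns (per_master, violations), where per_master maps each master to
    its (min, max) observed latency and violations lists every transaction
    whose latency falls outside [min_expected, max_expected]."""
    per_master = {}
    violations = []
    for master, issue, complete in events:
        latency = complete - issue
        low, high = per_master.get(master, (latency, latency))
        per_master[master] = (min(low, latency), max(high, latency))
        if not min_expected <= latency <= max_expected:
            violations.append((master, issue, latency))
    return per_master, violations


events = [("dma", 0, 12), ("dma", 20, 55), ("core", 5, 9)]
per_master, violations = latency_report(events, min_expected=4, max_expected=30)
```

Reporting violations on both sides of the window matters because a latency lower than the architecture expects can also point to a specification gap.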

Scenario #5: Cache coherency and stashing. Stashing means that data is placed in the L1/L2 cache at the same time it is sent to memory. Configure the L2 registers for stashing. Keep the core execution in non-sharing mode.

The IO masters stash data in the L2 cache in random fashion (for snoopable transactions) while the core continues with its own execution. In the final check, the core will fail for overridden addresses. Failed patterns are then qualified against the expected stashed array.

Read sharing between cores: Each master has full read and write access to a block of memory. This master is called the primary master for that block. Other masters may be allowed to read this memory, but only the primary master can write to the block. Blocks should be divided on cache line boundaries so that a cache line is never in more than one memory block.

Read sharing between core and IO masters: The IO masters issue snoopable reads from the core read-write memory block. In this case, the testbench maintains an expected data array (updated on a snoop push).

Typical bugs caught by CRV
In any SoC design, there are always some bugs that lurk in grey areas. They are difficult to catch with directed tests but are easily found with constrained random tests. Following are some typical bugs we caught using the randomized testing procedures we have just described:

Swapped control signals between different modules could not be found with normal RTL simulations, in which each module owner runs only its own block-specific tests. Since all the masters and slaves are simultaneously active in a random environment, such swapped control lines between modules were easily caught because a large share of the tests in random regressions failed.

Often the latency observed in the system is higher or lower than the architecture expects. This is uncovered only when the system is fully choked by driving heavy traffic through the SoC. These inconsistencies with the architecture specifications were also discovered with the help of CRV.

Noncompliance with the expected latency numbers led to serious gaps such as the following:

  • Overrun or underrun errors in FIFOs.
  • In TDM, higher-than-expected system latencies led to very late arrival of data packets in jitter buffers. Such packets were dropped, leading to distortion.
  • As with FIFOs, overrun and underrun errors could also occur in jitter buffers, degrading voice quality. Buffer depths or clock frequencies need to be changed accordingly to handle these overrun and underrun errors.

Some useful bug hunting tips
During the process of bug hunting our own SoC designs using the methodologies described earlier, we came across some useful tips you might want to follow:
1. Enable all monitors and checkers in the system for automatic data comparison and protocol checking.
2. Registers in the DUT can be accessed in two ways. Front door access uses the physical bus, so writing a value to a DUT register takes some clock cycles, and writing thousands of registers consumes both time and bus resources. Via the back door path, registers can be accessed directly in zero simulation time. Back door access requires an HDL path to the register in the DUT, so RTL registers are written and read by hierarchically accessing them by name. This method saves simulation time and frees up the physical bus.

To uncover bugs hidden by the front door access path (such as data bus reversal or mangled memory addresses), write via the front door and read via the back door, then compare the read data with the expected data array maintained by the testbench. Gaps in the front door path are then exposed as data comparison failures.

3. Read back all the memories (slaves) in the system after all the transactions have completed and compare the data. Do this by back door access rather than front door access to save simulation time.
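The reasoning behind tip 2 can be shown with a toy model. In the Python sketch below, `ToyDut` is a hypothetical register file whose front door path reverses the byte order in both directions. A front door write followed by a front door read passes because the two reversals cancel, while a back door read exposes the corrupted stored value.

```python
class ToyDut:
    """Toy 32-bit register file with an optional front door bug."""

    def __init__(self, reverse_bytes_bug=False):
        self.regs = {}
        self.bug = reverse_bytes_bug

    def _bus(self, data):
        # Models a byte-reversed data bus when the bug is enabled.
        if self.bug:
            return int.from_bytes(data.to_bytes(4, "little"), "big")
        return data

    def frontdoor_write(self, addr, data):
        self.regs[addr] = self._bus(data)

    def frontdoor_read(self, addr):
        return self._bus(self.regs[addr])

    def backdoor_read(self, addr):
        # Direct access to the stored value, bypassing the bus.
        return self.regs[addr]


def find_mismatches(dut, expected, read):
    """Compare every expected (addr, data) entry using the given read method."""
    return [addr for addr, data in expected.items() if read(addr) != data]


expected = {0x10: 0x11223344, 0x14: 0xDEADBEEF}
dut = ToyDut(reverse_bytes_bug=True)
for addr, data in expected.items():
    dut.frontdoor_write(addr, data)

masked = find_mismatches(dut, expected, dut.frontdoor_read)   # bug hidden
exposed = find_mismatches(dut, expected, dut.backdoor_read)   # bug visible
```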

Coverage-driven verification
In the final phase of random verification, a functional coverage report (Figure 3) is used as the sign-off criterion to ensure that the system has been satisfactorily tested.



Figure 3: A sample covergroup for read transaction

Coverage analysis depends upon what we are looking for and must answer the following questions:

  • Did we exercise all transaction types of the bus? (Control oriented, part of monitor)
  • Have we initiated transactions of every type and length? (Data oriented, part of test bench)
  • Did we model all transaction types to every interesting address range, especially boundary addresses? (Combination of control and data coverage)

Functional coverage is the determination of how much functionality of the design has been exercised by the verification environment. It is a user-defined coverage that maps every functionality to be tested (as defined in the test plan) to a coverage point.

Whenever the functionality to be tested is hit in the simulation, the functional coverage point is automatically updated. A functional coverage report can be generated that gives a summary of how many coverage points were hit. Enabling functional coverage in a verification environment involves three steps:

  1. Identifying the functional coverage and cross coverage points
  2. Implementing the monitors in the system
  3. Running simulation to accumulate the functional coverage and report analysis

Figure 3 shows an example of some relevant cover points for a read transaction.

Rd_addr_event is the event captured by the monitor. When this event is triggered, relevant parameters are sampled, such as the address, the ID of the master or slave, the data, the length and size of the transaction, and whether the transaction is snoopable or unsnoopable.

Suitable bins are defined and invalid cases are defined as ignore bins.
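The idea of bins, ignore bins, and cross coverage can be modelled in a few lines of Python. This is a behavioural sketch, not the covergroup from Figure 3: the bin names, the burst length ranges, and the combination treated as an ignore bin are all assumptions.

```python
class ReadCoverage:
    """Cross of burst-length bins and snoop type for read transactions."""

    LEN_BINS = {
        "single": range(1, 2),
        "short": range(2, 9),
        "long": range(9, 17),
    }
    SNOOP_BINS = ("snoopable", "unsnoopable")
    # Hypothetical invalid combination, excluded from the coverage goal.
    IGNORE_BINS = {("long", "snoopable")}

    def __init__(self):
        self.hits = {}
        for len_bin in self.LEN_BINS:
            for snoop in self.SNOOP_BINS:
                if (len_bin, snoop) not in self.IGNORE_BINS:
                    self.hits[(len_bin, snoop)] = 0

    def sample(self, length, snoop):
        """Called by the monitor each time a read address event fires."""
        for len_bin, values in self.LEN_BINS.items():
            if length in values:
                key = (len_bin, snoop)
                if key in self.hits:       # ignore bins are never counted
                    self.hits[key] += 1
                return

    def percent(self):
        covered = sum(1 for count in self.hits.values() if count)
        return 100.0 * covered / len(self.hits)


cov = ReadCoverage()
cov.sample(1, "snoopable")
cov.sample(4, "unsnoopable")
cov.sample(12, "snoopable")    # falls in an ignore bin, not counted
```

Because the ignore bin is removed from the denominator, the goal remains reachable even though that combination can never legally occur.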

Coverage grading
In a random test environment, coverage grading is used to identify the seeds and scenarios that contribute the most to coverage. The following command can be used to generate functional coverage grading in VCS:

   urg -dir ( *.vdb ) -grade -metric group

Grading can be used during the coverage closure exercise to find out which scenarios and seeds contribute the most to coverage. Once this information is generated, the scenarios and seeds that contribute more should be run more often than the others to reach the coverage numbers quickly. It is advisable to use grading once the random verification environment is frozen.
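The idea behind grading can be illustrated with a simple greedy ranking in Python. This is only a sketch of the concept and makes no claim about how the coverage tool computes its grades; the function name, the seed names, and the bin labels are hypothetical.

```python
def grade_tests(coverage_by_test):
    """Rank tests by how many new coverage bins each one adds.

    coverage_by_test maps a test or seed name to the set of bins it hit.
    Returns (name, new_bins) pairs in greedy order and stops as soon as
    no remaining test adds anything new."""
    remaining = dict(coverage_by_test)
    covered = set()
    ranking = []
    while remaining:
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda n: len(remaining[n] - covered))
        gain = len(remaining[best] - covered)
        if gain == 0:
            break
        ranking.append((best, gain))
        covered |= remaining.pop(best)
    return ranking


runs = {
    "seed1": {"a", "b", "c"},
    "seed2": {"c", "d"},
    "seed3": {"a", "b"},
}
ranking = grade_tests(runs)
```

In this example seed3 drops out of the ranking because everything it hits is already covered by seed1, which is the kind of redundancy grading is meant to reveal.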

Conclusion
The typical design verification scenario today relies on manually developed directed tests and constrained random tests to bring functional coverage to the desired levels and thereby carry out a systematic bug hunting process.

Several complex scenarios and corner cases can be reproduced by constrained random tests, uncovering many gaps in the system that are prone to be missed by directed tests.


Vijeta Marwah received her B.Tech. degree in Electronics and Communication from the National Institute of Technology, Kurukshetra. She is currently working as a Design Engineer with Freescale Semiconductor, Noida, India, in the SoC verification team and has worked on clocking, several peripherals, and constrained random verification.

Saurabh Mishra holds a Bachelor's degree in Electronics and Communication Engineering from Uttar Pradesh Technical University, Lucknow, and an Advanced Post-Graduate Diploma in VLSI design from VEDANT Semiconductor, Mohali. While working with Freescale Semiconductor as Lead Design Engineer, he has developed expertise in functional and constraint verification using SystemVerilog. He has also presented a paper titled "Coverage Driven Verification" at SVUG Fall 2009.
