Reducing tester-based silicon debug effort & time: Part 2 – A check-list of best practices
Editor's note: In the second of a two-part series on reducing tester-based silicon debug effort and time, the authors provide a detailed checklist of must-do practices to follow during verification for testing (VFT).
Considering that testing to ensure that a circuit is fully functional can cost several million dollars, it is essential to do as much as possible at the verification for test (VFT) stage, well before silicon is available, to ensure a high probability of functionally alive silicon in the shortest possible time and at the lowest cost.
This second part in the series focuses on rules derived from our experience testing and debugging SoC designs that can be applied while creating the tester-specific testbench, generating the patterns, and simulating the suite.
Using these practice-derived rules can reduce the functional pattern bring-up time on the tester and reduce the chances of failure due to pattern mismatch issues. More importantly, following these rules will reduce the total debug time and effort needed to resolve failures and issues observed in the tester environment.
Crucial VFT design and test practices
Must-do practice #1
While creating the tester patterns, the design pads/ports used during simulation should be restricted to the pads that are available across all modes of testing, in particular the minimum set of ports available across all packages. This restriction applies to any communication between the core component of the testcase (the .c/.h files) and the testbench side (the Verilog/SystemVerilog component), and to any data mailboxing or port toggling used to highlight execution stages.
Issue Analysis This practice seems like a basic step, but it is often missed because the full suite of pads/ports is available during verification simulations of the tester pattern. Problems emerge only once the patterns are ported to the tester: unsuccessful communication, data mailboxing, or port toggling leads to unexpected pattern behavior on the production silicon and requires extensive and tedious debugging.
Solution The verification engineer should always check the Test Pin Muxing sheet available from the design-for-test team to learn which pads are available across all modes of testing, and from that create a tester-specific mode of the testbench in which only that restricted set of pads is available for flags, mailboxes, and so on.
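The pad check described above can be automated. The following is a minimal sketch, assuming the Test Pin Muxing sheet has already been exported into a mapping from test mode to usable pads. The mode names, pad names, and helper functions are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical pin-mux data: test mode/package -> pads usable in that mode.
# Real data would come from the DFT team's Test Pin Muxing sheet.
PIN_MUX = {
    "func_test_pkg_a": {"PAD_A0", "PAD_A1", "PAD_B3", "PAD_C7"},
    "func_test_pkg_b": {"PAD_A0", "PAD_A1", "PAD_C7"},
    "burn_in": {"PAD_A0", "PAD_A1", "PAD_B3"},
}


def common_pads(pin_mux):
    """Return the pads available in every test mode and package."""
    return set.intersection(*pin_mux.values())


def disallowed_pads(used, pin_mux):
    """Return the pads the testbench uses that are missing in some mode."""
    return sorted(set(used) - common_pads(pin_mux))


# Pads the testbench uses for flags and mailboxes (illustrative only).
used_pads = ["PAD_A0", "PAD_B3"]
print(disallowed_pads(used_pads, PIN_MUX))
```

Any pad reported by `disallowed_pads` should be replaced in the tester-specific testbench mode before patterns are generated.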
Must-do practice #2
In the tester pattern environment at verification level, there should not be any backdoor loading of any memory location.
Issue Analysis Backdoor loading of memories in the VFT environment can mask many potential issues that will ultimately emerge when the design accesses uninitialized memory once the pattern is run on the tester.
System RAM or any other memory used by the pattern may be initialized at zero-time through backdoor loading, as a legacy of the normal simulation pattern environment (where this is often done to ensure no corruption in pattern execution).
But if a read happens on an uninitialized memory location on the tester (for example, due to a burst access, even if the downloaded code doesn’t write to these locations), an ECC (error-correcting code) error can be generated in the silicon. This causes pattern failures with unpredictable, intermittent behavior, because the ECC bits at uninitialized locations are random. The real issue is then very difficult to debug, since backdoor loading ensures it is never evident in the simulation environment.
Solution Never initialize the memory through backdoor loading at zero-time in the testbench. This helps to catch, at the simulation stage, errors caused by accesses to uninitialized memory as a result of burst accesses, problems with code jumps, and similar situations. When a porthole address needs to be initialized beforehand, it must be done through the startup CRT (C runtime) code itself.
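To illustrate why skipping the zero-time preload exposes these accesses, here is a toy sketch of a memory model that is never backdoor loaded and that flags any read of a location that was not written through the front door. The class and exception names are hypothetical. In a real Verilog/SystemVerilog testbench the equivalent effect typically comes from unknown (X) values propagating out of the unwritten memory.

```python
class UninitializedRead(Exception):
    """Raised when a location is read before any front-door write."""


class SimMemory:
    """Toy memory model with no zero-time (backdoor) preload."""

    def __init__(self):
        self._words = {}  # address -> value; a missing key means uninitialized

    def write(self, addr, value):
        """Front-door write, e.g. issued by the startup code."""
        self._words[addr] = value

    def read(self, addr):
        """Return the stored value, or flag the uninitialized access."""
        if addr not in self._words:
            raise UninitializedRead(
                f"read of uninitialized address {addr:#010x}"
            )
        return self._words[addr]


mem = SimMemory()
mem.write(0x40000108, 0xDEADBEEF)  # written by downloaded code
print(hex(mem.read(0x40000108)))   # initialized, so the read succeeds

try:
    mem.read(0x40000100)           # never written, so it is flagged
except UninitializedRead as err:
    print(err)
```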
Must-do practice #3
For tester patterns, the start address for code (where the core jumps after executing the initialization code) should always be aligned to the width of the instruction bus fetch.
Issue Analysis For example, say the instruction bus for the SoC core reads data in 64-bit aligned format. But suppose that the infrastructure for tester patterns is such that the downloaded code starts at an address that is only 32-bit aligned rather than 64-bit aligned (e.g. 0x40000104), and the standard initial code initializes only 256 bytes of the memory.
The result is that location 0x40000100 holds a random value, since it was neither initialized nor written by the downloaded code. When the core jumps to memory for code execution, it reads data with 64-bit alignment, so the fetch covers 0x40000100 as well as 0x40000104. Since the data at 0x40000100 is random, the read can generate a multi-bit ECC error, which raises an exception to the core and causes it to hang.
Since the probability of the ECC error depends on the random data at the uninitialized location as well as on its random ECC bits, it should come as no surprise that the pattern passes only inconsistently on the tester (for example, in about 1 out of 10 runs), making debugging even more difficult.
Solution Always keep the start address of the downloaded code at a location that is aligned according to the width of the instruction bus.
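The alignment rule is simple arithmetic. The sketch below uses the 64-bit fetch width and the 0x40000104 start address from the example; the helper names are hypothetical.

```python
def is_aligned(addr, width_bits):
    """True if addr is a multiple of the fetch width in bytes."""
    return addr % (width_bits // 8) == 0


def align_up(addr, width_bits):
    """Round addr up to the next multiple of the fetch width in bytes."""
    n = width_bits // 8
    return (addr + n - 1) // n * n


start = 0x40000104
print(is_aligned(start, 64))     # False: 0x104 is not a multiple of 8
print(hex(align_up(start, 64)))  # 0x40000108
```

A check of this kind could run in the pattern-generation flow so that a misaligned start address is rejected before simulation.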
Must-do practice #4
If cache is enabled in the initial code and the core initiates burst operations for fetching data, steps should be taken to ensure that the start address is aligned to the total width of the burst transaction.
Issue Analysis For example, say that with cache enabled the core makes a 4-beat wrapping burst read for every burst operation, with each beat reading 64 bits. If the start address is not 256-bit aligned, the first burst fetch will again read uninitialized memory and can produce ECC errors, for the reasons discussed in Must-do practice #3.
Solution Choose the start address with the core's burst operation and cache enablement in mind, and align it to the total burst width accordingly.
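To see which locations the first burst fetch touches, the following sketch lists the beat addresses of a wrapping burst, assuming the 4-beat, 64-bit-per-beat configuration from the example and the usual convention that a wrapping burst stays inside its naturally aligned window. The function name and the second example address are hypothetical.

```python
def wrapping_burst_addresses(start, beat_bytes=8, beats=4):
    """Return the beat addresses of a wrapping burst that begins at start."""
    line = beat_bytes * beats            # 32 bytes = 256 bits by default
    base = start - start % line          # aligned window holding the burst
    first = start - start % beat_bytes   # beat that contains the start address
    return [base + (first - base + i * beat_bytes) % line
            for i in range(beats)]


# Code downloaded from a 64-bit aligned but not 256-bit aligned address:
start = 0x40000108
addrs = wrapping_burst_addresses(start)
print([hex(a) for a in addrs])
print([hex(a) for a in addrs if a < start])  # beats below the code start

# With a 256-bit aligned start address, no beat falls below the start:
aligned = 0x40000120
print([hex(a) for a in wrapping_burst_addresses(aligned) if a < aligned])
```

In the first case the burst wraps back to 0x40000100, which the downloaded code never wrote. In the second case every beat lies at or above the start address.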