Using a processor-driven test bench for functional verification of embedded SoCs

For designs that incorporate or interface to an embedded CPU, processor driven tests can be a valuable addition to the suite of tests used to perform functional verification. Simply stated, Processor Driven Tests, or PDTs, are test vectors driven into the design via the processor bus and can originate from several types of processor models.

For a bus functional model these tests consist of a sequence of reads and writes to various register or memory locations serviced by the processor bus. In this mode, they resemble an HDL testbench where the bus functional model relieves the user of handling the complexity and detail of the bus protocol.

With a full functional model of the processor, tests in the form of embedded code are written in C or assembly and are compiled to the target processor. These tests more accurately replicate design function where the processor comes out of reset and begins fetching instructions which result in reads and writes to registers of peripherals, IP or custom logic.

A third method is to leverage the bus functional processor model to generate constrained random bus cycles. This mode is useful to load the bus with processor cycles while the ability of other bus masters to perform their data transfers is evaluated.

In this article we discuss each of these processor driven testbench methods in detail and present their strengths and weaknesses. We also examine the inherent value of combining PDT with traditional HDL testbenches.

Processor Driven vs. HDL testbenches
So how does PDT differ from HDL testbenches? Figure 1 below depicts an image processor connected to an HDL testbench. The testbench initializes the image processor registers, feeds it pixels for processing and compares the resultant pixels to the expected output.

Figure 1: HDL testbench for image processor

In contrast, the processor driven test depicted in Figure 2 below places the image processor in situ, surrounded by the source and destination memories and interfaced to the processor itself. The test, written in C and executed by the CPU as embedded code, loads the source memory with an image, initializes the image processor and clocks the design until a destination image is produced.

Note that once the test is set up, the design runs as it would in the end product, reading the source memory, transforming the image and storing the result in destination memory. In this way the processor driven test more accurately replicates the design's operation, testing not only the image processor but its interface to the source and destination memories as well as the ability of the CPU to communicate with all three devices.

Figure 2: Processor driven testbench for image processor
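A minimal sketch of what the embedded C test described above might look like. The register layout (`img_regs_t`), the bit definitions and the function name `run_image_test` are assumptions made for illustration; a real image processor would define its own register map, and the poll budget is an added safeguard so a hung block does not stall the simulation forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout for the image processor. A real block
 * would take offsets and bit fields from its own specification. */
typedef struct {
    volatile uint32_t src_addr;    /* bus address of the source image      */
    volatile uint32_t dst_addr;    /* bus address of the destination image */
    volatile uint32_t num_pixels;  /* number of pixels to process          */
    volatile uint32_t control;     /* bit 0: start                         */
    volatile uint32_t status;      /* bit 0: done                          */
} img_regs_t;

#define IMG_CTRL_START 0x1u
#define IMG_STAT_DONE  0x1u

/* Load the source memory, program the block, start it, then poll for done.
 * Returns 0 when the done bit is seen, -1 if max_polls reads elapse first. */
int run_image_test(img_regs_t *regs,
                   volatile uint32_t *src_mem, uint32_t src_bus_addr,
                   uint32_t dst_bus_addr,
                   const uint32_t *image, uint32_t num_pixels,
                   uint32_t max_polls)
{
    assert(regs != NULL && src_mem != NULL && image != NULL);

    /* Step 1: load the source memory with the test image. */
    for (uint32_t i = 0; i < num_pixels; i++) {
        src_mem[i] = image[i];
    }

    /* Step 2: initialize the image processor and start it. */
    regs->src_addr   = src_bus_addr;
    regs->dst_addr   = dst_bus_addr;
    regs->num_pixels = num_pixels;
    regs->control    = IMG_CTRL_START;

    /* Step 3: let the design run until it reports completion. */
    for (uint32_t polls = 0; polls < max_polls; polls++) {
        if (regs->status & IMG_STAT_DONE) {
            return 0;
        }
    }
    return -1;
}
```

On the target, `regs` and `src_mem` would point at the block's address range and the source memory; a follow-on step would read the destination memory and compare it against the expected image.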

PDT easy to reuse at SoC level and on live target
It's not difficult to imagine this block-level processor driven test being migrated to the sub-system or SoC level. Assuming additional hardware interfaces are not inserted between the image memories or image processor, the C code for this test can be reused without change at the sub-system and SoC level. In fact, since the test is driven by the processor, this test can also be run on the live target of an FPGA prototype or the fabricated device itself.

Some care must be taken when constructing the block-level PDT to ensure its reuse at the SoC or hardware prototype level. In the case of our image processor, some research into the memory sub-system planned for the design will allow consistent memory control initialization and address ranges to be used at both the block and SoC level.

If the detail on the memory system design is not available when the block test is being developed, the test will need to be adapted to the final memory configuration of the design for the test to be migrated up to the sub-system and SoC level. While these changes are small, making small changes to hundreds of tests is a big job. Some forward thinking about the end environment during development of processor driven block tests will ease downstream test reuse.

Synchronizing HDL and processor driven testbenches
In the image processor example, the external interfaces to the block under test were easily implemented in hardware. Clearly there are situations where this is not the case and an HDL testbench must be used in conjunction with a processor driven test to replicate the external environment of the block. For this reason it's critical to be able to synchronize execution of an HDL testbench with code executing on the full functional processor model.

Synchronization is easily accomplished with existing hardware or software methods. Logic simulator breakpoints can be used to hold off the HDL testbench until the processor has finished loading registers or memories. Message passing is implemented by having the processor poll a specified register or memory location and wait for the HDL testbench to write the “proceed” value to that location. These can be memory locations actually implemented in the design or unused memory dedicated to test synchronization.
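The message-passing scheme can be sketched on the processor side as a simple polling loop. The value `SYNC_PROCEED`, the function names and the poll limit are assumptions for illustration; the article does not prescribe a particular encoding or mailbox address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "proceed" value agreed on by the C test and the HDL testbench. */
#define SYNC_PROCEED 0xC0DE0001u

/* Poll the mailbox until the HDL testbench writes the proceed value.
 * The volatile qualifier forces a real bus read on every iteration.
 * Returns 1 if the value was seen, 0 if max_polls reads elapsed first. */
int wait_for_proceed(volatile const uint32_t *mailbox, uint32_t max_polls)
{
    assert(mailbox != NULL);
    for (uint32_t i = 0; i < max_polls; i++) {
        if (*mailbox == SYNC_PROCEED) {
            return 1;
        }
    }
    return 0;
}

/* Post a value for the HDL testbench to observe, for example to report
 * that the processor has finished loading registers or memories. */
void post_message(volatile uint32_t *mailbox, uint32_t value)
{
    assert(mailbox != NULL);
    *mailbox = value;
}
```

In a real test the mailbox pointer would be a fixed address, either a register in the design or a word of otherwise unused memory reserved for synchronization.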

Modes of operation
Earlier we introduced the three distinct methods of developing and executing processor driven tests. Here we discuss each of them in more detail.

A full functional model replicates the operation of the processor so it can be viewed as a virtual, albeit slower, live target. The same compiler and linker are used to generate object code for the live target and the full functional model.

Because the full instruction set and operating modes are covered, the full functional model can run any code destined for the physical device. A typical exception is JTAG and other dedicated debug hardware. Since the model will run in the logic simulator, which has its own debug environment, features intended for debug of a live target are typically excluded.

Typical operation for an embedded processor is to come out of reset, issue a memory read to retrieve a reset vector, then branch to the vectored memory location and begin executing code. To support the processor's native operating mode the user must load memory with the reset vector and the object code to be executed. Transforming the object file produced by the compiler into a loadable memory image is typically an exercise left to the user.
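As one illustration of that conversion step, the sketch below writes a block of raw code bytes as text in the form read by Verilog's `$readmemh` system task: an `@` address line followed by one hexadecimal word per line. It assumes the bytes have already been extracted from the object file, that the memory model is 32 bits wide and that the target is little-endian; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Pack up to four bytes starting at `offset` into one little-endian word.
 * Bytes past the end of the buffer are treated as zero padding. */
uint32_t pack_word_le(const uint8_t *bytes, size_t len, size_t offset)
{
    uint32_t word = 0;
    for (size_t b = 0; b < 4 && offset + b < len; b++) {
        word |= (uint32_t)bytes[offset + b] << (8u * b);
    }
    return word;
}

/* Write `len` bytes to `out` as $readmemh text, starting at memory word
 * index `word_addr`. Returns the number of words written, or -1 on an
 * output error. */
long write_readmemh(FILE *out, const uint8_t *bytes, size_t len,
                    uint32_t word_addr)
{
    assert(out != NULL && (bytes != NULL || len == 0));

    /* Address line: the memory array index where loading begins. */
    if (fprintf(out, "@%08lx\n", (unsigned long)word_addr) < 0) {
        return -1;
    }

    long words = 0;
    for (size_t i = 0; i < len; i += 4) {
        uint32_t word = pack_word_le(bytes, len, i);
        if (fprintf(out, "%08lx\n", (unsigned long)word) < 0) {
            return -1;
        }
        words++;
    }
    return words;
}
```

A memory model in the testbench could then load the resulting file with `$readmemh` before reset is released, so the reset vector and code are in place when the processor issues its first fetch.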

It's worth noting that although the goal of the processor driven test is to verify operation of a custom design block or IP, enough of the memory sub-system must be modeled in the logic simulator to support the processor's desire to fetch code from memory and its need to make data references to establish local data store, software variables, and the processor stack. While the interface between the processor and memory must ultimately be verified, modeling the memory sub-system is tangential to performing a block test.

Full function processor models come in many forms. Most are developed in C rather than RTL and all make some tradeoff with respect to accuracy and performance. While faster is usually better, it's important to consider the speed of the complete environment. A complex design described in RTL will simulate at a rate measured in tens of clocks per second while processor models can reach speeds of 100,000 to 1 million instructions per second.

Given these two entities will be simulating in tandem, a point of diminishing return on processor speed is reached where a faster model simply spends more time waiting for the logic simulator to evaluate the RTL. A good trade-off is a cycle accurate CPU model which accurately tracks the function of the pipeline and runs at ~100K instructions/sec.

Usability is also a key factor when choosing a processor model. Does it support a source level debugger? How easy is it to integrate? What's the process for loading object code?

The bus functional model, in contrast to processor driven tests using a full functional model, supports the more traditional testbench. The user specifies the desired bus cycle type, the address to be accessed and the data to be written or expected.

The bus functional model relieves the user of the need to study and model the complex pin sequences and timing that define modern processor buses such as AMBA, AXI and OCP. Typically C is used to define the sequence of bus cycles to be applied to the device under test. While this is the same language used for processor driven tests using full functional models, the composition of the two types of tests is quite different.

A bus functional test is compiled to run natively on the host computer while a PDT for a full functional model is compiled to target and run on the embedded processor. The former consists of a sequence of bus cycles and the latter is embedded code. A short example of bus functional test code is presented in Figure 3, below.

Bus functional tests are inherently easier to set up. There is no need to establish a memory sub-system, convert the object file into a memory load image, or initialize the processor as is the case with a full functional test.

Figure 3: Code segment from a bus-functional test
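The following is not the original listing but a minimal sketch of the style of test being described. The calls `bfm_write32`, `bfm_read32` and `bfm_expect32` are hypothetical names, not the API of any particular tool, and a small array stands in for the device under test so the sketch can run on a host without a simulator attached.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the device under test. A real bus functional model would
 * turn each call below into pin-level bus cycles in the logic simulator. */
#define DUT_WORDS 16u
uint32_t dut_regs[DUT_WORDS];

/* Hypothetical BFM calls: one bus write cycle and one bus read cycle. */
void bfm_write32(uint32_t addr, uint32_t data)
{
    dut_regs[(addr / 4u) % DUT_WORDS] = data;
}

uint32_t bfm_read32(uint32_t addr)
{
    return dut_regs[(addr / 4u) % DUT_WORDS];
}

/* Read a location and compare with the expected data.
 * Returns 1 on a mismatch, 0 on a match. */
int bfm_expect32(uint32_t addr, uint32_t expected)
{
    return bfm_read32(addr) != expected;
}

/* Hypothetical register offsets in the block under test. */
#define REG_CONFIG 0x00u
#define REG_THRESH 0x04u

/* The test itself is just a sequence of bus cycles: the cycle type, the
 * address, and the data to write or expect. Returns the mismatch count. */
int bus_functional_test(void)
{
    int errors = 0;
    bfm_write32(REG_CONFIG, 0x00000003u);
    bfm_write32(REG_THRESH, 0x000000FFu);
    errors += bfm_expect32(REG_CONFIG, 0x00000003u);
    errors += bfm_expect32(REG_THRESH, 0x000000FFu);
    return errors;
}
```

Note that nothing here is embedded code: there is no reset vector, stack or memory image, only the list of accesses the test author wants to see on the bus.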

Constrained random. A parameterized C routine interfaced to a bus functional model can generate constrained random bus cycles to enhance functional verification. While the complex sequential nature of most bus masters and slaves presents a barrier to randomized data access, the ability to load a bus with metered quantities of traffic can prove valuable when testing the ability of other bus masters and slaves to communicate over a loaded bus.

Figure 4 below depicts a script to configure a combined constrained random generator and AMBA bus functional model. In this example the user can specify how heavily the bus is to be loaded, what address ranges to load and how the load is divided across them, what bus cycle types to issue and the ratio of those cycles. With a small amount of work the user can generate volumes of tightly constrained bus traffic.

Figure 4: Script to drive constrained random AMBA bus cycles
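The original script is tool specific, but the parameters it describes can be illustrated with a small C sketch. Everything named here (`traffic_cfg_t`, `next_cycle`, the field names) is an assumption for illustration, and a simple linear congruential generator is used in place of whatever random source a real tool provides, so that a given seed always reproduces the same traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One address window and its relative share of the generated traffic. */
typedef struct {
    uint32_t base;    /* first byte address of the window */
    uint32_t size;    /* window size in bytes             */
    uint32_t weight;  /* relative share of accesses       */
} addr_range_t;

/* Constraints: bus loading, address ranges, and read/write ratio. */
typedef struct {
    const addr_range_t *ranges;
    size_t   num_ranges;
    uint32_t write_percent;  /* 0..100: share of cycles that are writes */
    uint32_t load_percent;   /* 0..100: share of bus slots occupied     */
} traffic_cfg_t;

typedef enum { CYCLE_IDLE, CYCLE_READ, CYCLE_WRITE } cycle_type_t;

typedef struct {
    cycle_type_t type;
    uint32_t addr;
    uint32_t data;
} bus_cycle_t;

/* Linear congruential generator; the caller owns the state (seed). */
static uint32_t next_rand(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 8;  /* drop the weakest low-order bits */
}

/* Produce the next bus slot: idle, or a word-aligned read or write that
 * falls inside one of the configured ranges. */
bus_cycle_t next_cycle(const traffic_cfg_t *cfg, uint32_t *state)
{
    bus_cycle_t c = { CYCLE_IDLE, 0u, 0u };
    assert(cfg != NULL && state != NULL && cfg->num_ranges > 0);

    /* Bus loading: leave this slot idle with probability 1 - load. */
    if (next_rand(state) % 100u >= cfg->load_percent) {
        return c;
    }

    /* Pick a range in proportion to its weight. */
    uint32_t total = 0;
    for (size_t i = 0; i < cfg->num_ranges; i++) {
        total += cfg->ranges[i].weight;
    }
    assert(total > 0);
    uint32_t pick = next_rand(state) % total;
    size_t r = 0;
    while (pick >= cfg->ranges[r].weight) {
        pick -= cfg->ranges[r].weight;
        r++;
    }

    /* Pick a word-aligned address inside the chosen range. */
    uint32_t words = cfg->ranges[r].size / 4u;
    assert(words > 0);
    c.addr = cfg->ranges[r].base + 4u * (next_rand(state) % words);

    /* Pick the cycle type according to the read/write ratio. */
    c.type = (next_rand(state) % 100u < cfg->write_percent)
                 ? CYCLE_WRITE : CYCLE_READ;
    c.data = (c.type == CYCLE_WRITE) ? next_rand(state) : 0u;
    return c;
}
```

Each returned cycle would be handed to the bus functional model, which applies it to the bus while other masters run their own transfers.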

Portability of Processor Driven Tests
An attractive characteristic of PDT is its potential for reuse across two dimensions of the verification landscape. On one dimension, most block-level tests that are processor driven can be applied at the sub-system and SoC level with little or no modification. This benefit is owed to the concept that the interface between the block under test and the processor is likely to remain consistent as the rest of the design is assembled around it.

If the addressable registers in the block hold their position in the address map, then the same PDT can be used to access them at block and SoC level. The test is immune to many of the design changes that can render an HDL testbench inoperative.

Changes to pin or device names, access sequences and timing seldom impact a processor driven test. It's conceivable the hardware initialization function could be impacted by a change to the bus control or arbitration logic, but maintaining one copy of the function, which hundreds of tests call, insulates each test from changes in hardware initialization.

The second reuse dimension spans verification environments from simulation to live target. Since processor driven tests are written and compiled as snippets of embedded code, they can run on any fully functional representation of the processor, be it a simulation model or the physical device. Tests that transcend the virtual and physical domains can be used to solve the often difficult dilemma of discovering a bug in the lab that can't be reproduced and debugged in the logic simulator.

In addition to the processor bus, many blocks interface to the design through other means such as the ports to memory depicted in Figure 2, earlier. This supporting interface hardware can be moved from the block through SoC level and from simulation to live target without modification.

In the case where an HDL testbench is used to stimulate a non-processor bus interface such as a USB port, it's unlikely the HDL testbench can be used with a live target. In this case the USB port must be physically interfaced to hardware, which may not behave identically to the HDL testbench, so changes to the processor driven test will need to occur to accommodate connection to a physical device.

Code sources for processor driven tests
Block tests. Typical processor driven test suites include hundreds to thousands of small tests designed to validate the many functional blocks accessible through the CPU bus. Each must start by calling a processor initialization routine which sets up basic CPU functions like MMU, cache, interrupts and stack. Once initialization is complete the processor branches to “main” and begins executing the test itself.

Although these tests are compiled to target and execute on the full functional processor model, most developers don't consider them firmware per se. They are simply block tests written in C and applied to the intended hardware by the CPU.

Hardware diagnostics. Nearly every embedded system design project has a team of firmware engineers who develop the hardware diagnostics. These are used to bring up first silicon in the lab and in manufacturing test, and an abbreviated set is often shipped with the product as a self-test tool for the end customer. Hardware diagnostics make an excellent processor driven test for simulation. Because they are developed from the hardware specification they represent an orthogonal view of the design and often expose discrepancies or ambiguities in the hardware specification versus implementation.

Diagnostics are an excellent source of free tests to the verification engineer. Considering hardware diagnostics are the first code used to bring up first silicon or the hardware prototype, it makes sense to run them during simulation to expose and correct problems prior to tape-out. Simulation provides an early platform to develop and debug the diagnostics so they are more robust when first silicon arrives.

Boot code. Another excellent source of free tests is the boot-ROM code. Think of this as an elaborate initialization routine that configures not only the processor but also much of the surrounding logic. Boot code must complete successfully before device drivers, real-time operating system, or application code can be executed. It is often associated with “getting a prompt” on a screen or display and its successful execution is considered a major milestone in bringing up new hardware.

The same arguments for simulating hardware diagnostics apply to boot code. The project benefits from early checkout of this key piece of code and the hardware team adds a significant verification routine to their suite of tests with minimal effort.

This concept can be carried further with the import of device drivers, real-time operating systems and application code. The highest verification benefit is derived from firmware that is highly interactive with the hardware. As you move up the software complexity scale toward application code the hardware verification value diminishes and the simulation runtime increases dramatically.

Support of Processor Driven Tests
There are several limitations associated with conducting processor driven tests in a classic logic simulation environment. All of the issues center around the introduction of embedded code into a tool architected to model and simulate hardware.

Debug visibility. The first and most obvious deficiency is lack of visibility into the C code executing on the full functional model. Hardware debug visibility is so critical that an entire cottage industry has grown to supply debug tools for Verilog simulators. Yet the user is mostly blind to the C or assembly driving a processor driven test, so debug must be done with logic viewing tools like a waveform window. Simulators support source-level debug of HDL or SystemC but they don't make provisions for embedded code.

The full functional processor models offered with Mentor's Questa verification tool have an integrated source-level debugger that mimics most of the features found in classic software debuggers such as GDB.

Included is the ability to view and step through source or assembly, set breakpoints and examine the contents of registers, memory and variables. These capabilities are key to isolating errors in either the design or the test itself. Without source-level debug the full functional model appears for all practical purposes as a stimulus generator of unknown origin. The only clue to its behavior is a static view of the C code driving the test.

Performance. As discussed earlier, the primary limitation to the effectiveness of processor driven tests is the number and length of tests that can be executed in the time allowed. It is the nature of PDT that some data manipulation is performed before bus cycles are issued to the hardware registers contained in custom logic or blocks of IP.

Since the goal of PDT is to verify hardware, code execution resulting in bus cycles between the full functional model and program memory is of minimal value other than to prepare for the next cycle directed at hardware registers. With most embedded RISC processors fully half of the processor cycles merely fetch instructions from cache or program memory while most of the remainder are data references to memory.

Some estimates place the software overhead of PDT at greater than 90%, leaving less than 10% of the bus cycles dedicated to meaningful exchange with the hardware function to be verified.

So while processor driven tests mirror the native operation of the design and expose errors an HDL testbench can miss, they are actually quite inefficient when measured in hardware bus cycles per unit of simulation runtime.

Since software overhead consumes 90% of PDT execution and represents little value to hardware verification, short of relentlessly testing the same path to program memory, it's an excellent candidate for acceleration.

The approach we have taken in the verification tool offers easily configured high-speed memory to support instruction fetches and data references. Processor cycles to this memory occur in zero simulation time, resulting in the logic simulator seeing only the meaningful cycles to the hardware.

Using the 90% estimate for software overhead, the tool's high-speed memory can boost throughput by about tenfold (10X). Actual performance for a given PDT will vary depending on the ratio of instruction fetch and data references to that of hardware-register bus cycles.
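The arithmetic behind that figure can be written out under a simplifying assumption: simulation time is proportional to the number of bus cycles the logic simulator has to evaluate, and cycles served by the high-speed memory cost nothing. The symbols f (the fraction of bus cycles that are software overhead) and S (the resulting speedup) are introduced here for illustration.

```latex
\[
S = \frac{1}{1 - f}, \qquad
f = 0.90 \;\Rightarrow\; S = \frac{1}{0.10} = 10
\]
\[
f = 0.80 \;\Rightarrow\; S = 5, \qquad
f = 0.95 \;\Rightarrow\; S = 20
\]
```

The second line shows how sensitive the gain is to the overhead ratio, which is why results for an individual test can differ noticeably from the nominal 10X.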

Speeding the execution of software overhead permits as many as 10 times more processor driven tests to be run in the same period of time. This can reduce the turn-around on regression suites or boost functional verification coverage by permitting more tests to be run.

A class of processor driven tests that would take too long to simulate using standard methods is now practical to use. Booting a real-time operating system with its associated hardware adaptation layer is a valuable consistency check between the hardware team's and the software team's view of the design.

Automated code loading. A side benefit of the approach we use for establishing and managing the high-speed memory for software access is the ease of loading object files. After the C source for a processor driven test is compiled to target, the resulting object file must typically be massaged into memory load images for the logic simulator.

Using the Questa methodology, this step is eliminated by loading the object code directly into the region of high-speed memory. There's no need to post-process the output of the target compiler.

Processor driven tests attack functional verification of hardware from a unique angle. As such they can detect design errors that may be missed by traditional HDL testbenches. Processor driven tests can be easily migrated from block to SoC level verification and across the simulated and physical domains.

They exercise the design in a native fashion and can be applied to any function directly or indirectly accessible by the processor bus. They work in concert and can be synchronized with HDL testbenches. Processor driven tests can be easily developed by the hardware designer or verification engineer. They can also be imported from the firmware team, opening a channel to a large volume of tests that the hardware group did not have to develop themselves.

Processor driven tests require a full functional CPU model which can also be leveraged as a bus functional model and bus traffic generator. If your designs contain or are interfaced to an embedded processor and you're not already taking advantage of processor driven tests, you should consider adding them to your hardware verification methodology.

Jim Kenney is a product marketing manager in the System Level Engineering division of Mentor Graphics, where he has been responsible for analog, digital and mixed-signal simulation and currently manages the Seamless hardware/software co-verification suite of products.
