HW/SW co-verification basics: Part 2 - Software-centric methods - Embedded.com

HW/SW co-verification basics: Part 2 – Software-centric methods


Most co-verification methods can be classified based on the execution engine used to run the hardware design. A secondary classification exists based on the method used to model the embedded system microprocessor. Generally, these methods fall into two categories: software-centric and hardware-centric. Each has its pros and cons, which is why there are so many of them and why it can be difficult to sort through the choices.

Native Compiling Software
Many software engineers prefer to work as much as possible in the host environment (on a PC or workstation) before moving to the embedded system in a lab setting. There are two ways to do software development and software simulation in the host environment.

The first is to use workstation tools to compile the embedded system software for the host processor (instead of the embedded processor) and execute it on the workstation. If the embedded system software is written in C or C++, host compiled simulation works very well for functional testing.

The embedded system software now becomes a program that runs on a PC or workstation and uses all of the compilers, debuggers, profilers, and other analysis tools available for writing workstation software. Workstation tools are more plentiful and higher quality since more programmers are making use of them (remember, the embedded system space is extremely fragmented). Errors like memory leaks and bad pointers are a joy to fix on the workstation when compared to the tools available on the target system in the lab.

Instruction Set Simulation
The second method to work in the host environment is to compile the embedded system software for the target processor using a cross compiler and simulate the software using an application called an instruction set simulator.

The ISS is a model of the target microprocessor at the instruction level. It has the ability to load programs compiled for the target instruction set, it contains a model of the registers, and it can decode and model all of the processor's instruction set.

Typically, this type of tool is accurate at the instruction level. It runs the given program in a sequential manner and does not model the instruction pipeline, superscalar execution, or any timing of the microprocessor at the hardware level in terms of a clock or digital logic.

For this reason a good, fast, functional simulation is provided, but detailed timing and performance estimation is not available. Most instruction set simulators come with an interface to one or more software debuggers. The same embedded software tool companies that provide debuggers and cross-compilers may also provide the instruction set simulators.

The ISS is also useful for testing compilers and debuggers without requiring a real processor on a working board. When a new processor is developed, compilers must be developed in parallel with silicon, and the ISS enables a compiler to be ready when the silicon is ready so software can be run immediately upon silicon availability.

Hardware Stubs
The major drawback of working on the host with native compiled code or the ISS is the lack of a model of the rest of the embedded system hardware. Much of the embedded system software is dependent on the hardware.

Software such as diagnostics and device drivers cannot be tested without a model of how the hardware will react. This hardware dependent software is usually the most important software during the crucial hardware and software integration phase of the project.

To combat this limitation, software engineers started using C code to implement simple behavioral models, or stubs, of how the target hardware is expected to behave. These stubs can provide the expected results for system peripherals and other system interfaces.

Some instruction set simulators also started to incorporate hardware stubs that could be included in the simulation by providing a C interface to the memory model of the ISS.

Peripherals such as timers, UARTs, and even Ethernet controllers can be included in the simulation. The number of hardware models needed to make the ISS useful will determine whether it is worth investing in creating C models of the hardware.

Figure 6.9: ISS with Memory Model Interface
For a large system, it can be more work to create the stubs than to create the embedded system software itself. Figure 6.9 above shows a diagram of an ISS with a memory model interface that allows the user to add C code to take care of the memory accesses.

Figure 6.10 below shows a fragment of a simple stub model that returns the ID register of a CPU so the executing software does not get an error when it reads an expected ID code.

Figure 6.10: Code for a Simple Stub
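The figure itself is not reproduced here, so the following is only a minimal sketch of what such a stub might look like. The callback names `stub_read` and `stub_write`, the register address `CPU_ID_ADDR`, and the ID value are all hypothetical; a real ISS defines its own memory model interface.

```c
#include <stdint.h>

/* Hypothetical address and value of the CPU ID register (assumptions). */
#define CPU_ID_ADDR  0x40000000u
#define CPU_ID_VALUE 0x12345678u

/* Flat backing store for every address the stub does not special-case. */
#define RAM_WORDS 1024u
static uint32_t ram[RAM_WORDS];

/* Read callback an ISS memory model might invoke for each load. */
uint32_t stub_read(uint32_t addr)
{
    if (addr == CPU_ID_ADDR)
        return CPU_ID_VALUE;              /* the ID code the software expects */
    return ram[(addr / 4u) % RAM_WORDS];  /* ordinary memory */
}

/* Write callback: the ID register is treated as read-only, so writes
 * to it are silently ignored. */
void stub_write(uint32_t addr, uint32_t data)
{
    if (addr == CPU_ID_ADDR)
        return;
    ram[(addr / 4u) % RAM_WORDS] = data;
}
```

With a stub like this in place, software that checks the ID code at startup proceeds past the check instead of reading zero from flat memory.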

Real-Time Operating System (RTOS) Simulator
For projects that use real-time operating systems, it is possible to use a host-compiled version of the RTOS. Some commercial operating system vendors provide a host-compiled version that can be run on a workstation.

For custom or proprietary operating systems, the RTOS code can usually be “ported” to the host. The RTOS simulator is fast and most useful for higher levels of software. It can be used to test the calls to RTOS libraries for tasking, mailboxes, semaphores, and so forth.

The RTOS simulator is more abstract than the ISS and usually runs at a higher speed. Since the software is compiled for the host machine, it does not allow the use of any assembly language.

Again, it suffers from the same limitation as the ISS since the custom hardware is not available. An example of an RTOS simulator is VxSim, a simulation of the popular RTOS VxWorks from Wind River. VxSim allows device drivers and applications to be tested in the host environment before moving to the embedded system. Drivers usually require hardware stubs to provide simulated responses.

Microprocessor Evaluation Board
Among software engineers, the most popular tool used for learning a processor and testing code before the target system is ready is the microprocessor evaluation board. This is a board with the target microprocessor and some memory that typically uses a network connection or a serial port to communicate with the host. It allows initial code to be developed, downloaded, and tested.

Target tools are used to debug and verify the code. Many software engineers prefer to use the evaluation board since the tools are the same as those that will be used when the system is ready and it is most like working with the true product being developed.

Every microprocessor vendor has an evaluation board for sale soon after the processor is available, usually at a very reasonable price. Vendors also provide sample code and even hardware schematics for the board.

Some embedded system designs even go so far as to copy the evaluation board and just add a small amount of custom hardware or even buy and use the evaluation board in a product without modification.

This is very tempting to get a hardware design quickly, but the boards are not usually designed for higher production volume products. Check the cost and the reliability of the design before directly using an evaluation board as part of a product.

If the embedded system contains a fair amount of custom hardware, the evaluation board is less useful. Depending on the amount and nature of the custom hardware, it may be possible to modify the evaluation board by including extra programmable logic or other semiconductor devices to make it look and act more like the target system design.

Waveforms, Log Files, and Disassembly
For SoC designs, many software engineers are forced to do early software verification with full-functional logic simulation models and waveforms in a hardware design environment. Those engineers skilled in both software development and hardware design may be able to debug this way, but it is not the most comfortable debugging environment for most software engineers. A source level debugger with C code is preferred to bus waveforms and large log files from a Verilog or VHDL simulator.

I once introduced co-verification to a project team working on a complex video chip with four ARM CPU cores. After I preached the benefits of co-verification and the ability to debug software using a source-level debugger, the software engineers nodded their heads and seemed to understand.

Their current setup involved the use of the RTL code for the ARM cores running in a logic simulator. As part of this environment, they included a model that monitored the execution of the ARM cores and output a log file with the disassembly of the executing software as a way to track software progress. Since the tests ran very slowly, they would wait patiently for simulation to complete and then get this log file and try to correlate it with the source code to see what happened.

When they went to start co-verification, they immediately asked if the co-verification tool could output the same kind of log file so they could track execution after the test finished. Of course it could, but this type of debugging would not really improve their situation. After some coaxing, they agreed to try interactive software debugging with a source-level debugger and were pleased to discover this type of debugging was possible.

Host-Code Mode with Logic Simulation
Host-code mode is a technique to compile the embedded system software not for the embedded processor in the hardware design, but instead for the host workstation. This is also referred to as native compile. To perform co-verification, the resulting executable is run on the host machine, and it connects to a logic simulator that executes the hardware design.

Some type of inter-process communication (IPC) is required to exchange information between the host-compiled embedded software and the logic simulator. The IPC implementation could be a socket that allows each of the two processes to be on different machines on the network or shared memory that runs both processes on the same machine.

Host-code mode is not limited to using a logic simulator as the hardware execution engine. Any hardware execution engine can be used. Some others that have been used with host-code mode are an accelerator/emulator and a prototyping platform.

With host-code mode, a bus functional model is used in the hardware execution engine to create bus transactions for the bus interface of the microprocessor. The combination of the host-compiled program plus the bus functional model serves as a microprocessor model. Host-code mode provides an attractive environment for both software and hardware engineers.

Software engineers can continue to use the software tools they are already using, including source code debuggers and other development and debug tools on the host. Hardware engineers can also use the tools they are already using as part of the design process: a Verilog or VHDL logic simulator and associated debug tools.

This requires a minimal methodology change for both groups of engineers and can benefit both software and hardware verification. The ability to do pre-silicon co-verification is a great benefit when the processor does not yet exist. Figure 6.11 below shows the basic architecture.

Figure 6.11: Host-Code Execution with Logic Simulation
Host-code mode can also be used when the software does not access the hardware design via a microprocessor bus, but instead via a generic bus interface like PCI. Many chips do not have an embedded microprocessor, but are designed with the PCI bus as a primary interface into the programmable registers.

In this case the software can be run on the host and read and write operations from the software can be translated into PCI bus transactions in the hardware execution engine. This is a good example of when it is useful to abstract the software execution to the host and link it to hardware execution at the PCI interface.

Host-code mode requires the embedded software to be modified to perform function calls when it accesses the hardware design through the bus functional model. Putting in these specific function calls can be a pain if a lot of embedded software already exists, or little or no problem if the code is being written from scratch and all memory accesses are coded to go through a common function call. Examples of C library calls that are used for host-code execution are shown in Figure 6.12 below.

Figure 6.12: Host-Code Mode Example Function Calls
Inserting these C calls into the software is called explicit access because the user must explicitly put in the references to the hardware design. The other way to use host-code mode is to use implicit access.
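As an illustration of explicit access, here is a minimal sketch. The function names `cv_read32` and `cv_write32`, the timer register map, and the in-process register array standing in for the hardware side are all assumptions; they are not the calls from the original figure or from any particular tool.

```c
#include <stdint.h>

/* Stand-in for the hardware side: a tiny register file. In a real
 * host-code setup these two functions would forward the request over
 * IPC to the bus functional model (names are hypothetical). */
#define NUM_REGS 16u
static uint32_t fake_regs[NUM_REGS];

void cv_write32(uint32_t addr, uint32_t data)
{
    fake_regs[(addr / 4u) % NUM_REGS] = data;
}

uint32_t cv_read32(uint32_t addr)
{
    return fake_regs[(addr / 4u) % NUM_REGS];
}

/* Hypothetical timer register map. */
#define TIMER_BASE   0x1000u
#define TIMER_LOAD   (TIMER_BASE + 0x0u)
#define TIMER_CTRL   (TIMER_BASE + 0x4u)
#define TIMER_ENABLE 0x1u

/* Driver code: every hardware access is an explicit function call. */
void timer_start(uint32_t ticks)
{
    cv_write32(TIMER_LOAD, ticks);
    cv_write32(TIMER_CTRL, cv_read32(TIMER_CTRL) | TIMER_ENABLE);
}
```

If all hardware accesses in the driver already funnel through two functions like these, retargeting the code from the host to the real bus only means swapping their implementation.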

Implicit access does not require the user to put in special calls, but automatically figures out when the software is accessing the hardware based on the load and store instructions being run. This technique will be covered in more detail in another chapter, but with implicit access, the user can use ordinary C code to access hardware via pointers as shown in Figure 6.13 below.

Figure 6.13: Example of Implicit Access
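The following is a rough sketch of the kind of ordinary pointer-based code that implicit access is meant to support. The UART register layout and names are hypothetical. So that the fragment runs as a plain host program, the register pointer is aimed at an ordinary structure in host memory; under implicit access the tool, not this code, would be responsible for recognizing the loads and stores as hardware accesses.

```c
#include <stdint.h>

/* Register layout of a hypothetical UART (names are assumptions). */
typedef struct {
    volatile uint32_t data;
    volatile uint32_t status;
    volatile uint32_t ctrl;
} uart_regs_t;

#define UART_CTRL_ENABLE 0x1u

/* On the target, the base would be a fixed hardware address. For a
 * plain host run it points at ordinary memory instead. */
static uart_regs_t host_uart;
static uart_regs_t *const uart = &host_uart;

/* Ordinary C: plain loads and stores through a pointer, with no
 * special access functions. */
void uart_enable(void)
{
    uart->ctrl |= UART_CTRL_ENABLE;
}

void uart_putc(char c)
{
    uart->data = (uint32_t)(unsigned char)c;
}
```

The driver functions contain no tool-specific calls, which is the point of implicit access: the same source can be compiled for the target unchanged.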
Host-code mode can also be used to integrate an RTOS simulator such as VxSim as discussed above. A diagram of host-code execution in the context of an RTOS simulator is shown in Figure 6.14 below.

Figure 6.14: RTOS Simulation and Host-Code Execution

Instruction Set Simulation with Logic Simulation
Another way to perform co-verification is to compile the embedded system software for the target processor and run it on an instruction set simulator. An ISS allows not only C code but also assembly language of the target processor to be run.

This allows more realistic simulation of things normally coded in assembly language such as initialization sequences, cache and MMU configuration and simulation, and exception handlers. This mode of operation is referred to as target-code mode.

As with host-code mode, some type of inter-process communication (IPC) is required to exchange information between the instruction set simulator and the logic simulator. Target-code mode is not limited to using a logic simulator as the hardware execution engine.

Any hardware execution engine can be used, but since the instruction set simulator will likely run slower than a host code program it is important to make sure the speed of the instruction set simulator is not too slow to see benefits from a hardware execution engine such as an accelerator.

The bus functional models used with an ISS are the same as or similar to those used in host-code mode. The main difference is that with an ISS it may be possible to understand the context of the bus transactions better. In host-code mode, only a single bus transaction is considered at a time.

On a bus that supports address pipelining, such as AHB, there is no way to determine the next bus cycle that will be done by the host code program, so only a single transaction would be simulated and there is no pipelining.

The ISS can utilize knowledge of what will be the next bus transaction to occur and can supply the bus functional model with the next address so that it can model the address pipelining correctly. This is a major benefit of using a good ISS for co-verification. Target-code mode also enables instruction fetches to be verified.

As in host-code mode, software engineers can debug code in a familiar environment. In target-code mode, the debugger is not a host debugger, but rather a debugger that can work with the ISS and debug programs cross-compiled for the embedded processor. Figure 6.15 below shows the architecture.

Figure 6.15: Instruction Set Simulator Connected to Logic Simulation
To integrate an ISS with a bus functional model, the memory interface to the ISS must be modified to run logic simulation to satisfy the memory accesses. Instruction set simulators as used by software engineers normally have a flat memory model that is a simple C model allowing the program to be loaded and run.

Some instruction set simulators have the ability to customize this memory model so the users can add their C models (stubs) to provide some rudimentary model of the hardware. Without at least the ability to put in stub models, most embedded system code will not run on a flat memory model since it deals with memory-mapped hardware registers that should have nonzero values after reset.

Doing co-verification with an ISS is really just a simple extension of the use of stubs to instead turn memory transactions into calls to the logic simulator for execution on the bus functional model.

The other thing that must be reported to the ISS is interrupts. When an interrupt occurs on the bus, the ISS must know about it so it can model the exception processing and start the service routine.
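The simple integration described above can be sketched as follows. Everything here is an assumption made for illustration: the callback names, the address window treated as hardware, and the arrays that stand in for the logic simulator. A real integration would run the bus functional model through IPC rather than touching local arrays.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical address window decoded as hardware (assumption). */
#define HW_BASE 0x80000000u
#define HW_SIZE 0x00010000u

#define RAM_WORDS 4096u
static uint32_t ram[RAM_WORDS];          /* flat memory for code and data */

/* Stand-ins for the logic simulator side. */
static uint32_t sim_regs[HW_SIZE / 4u];
static bool sim_irq_pin;

static uint32_t bfm_read(uint32_t addr)
{
    return sim_regs[(addr - HW_BASE) / 4u];
}

static void bfm_write(uint32_t addr, uint32_t data)
{
    sim_regs[(addr - HW_BASE) / 4u] = data;
}

/* Unsigned wrap-around makes addresses below HW_BASE compare as large
 * values, so one comparison checks both ends of the window. */
static bool is_hw(uint32_t addr)
{
    return (addr - HW_BASE) < HW_SIZE;
}

/* Memory callbacks the ISS would call for each load and store. */
uint32_t iss_mem_read(uint32_t addr)
{
    return is_hw(addr) ? bfm_read(addr) : ram[(addr / 4u) % RAM_WORDS];
}

void iss_mem_write(uint32_t addr, uint32_t data)
{
    if (is_hw(addr))
        bfm_write(addr, data);           /* becomes a bus transaction */
    else
        ram[(addr / 4u) % RAM_WORDS] = data;
}

/* Checked by the ISS between instructions to decide whether to start
 * exception processing. */
bool iss_irq_pending(void)
{
    return sim_irq_pin;
}
```

Only accesses that fall inside the hardware window cost simulation time; instruction fetches and data accesses to ordinary memory stay in the fast flat model.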

Most commercial co-verification tools provide many more features than just gluing the memory model of the ISS to a bus functional model and reporting interrupts. But this description is easy to understand and has been used by many users to construct their own co-verification environment using an ISS.

Some instruction set simulators keep statistics and account for the simulation cycles that have been used to satisfy memory requests. This allows useful features such as performance estimation and profiling to be used to find out details of software execution.

In the simple ISS integration description above, the read and write activity would have to report the number of bus clocks that were consumed to satisfy the transaction. The ISS may be able to use this clock cycle count to update its internal notion of time. Unfortunately, this is not always easy to do since the time domain of the ISS is now out of step with that of the logic simulator.

Synchronization between the software execution environment and the hardware execution environment is the type of issue that has led to the shift from a transaction-based interface to one that is cycle-based. One way to think of a cycle-based ISS is to say that it exchanges pin values between the ISS and logic simulator on every bus clock cycle. This is equivalent to moving the bus functional model state machine into the ISS and just applying the signal values in logic simulation.

Figure 6.16: Cycle-based Instruction Set Simulator Connected to Logic Simulation
Another way to view it is as a transaction-based interface where the logic simulator has the ability to report wait states to the ISS and the ISS will return with the same memory transaction until it completes. This approach is better suited for cases where better accuracy is needed. It is also better suited for multiprocessor designs since it can keep all processors synchronized with the logic simulator on a cycle-by-cycle basis. Figure 6.16 above shows the architecture of a cycle-based instruction set simulator.
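The wait-state view can be sketched in a few lines. This is only an illustration under stated assumptions: `sim_clock_read`, `iss_read`, the fixed number of wait states, and the made-up read data are all hypothetical, and the "logic simulator" is a local function rather than a separate process.

```c
#include <stdint.h>
#include <stdbool.h>

/* Result of presenting a transaction for one bus clock (hypothetical). */
typedef struct {
    bool     done;   /* false means the slave inserted a wait state */
    uint32_t data;
} bus_result_t;

/* Stand-in for the logic simulator: a slave that inserts WAIT_STATES
 * wait states before it answers a read. */
#define WAIT_STATES 2u
static unsigned waits_left = WAIT_STATES;
static uint64_t sim_clock;            /* clocks advanced on the hardware side */

static bus_result_t sim_clock_read(uint32_t addr)
{
    bus_result_t r = { false, 0u };
    sim_clock++;                      /* one bus clock of hardware time */
    if (waits_left > 0u) {
        waits_left--;
        return r;                     /* not ready: report a wait state */
    }
    waits_left = WAIT_STATES;         /* re-arm for the next transaction */
    r.done = true;
    r.data = addr ^ 0xFFFFFFFFu;      /* arbitrary read data for the demo */
    return r;
}

/* ISS side: present the same read every clock until it completes,
 * advancing the ISS cycle count in step with the simulator. */
uint32_t iss_read(uint32_t addr, uint64_t *iss_cycles)
{
    bus_result_t r;
    do {
        r = sim_clock_read(addr);
        (*iss_cycles)++;
    } while (!r.done);
    return r.data;
}
```

Because the ISS cycle counter advances once per simulated bus clock, the two notions of time cannot drift apart, which is the property that makes this style attractive for multiprocessor designs.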

C simulation
The logic simulation and acceleration techniques discussed so far evolved from the hardware simulation domain. One complaint about co-verification developed by extending the hardware simulation platform to include software engineers is the limited availability of the platform.

For example, performing co-verification using logic simulation requires a logic simulation license for each software engineer who is running and debugging software. Most companies purchase logic simulation licenses based on the demand for hardware verification and don't have extras available for the purposes of co-verification.

Similarly, higher performance hardware execution engines such as simulation acceleration and emulation are even more difficult to acquire for software development. Most companies have only one or two such machines that must be shared by verification engineers and software engineers.

This limited scalability often leaves engineers wondering if there is a way to do co-verification that doesn't require traditional logic simulation. The natural conclusion is to think about using a C or C++ simulation environment to eliminate the need for logic simulation. At the same time, there is a perception that C simulation is faster than Verilog and VHDL simulation.

SystemC is one such environment that is gaining momentum as a modeling language that can provide C++ simulation of the design without requiring logic simulation, and at the same time can also co-simulate with an HDL simulator when needed.

SystemC by itself is not a co-verification method, but rather an alternative hardware execution environment or even an alternative modeling language to be used instead of Verilog and VHDL. Model-based methods require a library of models to be created, and missing models are a common source of difficulty.

The question with any C simulation environment – SystemC or homegrown – has always been the development of the design model. Like the primitive hardware stub methods used by software engineers, somebody must create the simulation model of the hardware design.

Since this model creation is not yet a mainstream path to design implementation, any work to create an alternative model that is not in the critical path of design implementation is usually a lower priority that may never become reality.

Contrast this to logic simulation where RTL code for the design must be developed for implementation so using this RTL code in a logic simulator is always a model that is readily available.

Tools are now available to take the Verilog and VHDL code for the design and turn it into a C model by translating it into C or SystemC or even directly to an executable program that is not a traditional logic simulator.

Of course, such tools must do more than just eliminate the need for the logic simulator license; they also must offer some performance gain to satisfy the perception that somehow C should be faster than Verilog or VHDL: a tough job considering the optimization already being done by today's logic simulators.

By doing nothing more than eliminating the logic simulator license the price would have to be dramatically lower than that of a simulator to be compelling, which is very difficult since the simulation market is mature and prices will only come down as time progresses.

The approach of these Verilog to C translators is to turn the Verilog into a cycle-based simulation by eliminating timing. Cycle-based simulation has never been a mainstream methodology, so it is not clear that converting Verilog code into a cycle-based executable will succeed. Only time will tell.

A common post on newsgroups related to Verilog simulation is from the engineer looking for a Verilog-to-C translator. There are many such posts, and a couple of them are shown in Figure 6.17 below. The answer usually comes back that the best Verilog-to-C translator is the VCS logic simulator.

Most engineers asking for the translator are not clear on how it would benefit them. In fact, many of the products mentioned are no longer available as commercial products.

Figure 6.17: Verilog-to-C Translator Requests
The only real way to gain higher performance from C or SystemC simulation is to raise the abstraction level of the model. Instead of modeling the design at RTL, more abstract models must be developed that eliminate the detail of the model and as a result enable it to run faster.

The theory on high-level modeling is that an engineer can make an abstract model in about 1/10 the time it takes to develop an RTL model and the model should run 100 to 1000 times faster in a C or SystemC environment.

Engineers are looking for a minimum of 100 kHz performance, and 1 MHz is more desirable. Some tools translating HDL into C are starting to show about 10x performance speedup over logic simulation by eliminating some of the detailed timing of logic simulation without requiring the user to make any changes to the RTL code. Raising the level of abstraction holds promise for running software before the RTL for the hardware design is available.

Co-verification utilizing a C simulation environment is very much the same as with traditional logic simulators. Instruction set simulators and host code execution methods can be used to run the embedded system software and perform software debug.

The compelling reason to look into co-verification based on C simulation is the ability to scale co-verification to many software engineers. Once a C model of the design is in place and co-verification is available, then every software engineer can use it by simply making copies of the software model.

This also makes it possible to give the model and environment to software engineers that are outside the company to start developing software and doing such tasks as porting an RTOS without waiting for hardware and without the need to use logic simulation.

I have never confirmed it, but I can guess that software companies such as Wind River have a need to port VxWorks to new processors and custom hardware designs before chips and boards are available. I can also guess they don't have a Verilog simulator, and even if they could get a simulator, they probably don't want to learn how to use it.

Companies that started out developing co-verification tools that allow users to create their own C models and combine them with microprocessor models and debugging tools to form a representation of the design face a difficult modeling dilemma about who will create the models.

To enable wider use of the technology and go beyond focusing on the creation of models for custom designs, some products shifted toward the use of a C model as a replacement for the common tool that all software engineers know and love, the evaluation board.

The all-software virtual evaluation board is an alternative to buying hardware, cables, power supplies, and JTAG (joint test action group) tools. When many engineers need access to the board, it becomes much more cost effective to deploy a software version of it.

In addition to basic microprocessor evaluation boards, C models can be created for reference designs and platforms that are often used as starting points for adding custom hardware. This type of virtual board enables debugging that is not possible on a real piece of hardware.

Value is derived from being able to monitor hardware states and have easy access to performance information. Constraining support to off-the-shelf boards makes it easier to serve the market, but it does not address custom designs. Model-based methods always seem to face model availability questions.

Co-verification revolving around C simulation is an interesting area that will continue to evolve as engineers start to look at top down design methodology that could leverage such a model for high-speed simulation and also use it for the design implementation.

Next in Part 3: Hardware centric co-verification
To read Part 1, go to “Determining what and how to verify.”

This series of articles by Jason Andrews is from “Embedded Software know it all” edited by Jack Ganssle, used with permission from Newnes, a division of Elsevier. Copyright 2008. For more information about this title and other similar books, please visit www.elsevierdirect.com.

Jason Andrews, author of Co-Verification of Hardware and Software for ARM SoC Design, has implemented multiple commercial co-verification tools as well as many custom co-verification solutions. His experience in the EDA and embedded marketplace includes software development and product management at Verisity, Axis Systems, Simpod, Summit Design, and Simulation Technologies. He has presented technical papers and tutorials at the Embedded Systems Conference, Communication Design Conference, and IP/SoC, and has written numerous articles related to HW/SW co-verification and design verification. He has a B.S. in electrical engineering from The Citadel, Charleston, S.C., and an M.S. in electrical engineering from the University of Minnesota.
