Hardware/Software Co-Verification with RTOS Application Code - Embedded.com

Hardware/Software Co-Verification with RTOS Application Code



Michael Bradley is a Seamless technical marketing engineer for Mentor Graphics. He has a background in using RTOSes, and has supported and developed co-simulation tools between various hardware simulators and accelerators. He received a B.S.E.E. from Rensselaer Polytechnic Institute.

Kainian Xie is a senior software developer for HyperChip in Montreal, QC. He received the B.S. and Ph.D. degrees from Xi'an Jiaotong University (Shaanxi, China) in 1992 and 1997, respectively, majoring in automatic control.

Hardware/Software Co-Verification is typically performed at a low level of abstraction, using an Instruction Set Simulation (ISS) model of a CPU in conjunction with a Verilog or VHDL model of the rest of the design. This article describes a higher level of software abstraction. The CPU subsystem is replaced by an RTOS simulator and application code written to the Application Programming Interface (API) of the RTOS. Verilog or VHDL is still used to model the rest of the design.


Software programmers have a few tools and methodologies to develop and debug embedded software. A standalone ISS can run compiled code locally on a host workstation or PC. Designers must stub out device drivers and other routines that interact with the hardware, or emulate the hardware within a debugger macro language. Two disadvantages of this approach are the limitations of the macro language, and the accuracy of the implementation of the macros. An evaluation board that contains the target CPU is often used, offering the advantage of real-time performance. The disadvantage of an evaluation board is that its hardware resources are general purpose and bear little or no resemblance to the final product. You can create an FPGA prototype to mimic the hardware to be deployed, but this is a complex undertaking, especially for designs that consume multiple FPGAs.

One solution for accurate hardware/software verification is to use the ISS of the target CPU and “connect” it to the hardware simulator that the hardware design group uses. One obvious disadvantage of this technique is that software execution is limited to the speed of the hardware simulator. The Seamless Co-Verification package from Mentor Graphics increases the speed of the ISS-Hardware Simulator connection by allowing most of the ISS instruction cycles to run decoupled from the hardware simulator. This technology, termed optimizations, has been used to generate successful System-on-a-Chip (SoC) tape-outs, as well as CPU-based board designs.

Another tool available to the programmer is an RTOS simulator, which does not emulate the instruction set of a CPU but, instead, models the resources of the RTOS itself. This allows the programmer to develop and debug task-level operations such as pending and posting to a mutex, rescheduling of tasks, and mailbox operations. The RTOS simulator is a higher level of abstraction than an ISS—it is CPU independent and does not require (or allow) assembly code.

It is possible to connect an RTOS simulator to the hardware simulator through the Seamless co-verification tool. At this level of abstraction, it is possible to observe the threads of execution and how these threads interact with the hardware. The effect is the appearance that thousands of software cycles have run in conjunction with the hardware in essentially zero time. In other words, the RTOS can be initialized, application tasks started, and the software ready to interact with the hardware before the hardware simulator has advanced. Once in this state, the hardware will be initialized by the RTOS application, and hardware interaction begins. The software can now perform system-level transactions with the hardware. This test environment is not concerned with CPU instructions—it will be used to exercise high-level operations in hardware and software. Test-environment performance will be bounded by the amount of hardware simulator time needed to perform a given software or testbench request.

The Line-Card Design

Hyperchip has developed optical-communication line cards. The cards plug in to a Hyperchip proprietary switch fabric. The switch fabric is structured to be highly parallel, which eliminates serial bottlenecks. The entire system is targeted for the core routing of optical networks, at a total speed of 1 petabit per second (a petabit is 2^50 bits, which is roughly a thousand terabits, or 10^15 bits).

The forwarding and traffic management engine, along with support functions, is implemented in several FPGAs. A CPU is connected to the datapath hardware via a PCI bus interface. The CPU runs the VxWorks RTOS from Wind River. Wind River also provides VxSim as an RTOS simulator for VxWorks.

In the deployed line card, VxWorks runs on the CPU's core. VxWorks' memory space is the local SDRAM to the CPU. The PCI block within the CPU acts as a bridge and allows the core to communicate with the datapath hardware. Datapath hardware is able to communicate to the RTOS by depositing traffic information to SDRAM, and sending a PCI interrupt to the CPU.

Figure 1:  Line card with CPU sub-system

Figure 1 shows the major blocks in the line card. For the hardware/software verification environment, the hardware and software processes must communicate through some interface logic in the hardware simulator. This hardware/software interface is typically the pins of a CPU core or chip. However, in the line-card design, we are able to obtain a higher level of abstraction by interfacing at the PCI bus. To accomplish this, Seamless provides a PCI 2.1-compliant transactor model. This model converts I/O reads and writes within VxSim into PCI bus transactions in hardware. The PCI transactor also provides an interrupt facility from the hardware to VxSim.

In the simulation environment, the CPU and SDRAM are abstracted. VxSim will replace VxWorks running on the processor. VxSim is a simulated version of VxWorks, and runs on the workstation CPU. The workstation memory will replace the SDRAM. The Seamless PCI transactor model acts as the PCI bridge located in the CPU. Seamless implements the requested bus transactions from VxSim in the PCI transactor model. The PCI transactor model is instantiated in the VHDL design.

VxSim is integrated with Seamless via the HCE (Host Code Execution) mode. HCE is a special mode of Seamless that is activated when an ISS is not present. HCE mode allows the user to execute C code that references an HCE library and is compiled for the workstation. The HCE library interfaces to a Bus Interface Model (BIM) in the hardware simulator. In other words, the HCE library allows the user's C code to interact with the hardware simulator. The HCE library has four major functions:

  • Advances time in the hardware simulator.
  • Initiates PCI bus-master transactions.
  • Creates a callback to accept and/or present data when the transactor is accessed as a target.
  • Creates a callback to process PCI interrupts.

The PCI library is an extension of the HCE library used to configure the PCI transactor in various PCI modes (such as 32-bit vs. 64-bit), as well as to define its configuration registers as a PCI target (such as Vendor ID). Figure 2 shows the Seamless-enabled simulation environment.

Figure 2:  Co-verification environment

The line-card software is designed so that the higher levels of software are independent of the underlying hardware platform or simulation environment. You can port the line-card code to different platforms by altering the hardware-abstraction layer. We created several abstraction-layer versions in order to support different environments: the CPU evaluation board, Seamless/VxSim environment, and final hardware.
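
To make the hardware-abstraction idea concrete, here is a minimal sketch in C, not the actual Hyperchip code. It assumes the abstraction layer can be modeled as a table of function pointers; the names hal_ops_t, hal_sim, and linecard_set_bits are hypothetical, and the "sim" backend simply keeps registers in a host array instead of issuing PCI transactions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical abstraction layer: upper-level code calls only these hooks. */
typedef struct {
    const char *name;
    uint32_t (*reg_read)(uint32_t addr);
    void     (*reg_write)(uint32_t addr, uint32_t value);
} hal_ops_t;

/* Stand-in backend: 32-bit registers live in a host array (assumption). */
#define SIM_REG_COUNT 16u
static uint32_t sim_regs[SIM_REG_COUNT];

static uint32_t sim_reg_read(uint32_t addr)
{
    assert(addr / 4u < SIM_REG_COUNT);   /* byte address -> word index */
    return sim_regs[addr / 4u];
}

static void sim_reg_write(uint32_t addr, uint32_t value)
{
    assert(addr / 4u < SIM_REG_COUNT);
    sim_regs[addr / 4u] = value;
}

static const hal_ops_t hal_sim = { "sim", sim_reg_read, sim_reg_write };

/* Platform-independent code: set bits in a register and return the readback.
 * Only the hal_ops_t table changes when the platform changes. */
uint32_t linecard_set_bits(const hal_ops_t *hal, uint32_t addr, uint32_t mask)
{
    uint32_t value = hal->reg_read(addr);
    hal->reg_write(addr, value | mask);
    return hal->reg_read(addr);
}
```

With this shape, a second hal_ops_t table for the evaluation board or the final hardware could be swapped in without touching linecard_set_bits.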

The deployed hardware system will boot from FLASH, which will copy VxWorks to SDRAM, where VxWorks is initialized and started. In the Seamless/VxSim environment, the booting operation is not needed; execution begins in VxSim and the user's startup routine. The startup routine calls hardware initialization routines and starts the user's tasks. These tasks run various tests on the line card. A typical startup sequence is:

  • Initialize Seamless PCI transactor.
  • Search for PCI targets on the PCI bus. Configure targets as needed.
  • Register PCI targets as IO devices in the VxWorks IO sub-system.
  • Start tasks to run tests.
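
The second step of this sequence, searching for PCI targets, can be sketched as follows. This is an illustration under stated assumptions rather than the line-card code: the configuration read is replaced by a mock array, and the names cfg_read_vendor_id, mock_bus_reset, mock_bus_plug, and pci_scan are hypothetical. The sketch relies only on two PCI facts: a bus has device numbers 0 to 31, and a Vendor ID of 0xFFFF means that no device responded.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_MAX_DEVICES 32       /* device numbers 0..31 on one PCI bus */
#define PCI_VENDOR_NONE 0xFFFFu  /* Vendor ID read back when no device responds */

/* Mock configuration space: one Vendor ID per device number (assumption).
 * In the co-verification environment this read would go through the
 * PCI transactor instead. */
static uint16_t mock_vendor_ids[PCI_MAX_DEVICES];

static uint16_t cfg_read_vendor_id(int dev)
{
    assert(dev >= 0 && dev < PCI_MAX_DEVICES);
    return mock_vendor_ids[dev];
}

/* Mark every slot on the mock bus as empty. */
void mock_bus_reset(void)
{
    for (int dev = 0; dev < PCI_MAX_DEVICES; dev++)
        mock_vendor_ids[dev] = PCI_VENDOR_NONE;
}

/* Place a device with the given Vendor ID on the mock bus. */
void mock_bus_plug(int dev, uint16_t vendor_id)
{
    assert(dev >= 0 && dev < PCI_MAX_DEVICES);
    mock_vendor_ids[dev] = vendor_id;
}

/* Scan the bus and record the device numbers that respond.
 * Returns the number of targets found (at most max_found). */
int pci_scan(int found[], int max_found)
{
    int count = 0;
    for (int dev = 0; dev < PCI_MAX_DEVICES && count < max_found; dev++) {
        if (cfg_read_vendor_id(dev) != PCI_VENDOR_NONE)
            found[count++] = dev;
    }
    return count;
}
```

Each device number returned by pci_scan would then be configured and registered with the I/O sub-system before the test tasks are started.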

Synchronizing VxSim and the Hardware Simulator

At first glance, one may assume that synchronization of the VxSim process running on the host with the hardware simulator is going to be a complex issue. In reality, this synchronization is not much different from typical synchronization issues between hardware and software. Hardware and software typically run asynchronous to each other. Methods such as polling and interrupts allow the hardware and software state machines to “sync up” and exchange information. Similarly, when user application code implements polling or interrupt methods, this will synchronize the hardware simulator and VxSim via the Seamless kernel.

If the software is not polling or interrupt driven, or if the user needs additional control over synchronization, the HCE library provides a further facility. The HCE function hce_AdvanceHardware() tells the hardware simulator to advance simulation. Since the line-card software is interrupt driven, it was not necessary to precisely control the hardware advance time. It is more convenient to let the hardware advance function run periodically as a VxSim background task. Accordingly, the hce_AdvanceHardware() function is put in its own task and run at VxSim's highest task priority. This task also suspends itself so that the other tasks may run:

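int hw_ready_to_go = 0;    /* set once initialization is complete */

void advanceHw()
{
    while (1)
    {
        if (hw_ready_to_go)
            hce_AdvanceHardware(100);    /* advance the hardware simulator */
        taskDelay(100);                  /* suspend so that other tasks can run */
    }
}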

Figure 3:  Hardware simulator time advance

With the code in Figure 3, the hardware advances by 100 PCI clocks and then VxSim runs for 100 ticks. The global variable hw_ready_to_go delays the start of the hardware simulator until VxSim has completed its initialization. (In some of the tests, a VxSim task accepts user input so that the user can select the tests to run.) The hw_ready_to_go variable is set at the end of the software and hardware initialization tasks, and when a test is ready to run. You don't have to call the hce_AdvanceHardware() function during hardware initialization because Seamless automatically advances hardware simulation time for PCI bus transactions initiated by the RTOS.
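
The gating effect of hw_ready_to_go can be checked on a host without Seamless or VxSim by replacing the two library calls with counters. The following is only a rough model: mock_advance_hardware and mock_task_delay are hypothetical stand-ins for hce_AdvanceHardware() and taskDelay(), and the loop is bounded so that it terminates.

```c
#include <assert.h>

static int  hw_ready_to_go = 0;  /* same role as the flag in Figure 3 */
static long hw_clocks = 0;       /* PCI clocks the mock simulator has advanced */
static long rtos_ticks = 0;      /* ticks the mock task has spent delayed */

/* Stand-ins for hce_AdvanceHardware() and taskDelay(); they only count. */
static void mock_advance_hardware(int clocks) { hw_clocks += clocks; }
static void mock_task_delay(int ticks)        { rtos_ticks += ticks; }

/* One pass of the Figure 3 loop body. */
void advance_hw_once(void)
{
    if (hw_ready_to_go)
        mock_advance_hardware(100);
    mock_task_delay(100);
}

/* Run the loop body a fixed number of times; initialization is modeled
 * as completing just before pass number ready_at (counted from zero). */
void run_passes(int passes, int ready_at)
{
    assert(passes >= 0);
    for (int i = 0; i < passes; i++) {
        if (i == ready_at)
            hw_ready_to_go = 1;
        advance_hw_once();
    }
}
```

Calling run_passes(5, 2) delays for 500 ticks in total but advances the mock hardware by only 300 clocks, because the first two passes occur before the flag is set.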

Since the RTOS is running on the host, we cannot install interrupt service routines as we normally would in the deployed system. Instead, Seamless provides an HCE callback routine that is called whenever a PCI interrupt occurs. An argument passed to the callback indicates the cause of the interrupt. The interrupt type covers all possible PCI interrupt types, plus an additional type indicating that the PCI transactor model has been accessed as a target (slave). Some skeleton code for the interrupt callback is shown in Figure 4.

In the example of Figure 4, software tasks poll global variables such as isrFlagA and isrFlagB. The tasks periodically sample these variables and execute the appropriate interrupt code if they are set. Alternatively, you can use a mutex to activate the interrupt handler, or place the handling code directly in the callback, as in the case of INTERRUPT_TARGET.

If the interrupt is of type INTERRUPT_TARGET, the PCI transactor is being accessed as a target. In the case of the line card, the datapath hardware would only access the PCI bus to transfer packets of data into or out of memory. First an HCE function is called to determine the type of transaction (read or write), then an additional HCE function is used to receive or send data for the appropriate address.
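
As a rough illustration of the callback structure just described, the sketch below sets flags for ordinary interrupts and services target accesses directly. It is not the authors' skeleton code and does not use the real HCE API: only INTERRUPT_TARGET, isrFlagA, and isrFlagB come from the text, while INTERRUPT_A, INTERRUPT_B, and every mock_* name are assumptions that stand in for the HCE query functions.

```c
#include <assert.h>
#include <stdint.h>

/* INTERRUPT_TARGET is named in the text; the other enumerators are assumed. */
typedef enum { INTERRUPT_A, INTERRUPT_B, INTERRUPT_TARGET } interrupt_type_t;

volatile int isrFlagA = 0;  /* polled by a software task */
volatile int isrFlagB = 0;  /* polled by a software task */

#define HOST_MEM_WORDS 64u
static uint32_t host_mem[HOST_MEM_WORDS];  /* workstation memory in place of SDRAM */

/* Mock of the pending target transaction (stand-in for HCE queries). */
static struct { int is_write; uint32_t addr; uint32_t data; } mock_txn;

static int      mock_target_is_write(void)           { return mock_txn.is_write; }
static uint32_t mock_target_addr(void)               { return mock_txn.addr; }
static uint32_t mock_target_get_data(void)           { return mock_txn.data; }
static void     mock_target_put_data(uint32_t value) { mock_txn.data = value; }

/* Callback invoked once per PCI interrupt, with the cause as its argument. */
void pci_interrupt_callback(interrupt_type_t type)
{
    switch (type) {
    case INTERRUPT_A:
        isrFlagA = 1;                 /* defer the work to a polling task */
        break;
    case INTERRUPT_B:
        isrFlagB = 1;
        break;
    case INTERRUPT_TARGET: {
        /* Handle the access here: first the direction, then the data. */
        uint32_t word = mock_target_addr() / 4u;
        assert(word < HOST_MEM_WORDS);
        if (mock_target_is_write())
            host_mem[word] = mock_target_get_data();   /* hardware writes memory */
        else
            mock_target_put_data(host_mem[word]);      /* hardware reads memory */
        break;
    }
    }
}
```

Setting a flag keeps the callback short for ordinary interrupts, whereas a target access has to be answered inside the callback because the hardware is waiting for the data.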

Software/Hardware Testing

The loop-back test is an exciting software application to run with hardware/software co-verification. The software application code generates traffic such as IPv4, IPv6, and MPLS and injects these packets into the inbound network processor via the PCI bus and the support FPGA. These packets go through the datapath hardware to a physical loop-back in the switch-fabric testbench. The packets then return through the datapath hardware and are deposited in a buffer in the RTOS. When the buffer is full, the support FPGA generates an interrupt. The interrupt is handled, and the driver delivers the packets to the RTOS task (Figure 5).

Figure 5:  Loop-back test

In this environment at Hyperchip, the software developer initiates the testing by creating software applications within VxSim. Tasks are written to configure hardware registers and then read the status registers to ensure the hardware is in the proper state. Next, the tasks inject data packets into the driver. The software developer can then analyze the ModelSim waveforms to confirm that the packet was actually injected into the hardware-simulation environment. After this, the hardware engineer can trace the packets through the datapath hardware. If everything runs correctly, and the packet loops back and returns to software via PCI, the software developer checks that the returned packet data is correct.
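
The final check, comparing the returned packet with the one that was injected, could look like the following sketch. Everything here is hypothetical: packet_t, packet_fill, and packet_equal are illustrative names, and mock_loopback stands in for the whole PCI and datapath round trip, with an optional corruption flag to show the failing case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PKT_MAX 64u

typedef struct {
    uint16_t len;            /* number of valid bytes in data[] */
    uint8_t  data[PKT_MAX];
} packet_t;

/* Fill a packet with a deterministic pattern derived from a sequence number. */
void packet_fill(packet_t *p, uint8_t seq, uint16_t len)
{
    assert(len <= PKT_MAX);
    memset(p->data, 0, sizeof p->data);
    p->len = len;
    for (uint16_t i = 0; i < len; i++)
        p->data[i] = (uint8_t)(seq + i);
}

/* Stand-in for the round trip through the datapath and loop-back.
 * When corrupt is nonzero, the first byte is inverted on the way back. */
void mock_loopback(const packet_t *tx, packet_t *rx, int corrupt)
{
    *rx = *tx;
    if (corrupt && rx->len > 0)
        rx->data[0] ^= 0xFFu;
}

/* Return 1 when the received packet matches the transmitted one. */
int packet_equal(const packet_t *a, const packet_t *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

A test task would call packet_fill, inject the packet, wait for the interrupt-driven delivery, and then call packet_equal on the pair.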

During the loop-back test, many bugs were caught easily because both the software algorithm and the hardware implementation could be visualized. Breakpoints were used on both sides to stop the simulation in order to analyze the state of the system.


The simulation environment of Seamless with VxSim, ModelSim, and the PCI transactor model proved to be a very effective environment in which to perform system-level simulations. It was possible to exercise the hardware in the same manner as it is exercised in a deployed system. This would not be possible with typical testbench tools. We debugged application-level software against a virtual prototype of the hardware. This saved valuable time, since we did not need a physical prototype. The software was developed ahead of schedule, before physical samples were available.

Several hardware-design issues were revealed during co-verification. Since the hardware was at the RTL level, changes were easy to make. It would not have been possible to execute such a large amount of software in a typical hardware-simulation environment. Seamless allowed the software to run on the host workstation and to periodically synchronize with the hardware simulator. For the loop-back test, the execution time of the Seamless simulation was essentially the same as the time needed for the packets to traverse the datapath in the hardware simulator. Throughout the entire process, it was possible to control and observe results in both the hardware and software environments. This capability provided insight into line-card operation that is not possible with other tools.
