Using emulation to debug software and hardware at the same time
Debugging a kernel panic
On a system running Linux, the OS has to boot and work properly before you can start debugging your own software on top of it. But any embedded system that requires Linux also requires effort to bring Linux up in the first place. A successful boot is a battle won, and subtle hardware issues can determine whether the system comes up properly at all.
An example shows how critical the interplay between hardware and software is when debugging odd problems. In this example, Linux fails to boot: it “panics,” or issues an internal error, and we need to figure out why.
The hardware platform (called the design-under-test, or DUT) in this case consists of a Diamond DC-232L processor core with 16 MB of ROM and 128 MB of RAM. The system is equipped with a UART and the ability to drive an LCD. For debug purposes, it also has a JTAG test access port (TAP).
The system runs MontaVista Linux version 4.0.1, which is based on the 2.6 kernel. Initramfs is used for the initial RAM disk, and most of the shell utilities are provided by BusyBox 1.5.
The RTL describing this system is loaded into a ZeBu emulator, which is then connected to a host system. Such emulators communicate with the host at the transaction level, through transactors, so that commands and interactions with the host don’t slow down emulation. For this example, there are three transactors: one for the UART, one for the LCD, and one for the JTAG TAP.
On the host side, we run a Linux console from the UART transactor, the LCD display from the LCD transactor, and a software debugger that connects through the JTAG transactor. For hardware debugging, zRun is used. zRun has a software symbol awareness feature that lets you work with the hardware side using the language of the software side. The entire system is shown in Figure 1 below.

The processor is emulated at 12.5 MHz, and it takes a couple of seconds to download the design into the FPGAs on the emulator. Once execution starts, if everything boots correctly, it takes 70 seconds to get to the Linux prompt at this speed.
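To put those figures in perspective, the arithmetic can be sketched as follows. The clock rate and boot time come from the text above; the 1 kHz simulator rate is purely an illustrative assumption, not a measured number, and is there only to show how the cycle count scales on a much slower platform.

```python
# Figures quoted in the article.
EMULATED_CLOCK_HZ = 12_500_000  # processor clock inside the emulator
BOOT_SECONDS = 70               # time to reach the Linux prompt

# Hypothetical comparison rate, NOT from the article: an RTL simulator
# assumed to run the same design at about 1 kHz.
ASSUMED_SIMULATOR_HZ = 1_000


def cycles_executed(clock_hz: int, seconds: int) -> int:
    """Number of clock cycles executed in the given wall-clock time."""
    return clock_hz * seconds


def seconds_needed(cycles: int, clock_hz: int) -> float:
    """Wall-clock time needed to execute `cycles` at `clock_hz`."""
    return cycles / clock_hz


boot_cycles = cycles_executed(EMULATED_CLOCK_HZ, BOOT_SECONDS)
simulator_days = seconds_needed(boot_cycles, ASSUMED_SIMULATOR_HZ) / 86_400

print(f"cycles to reach the prompt: {boot_cycles:,}")
print(f"at the assumed simulator rate: about {simulator_days:.1f} days")
```

Booting to the prompt therefore takes 875 million processor cycles, which is why the execution speed of the verification platform matters so much for this kind of debugging.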
In this particular example, at some point, when the boot process is almost complete, something goes wrong, and the console points to an interrupt issue with irq4: “too much work” (Figure 2 below).
The next step is to find the offending code in the kernel source. The kernel has a lot of code, but the error message names its origin, the serial8250 driver, and a search for that string turns up the routine in which the message is generated (Figure 3 below).
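The kind of logic that produces a “too much work” message can be illustrated with a simplified model. This is a sketch in Python, not the kernel source: the function name `service_interrupt`, the callback arguments, and the pass limit of 256 are assumptions made for illustration, and the message text only approximates what the console shows. The idea is that the handler keeps servicing the port while the IIR reports a pending interrupt, and gives up once it has looped too many times.

```python
UART_IIR_NO_INT = 0x01  # IIR bit 0 set means "no interrupt pending"
UART_IIR_THRI = 0x02    # transmitter-holding-register-empty interrupt ID

PASS_LIMIT = 256        # assumed bound on passes through the loop


def service_interrupt(read_iir, handle_port, irq=4, pass_limit=PASS_LIMIT):
    """Service the UART until its IIR reports nothing pending.

    Returns (passes, gave_up). If the source never goes quiet, the
    loop bails out after pass_limit passes and reports the problem.
    """
    passes = 0
    while not (read_iir() & UART_IIR_NO_INT):
        handle_port()
        passes += 1
        if passes > pass_limit:
            print(f"serial8250: too much work for irq{irq}")
            return passes, True
    return passes, False


# A source that never deasserts: the handler eventually gives up.
stuck = service_interrupt(lambda: UART_IIR_THRI, lambda: None)

# A source that goes quiet after three passes: normal completion.
pending = [UART_IIR_THRI] * 3
healthy = service_interrupt(
    lambda: pending.pop() if pending else UART_IIR_NO_INT,
    lambda: None,
)
```

In this model the message is a symptom rather than a cause: it says only that the interrupt source never went quiet, which is why the investigation has to continue at the register level.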
The key player in this routine is the Interrupt Identification Register, or IIR. It may contain clues as to what’s going on, but there’s a problem unique to the IIR: using a software debugger to read the IIR has the side effect of clearing it. This is an “intrusive” observation in that it changes the state of the system, something to be avoided if possible.
The way to get around this issue is to use hardware debugging instead. With a hardware debugger, you’re simply looking at various portions of the hardware. To the hardware debugger, a register is a register, and this particular register can be viewed without its special software semantics. Said a different way, the hardware debug option is non-intrusive.
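The difference between the two kinds of observation can be sketched with a small behavioral model. The class and method names here (`ClearOnReadRegister`, `bus_read`, `probe`) are hypothetical, and the model is deliberately simplified: it treats every bus read as clearing the register, as described above, whereas a real UART may clear only certain interrupt sources on a read.

```python
IIR_IDLE = 0x01  # bit 0 set: no interrupt pending
IIR_THRE = 0x02  # transmitter-empty interrupt pending


class ClearOnReadRegister:
    """Simplified register whose bus reads have a side effect."""

    def __init__(self):
        self._value = IIR_IDLE

    def latch(self, value):
        """Hardware side: record a pending interrupt source."""
        self._value = value

    def bus_read(self):
        """What the CPU or a software debugger sees: read, then clear."""
        value = self._value
        self._value = IIR_IDLE
        return value

    def probe(self):
        """What a hardware debugger sees: the flip-flops, untouched."""
        return self._value


iir = ClearOnReadRegister()
iir.latch(IIR_THRE)

first_probe = iir.probe()     # non-intrusive: state is unchanged
second_probe = iir.probe()    # still the same value
debugger_read = iir.bus_read()  # intrusive: returns the value, then clears
after_read = iir.probe()      # the evidence is gone
```

Here the two probes agree with each other, while the single bus read destroys the state the driver was about to examine.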
First we want to watch specifically what happens during this routine. Because the hardware debugger is aware of software symbols, you can set the trigger to break or start tracing when the program counter enters the routine, which is as simple as looking for serial8250_interrupt, the name of the function (Figure 4 below).
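Conceptually, a symbol-aware trigger resolves the function name to an address range and fires when the program counter enters that range. The sketch below illustrates the idea only; it is not the zRun interface, and the addresses, sizes, and helper names are all hypothetical.

```python
from typing import NamedTuple


class Symbol(NamedTuple):
    start: int  # first address of the function
    size: int   # length of the function in bytes


# Hypothetical symbol table, as might be extracted from the kernel image.
SYMBOLS = {
    "serial8250_handle_port": Symbol(0xD0123200, 0x200),
    "serial8250_interrupt": Symbol(0xD0123400, 0x1A0),
}


def entry_trigger(symbols, name):
    """Build a predicate that fires when the PC enters the named function."""
    sym = symbols[name]

    def inside(pc):
        return sym.start <= pc < sym.start + sym.size

    def fires(prev_pc, pc):
        return inside(pc) and not inside(prev_pc)

    return fires


def first_trigger(pc_trace, fires):
    """Index of the first sample at which the trigger fires, or None."""
    for i in range(1, len(pc_trace)):
        if fires(pc_trace[i - 1], pc_trace[i]):
            return i
    return None


trace = [0xD0100000, 0xD0100004, 0xD0123400, 0xD0123404]
trigger = entry_trigger(SYMBOLS, "serial8250_interrupt")
hit = first_trigger(trace, trigger)
```

With this trace the trigger fires at index 2, the first sample whose program counter falls inside the assumed range for serial8250_interrupt.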
We can then set up to capture all of the UART signals as well as other critical points like the IIRs and memory bus, as shown in Figure 5 below.
We then continue execution to create and subsequently analyze the waveforms for the selected signals.
Examination of the waveforms shows that the UART controller isn’t communicating with the processor in the way it should be. Linux buffers characters intended for the UART until the UART controller signals that it’s ready to receive the characters by virtue of its own empty outgoing buffer.
In this case, the UART controller never generates that initial interrupt saying it’s ready for output, even though its buffer is starting out empty. It’s only later, when someone hits a key on the keyboard, that the interrupt is generated. By that time, Linux has overflowed its own character buffer, and that’s where things fall apart.
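The software side of this failure can be modeled with a bounded queue that is drained only when the UART signals readiness. This is a minimal sketch under stated assumptions: the class `TxQueue`, its capacity of 4, and the method names are invented for illustration and do not reflect the actual buffer size or data structures in the Linux serial layer.

```python
from collections import deque


class TxQueue:
    """Bounded queue of characters waiting for the UART."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()
        self.rejected = 0

    def write(self, ch):
        """Queue one character; fail if no room is left."""
        if len(self.pending) >= self.capacity:
            self.rejected += 1
            return False
        self.pending.append(ch)
        return True

    def on_tx_ready(self, count=1):
        """Called on the 'ready for output' interrupt: drain characters."""
        sent = []
        while self.pending and len(sent) < count:
            sent.append(self.pending.popleft())
        return sent


queue = TxQueue(capacity=4)  # hypothetical, deliberately tiny capacity

# Boot messages arrive, but the 'ready' interrupt never does.
accepted = [queue.write(ch) for ch in "boot: ok"]

# Only once the interrupt finally arrives does the queue start to drain.
drained = queue.on_tx_ready(count=2)
accepted_after = queue.write("!")
```

Without the initial interrupt nothing ever calls `on_tx_ready`, so the queue fills and every later write fails, which mirrors the overflow described above.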
If the interrupt signal isn’t being generated, then we need to look into the hardware definition for clues (Figure 6 below).
Here it’s clear that the defining event for generating the interrupt is completion of a transmission. In other words, it’s set up to send an interrupt when the buffer becomes empty. But the buffer is also empty when the system starts up, before anything has been transmitted. The problem is that this logic only triggers when the buffer becomes empty, not when it is empty.
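The distinction between “becomes empty” and “is empty” is the difference between edge-sensitive and level-sensitive interrupt generation. The behavioral sketch below is not the RTL from the figure; the signal names (`thr_empty`, `irq_enabled`) and both functions are assumptions chosen to illustrate the two behaviors on a cycle-by-cycle trace.

```python
def edge_irq(samples):
    """Flawed behavior: interrupt only when the buffer BECOMES empty."""
    irq = []
    prev_empty = True  # the buffer comes out of reset already empty
    for thr_empty, irq_enabled in samples:
        irq.append(irq_enabled and thr_empty and not prev_empty)
        prev_empty = thr_empty
    return irq


def level_irq(samples):
    """Expected behavior: interrupt whenever the buffer IS empty."""
    return [irq_enabled and thr_empty for thr_empty, irq_enabled in samples]


# Each sample is (thr_empty, irq_enabled) for one cycle.
# Startup: the buffer is empty from reset, then the driver enables
# the transmit interrupt and waits to be told it may send.
startup = [(True, False), (True, True), (True, True)]

# Later: a character is written and then finishes transmitting.
after_send = [(True, True), (False, True), (True, True)]

edge_at_startup = edge_irq(startup)
level_at_startup = level_irq(startup)
edge_after_send = edge_irq(after_send)
```

In the startup trace the edge-sensitive version never raises the interrupt, because no transition to empty ever occurs, while the level-sensitive version raises it as soon as the interrupt is enabled. After a real transmission both versions fire, which is why the flaw stays hidden until the very first output is attempted.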
What this illustrates is the fact that software issues may reveal subtle underlying hardware problems, the kinds of issues that, left undetected before tape-out, would ultimately require an expensive mask revision. The only way to uncover such devilry is by executing software before the hardware is created, and the only way to do that in a reasonable timeframe is by using emulation.
Situations like this reinforce the point that, when you’re creating a complex SoC, or even a simple one like the one in this example, you need to run software and hardware debug together in an emulator to verify that both the hardware and software components are operating correctly and that the system is truly ready to be committed to silicon.
Donald Cramb is director of the Consulting Services Division of EVE-USA in San Jose, Calif., and is responsible for customer services, applications and design solutions to support specific customer requests. Previously, he was a partner at ArchSilc Design Automation, a company focused on system-level verification solutions. Earlier in his career, Cramb was director of Services at Quickturn where he helped expand its offerings into key target markets, including wireless, graphics, multi-media and networking. He went on to become vice president of Technical Services for three years at Virtio and held a similar position at Tharas Systems. After graduating with a bachelor’s degree in Electrical Engineering from the University of Edinburgh, Cramb spent 11 years employed by Philips in the United Kingdom and Silicon Valley. He began his career as a design engineer.

