Simulating Embedded Systems
by Kenneth F. Greenberg
Although a simulator won't solve all of your debugging problems,
sometimes it's useful to have one around. Here's what simulation will
and won't do for you.
I can't tell you how many times I've heard the following statement
from embedded systems developers: "Simulation? No, we don't use it
because it's too slow." While simulation may not be the fastest tool
available to developers, there are some things it does better than
any other environment. All it takes is one serious bug caught with a
simulator, and you've significantly reduced your time-to-market--a
pretty big advantage for your company to be trading off against the
extra few minutes a day it actually takes to run a simulation. In this
article, I'll show you where simulation fits in your arsenal of
debugging tools, what simulators are really good at, and what kind
of capabilities to look for when choosing a simulation tool.
My first experience with simulators involved the design of an in-
circuit emulator--ironically, one of the best alternatives to
simulators. We were trying to get the emulator working in time for a
trade show, where we knew our biggest competitors would have
their latest offerings on display. Our product consisted of half a
dozen new boards and all the software it would take to run the
system. This included not only a new user interface, but very
extensive software to drive the hardware in the system. Of course,
the software and hardware had to be developed in parallel, and the
prototypes of the hardware were scheduled to arrive just in time for
system integration, which meant that all our software had to be not
only written but tested as much as possible on the day that the
hardware arrived. Our solution was to write a simulator to the
hardware specification (we were lucky enough to have a good one)
and make the software work with the simulator. If we got the
simulator right, and if the hardware agreed with the specification,
we knew we could deliver the software on time. After six months of
hardware and software design, we got through system integration in
a week with no major problems. The system worked at the show, our
competitors and customers were impressed, and I was sold on
simulators.
The kind of simulator previously described models a particular
hardware subsystem. The only way to obtain such a simulator is to
create it yourself. Yet, there are a number of off-the-shelf
simulators that can go a long way toward solving debugging
problems. While those simulators can't model your proprietary
design, they can model the CPU you use and its associated memory
system. Some of these are available at little or no cost, and others
are part of a commercially available complete development toolkit.
These tools are particularly useful during certain phases of a
software development cycle.
EXECUTION ENVIRONMENTS AND THE SOFTWARE LIFE CYCLE
When you are developing software for an embedded system, you
generally go through a series of phases that make up the
development life cycle of the software. First, you acquire the
requirements for what you intend to build. Then, you come up with an
appropriate high-level structure for solving the problem at hand.
Once the design is sufficient to begin coding, you can have software
team members start on different parts of the code. For example, if
the software is mathematically intensive, you may have someone
working on algorithms such as curve fitting routines. Someone else
may be prototyping a user interface so you can get feedback from
potential customers. Other team members may start work on low-level,
hardware-intensive drivers. Some of this work can be done in
an environment that is quite different from the final product. Other
tasks will require much closer integration with the system
hardware. Eventually, hardware and software will come together
during system integration. Once the system is working, it must be
tuned for the desired level of performance. Then, it must be
validated (usually against a test suite) to ensure it will perform
reliably under boundary conditions encountered in the field. Finally,
the product is released, and the software enters the post-
deployment maintenance phase.
Not everyone follows exactly this series of steps, but they are
typical of many embedded system development projects. If your
development follows some or all of these steps, what tools would
you use for testing and debugging code at the various points in the
cycle? Typically, you have four choices for the execution
environment where the software runs during debugging. First, you
can use the development host computer to run the code. Second, you
can use a simulator debugger. Third, you can use an in-circuit
emulator, ROM emulator, or logic analyzer. Finally, you can use a
debug monitor running in the target system and communicate with a
debugger running on the development host. Which is best? Despite
the subject of this article, the right answer is probably all of the
above, depending on what part of the software you are working on.
Table 1 shows a matrix of the development phase versus the
execution environment.
If you are concentrating on algorithmic development, your best
choice might well be native (development host) execution. I worked
on scientific instruments for many years, and my team always did
all the mathematical algorithms as native code on our development
systems. We were able to create test harnesses for new functions,
see what they did in excruciating detail, and then package the
functions for inclusion in the product. This procedure provided
immediate feedback, allowed for rapid prototyping, and cost us
nothing in the way of buying new tools.
Eventually, you will start running into problems that can't be solved
in a native environment. If you are developing device drivers for,
say, a Motorola 68000 and your development host is a RISC-based
Unix workstation, you are not going to run any native code on your
host. If you have working target hardware, you may be able to use
hardware instrumentation to get through this phase of development.
An in-circuit emulator gives you much more control over your
debugging process. Emulators are fast and allow you to access not
only your CPU and memory but all the other devices in your system.
However, if your target hardware is still under development, you
may not be able to use an instrumentation solution. Simulation may
be a better choice. If you have a large development team, you may
not be able to get enough time on the emulator or target system.
Emulators are expensive, and working prototypes are always a
scarce resource. There never seem to be enough of them around for
everyone who has a bug to isolate.
When you reach the performance tuning and quality assurance
phases, you have different needs. Many modern high-end processors
contain on-chip cache. The arrangement of code and data in memory
can have a significant effect on cache hit ratios. You can tune a
system for this type of processor if you can collect data on cache
hit ratios. Some hardware instrumentation may be able to help, but
simulation probably does a better job--providing you have a
simulator that accurately models cache. Data collection also comes
into play when you are validating your software against a QA test
suite. Is your test suite good enough? Does it exercise all your code?
You need a way to find out which lines of code were actually
executed when the test suite was run. Some execution environments,
such as in-circuit monitors, generally lack the facilities to collect
such data. Emulators and simulators are better at this job.
Finally, you will need an appropriate execution environment for
debugging once the system is deployed--the situation where target
monitors really shine. If your system can support the required
memory, you can leave the monitor code in your system. Then, when a
problem occurs in the field, you can go on-site with a notebook
computer (or perhaps use a modem), wake up the monitor, and find
out just what's going on inside the system. Monitors are relatively
inexpensive compared to a hardware instrument like an emulator,
and they also provide access to all the devices in your target
system. Monitors have a few drawbacks, of course--their data
collection capabilities are limited, and they won't work until your
system is just about completely functional.
UNIQUE SIMULATION FEATURES
With all these choices, why are simulators so important? There are
some common problems that can best be solved with a simulation
environment. There are also some jobs that may be impossible to do
without a simulator. At the very least, a simulator provides a good
cross check. In the last embedded system I worked on, my team had a
new processor, so no emulator was available. We did have high-end
logic analyzers, but not enough of them. When something went
wrong, our first reaction was always "Let's go back and try it on the
simulator." We could study a problem in great detail at our own
pace, gaining an understanding of what our code was doing at the
point of failure. If we could reproduce the problem in the simulator,
we knew we were doing something wrong and we could correct it. If
not, there was a good chance that directing our efforts at finding a
hardware problem would give us the fastest fault isolation. Let's
look at some other areas where simulators excel for embedded
system debugging.
Nested exceptions. Most embedded systems deal with exceptions in
one form or another. At the very least, you probably have a few ISRs
(interrupt service routines) as part of your code. What happens when
you are in the middle of servicing an interrupt and a higher priority
interrupt occurs? In many systems, disabling interrupts inside ISRs
just isn't an option. You need to ensure that the state of the
processor is saved properly so that servicing of the lower priority
interrupt can resume once the high priority interrupt has been
serviced. On real hardware, it is almost impossible to set up test
conditions to reproduce this problem. A high-end logic analyzer or
emulator could capture the information, but how do you make the
interrupts fire at the right time? In most simulators, this is trivial.
You can specify when you want simulated interrupts to occur. For
instance, you can run until the beginning of the ISR, step to the point
where you want the second interrupt to occur, and tell the simulator
to assert an interrupt immediately. You can then continue single
stepping through the code, observing what happens to the processor
state, and ensuring that control gets properly transferred to the new
ISR. You probably only need to perform this function a few times, but
it is time well spent. Someday, this set of circumstances will occur
in the field. It's good to be prepared for it.
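
The nested-interrupt experiment described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the ToyCPU class, its method names, the priority levels, and the vector addresses are all hypothetical and do not correspond to any particular simulator or processor. It shows the idea of stepping into a low-priority ISR, asserting a higher-priority interrupt at a chosen instruction, and checking that the saved state is restored in the right order.

```python
# Hypothetical toy model; these names are not taken from any real simulator.
class ToyCPU:
    def __init__(self):
        self.pc = 0
        self.priority = 0      # priority of the interrupt in service (0 = none)
        self.stack = []        # saved (pc, priority) pairs
        self.log = []

    def assert_interrupt(self, level, vector):
        """Take the interrupt only if it outranks the one being serviced."""
        if level <= self.priority:
            self.log.append(f"masked level {level}")
            return False
        self.stack.append((self.pc, self.priority))    # save processor state
        self.pc, self.priority = vector, level
        self.log.append(f"enter level {level} at {vector:#x}")
        return True

    def step(self):
        self.pc += 4           # stand-in for executing one instruction

    def return_from_interrupt(self):
        self.pc, self.priority = self.stack.pop()      # restore saved state
        self.log.append(f"resume at {self.pc:#x} level {self.priority}")


cpu = ToyCPU()
cpu.pc = 0x1000
cpu.assert_interrupt(level=2, vector=0x200)   # low-priority ISR begins
cpu.step()
cpu.step()                                    # step to the point of interest
cpu.assert_interrupt(level=5, vector=0x300)   # inject the nested interrupt
cpu.step()
cpu.return_from_interrupt()                   # back inside the level-2 ISR
cpu.return_from_interrupt()                   # back in the main program
```

After the run, cpu.log holds the sequence of entries and resumptions, which is the kind of record you would inspect while single stepping.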
Stack usage. Memory space is critical in most embedded systems.
You need to allocate stack space for your program. In a multitasking
environment, you need to allocate a stack for each task. How big
should stacks be? If you specify stacks too small, you will
encounter overflows, and your system is likely to fail. If you specify
stacks too big, you have wasted space in the system. It's far better
to know exactly how deep your stacks get as the program runs. If you
have a simulator that monitors memory usage, you can find out
exactly how deep your stacks got at run time. Of course, it is
important to add enough space to the stack to handle exceptions that
may occur as the task or program is running. Still, you need to
start somewhere in estimating stack size, and simulators can give
you a good idea of how much stack space you are really using.
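
As a rough illustration of how a simulator might report stack depth, the sketch below tracks the lowest address written inside a stack region. It assumes a stack that grows downward and a simulator that calls a hook on every memory write. The StackWatch class and the addresses are hypothetical.

```python
class StackWatch:
    """Tracks the deepest address reached by a descending stack (hypothetical)."""

    def __init__(self, stack_top, stack_size):
        self.top = stack_top
        self.limit = stack_top - stack_size
        self.lowest = stack_top

    def on_write(self, address):
        # Assumed to be called by the simulator for every memory write.
        if self.limit <= address < self.top:
            self.lowest = min(self.lowest, address)

    def bytes_used(self):
        return self.top - self.lowest


watch = StackWatch(stack_top=0x8000, stack_size=0x400)
for addr in (0x7FFC, 0x7FF8, 0x7F80, 0x7FF0):
    watch.on_write(addr)
watch.on_write(0x2000)         # outside the stack region, so it is ignored
print(watch.bytes_used())      # deepest write was at 0x7F80
```

The reported figure is only a starting point. As noted above, you still add headroom for exceptions that may arrive while the task is running.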
Test suite validation. It is common practice to write a test suite to
run against your code as part of a quality assurance program. One
way to determine the quality of the test suite is code coverage
analysis. Code coverage analysis gives you a measure of how much of
your code is exercised by its test suite. You can then go back and
look at the parts of the code that were not exercised and add tests
as needed. For mission-critical software, you want to get as close
to 100% as possible. Otherwise, you will be running software in the
field that may never have been tested. In native environments, you
can use branch flow analysis to collect the data needed to determine
code coverage. For embedded systems, you probably won't be able to
run most of your code as native. Most instruments and debug
monitors can't do this data collection, but many simulators can. This
situation is similar to the stack usage analysis described above. It
is easy for simulators to collect information on what code locations
have been read. This information can then be presented in terms of
lines of code executed or not executed during the simulation run.
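
A minimal sketch of that last step, turning a record of executed addresses into a per-line coverage report, might look like the following. It assumes the debugger can supply a table mapping each source line to its instruction addresses. The function names and the sample table are hypothetical.

```python
def uncovered_lines(line_table, executed_addresses):
    """line_table maps a source line number to its instruction addresses."""
    executed = set(executed_addresses)
    return sorted(line for line, addrs in line_table.items()
                  if not executed.intersection(addrs))


def coverage_percent(line_table, executed_addresses):
    missed = uncovered_lines(line_table, executed_addresses)
    return 100.0 * (len(line_table) - len(missed)) / len(line_table)


# Hypothetical data: four source lines and the addresses fetched during a run.
line_table = {10: [0x100, 0x104], 11: [0x108], 12: [0x10C, 0x110], 13: [0x114]}
fetched = [0x100, 0x104, 0x108, 0x114]

missed = uncovered_lines(line_table, fetched)
percent = coverage_percent(line_table, fetched)
```

Here line 12 was never fetched, so it is the line that needs an additional test.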
PERFORMANCE ANALYSIS
At some point in your development cycle you will need to tune your
system to achieve the required performance levels. Simulators can
collect the data needed to analyze performance. For instance, some
simulators can tell you which functions in your code get the most
CPU cycles, a feature that allows you to invest your optimization
time where it will do the most good. This feature can also point out
areas where a redesign can significantly improve throughput. For
processors with on-chip cache, realigning code and data can
dramatically affect performance. If you have a way to measure the
cache hit ratio, you can rearrange memory contents to minimize
cache misses and get the best cache performance. Again, this
problem is one of data collection. A simulator with a good cache
model can monitor and report cache hits and misses.
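
To make the effect of memory layout concrete, here is a small sketch of a direct-mapped cache model that counts hits and misses. The cache organization (four lines of 16 bytes) and the access patterns are assumptions chosen for illustration. A real simulator would model the specific cache of the target CPU.

```python
class DirectMappedCache:
    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.line_size
        index = block % self.num_lines      # which cache line the block uses
        tag = block // self.num_lines       # identifies the block in that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag              # fill the line on a miss
        self.misses += 1
        return False

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(addresses):
    cache = DirectMappedCache(num_lines=4, line_size=16)
    for address in addresses:
        cache.access(address)
    return cache


# Two items 64 bytes apart share a cache line and keep evicting each other.
conflicting = run([0x000, 0x040] * 4)
# Moving the second item by one line (16 bytes) removes the conflict.
realigned = run([0x000, 0x050] * 4)
```

In this toy case the conflicting layout never hits, while the realigned layout misses only on the first touch of each item.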
Cache breakpoints. Two types of breakpoints are helpful to have
when debugging an embedded system. Breakpoints can be either
instruction fetch breakpoints or data access breakpoints. The latter
cause the debugging session to stop when you read or write to a
particular memory location. Instruction fetch breakpoints may be
implemented by either inserting a breakpoint opcode (these are
usually called software breakpoints) or monitoring accesses to a
code location (hardware breakpoints). Clearly, only the second type
can be used when running from ROM, because you can't change the
memory contents. Data access breakpoints can't be implemented
without some means to monitor access to the specified location. In-
circuit emulators are good at these hardware breakpoints, but they
don't deal well with cache. If you are only looking at the signals
brought out to the pins of the CPU, you can't see the accesses that
are entirely on-chip. While you see the memory reference when the
cache is filled, this isn't where you want to stop. Again, a simulator
with a good cache model makes this problem easy to solve. When you
set a breakpoint in the simulator, it can check to see if the break
location is currently in the cache. If so, it sets a breakpoint at both
locations (cache and corresponding simulated physical memory).
Then, you stop when the access occurs in the application, just where
you want to look at it.
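
The difference between what an external instrument sees and what a simulator sees can be sketched as follows. This toy model is an assumption for illustration only. It records cache line fills (the only traffic visible on the CPU pins) separately from breakpoint stops, which the simulator checks on every access whether or not the access hits in the cache.

```python
LINE_SIZE = 16   # assumed cache line size in bytes


class CachedMemory:
    def __init__(self):
        self.cached_lines = set()   # line numbers currently held in the cache
        self.breakpoints = set()    # addresses that should stop execution
        self.bus_fills = []         # what an instrument on the bus could see
        self.stops = []             # where the simulator would stop

    def read(self, address):
        line = address // LINE_SIZE
        if line not in self.cached_lines:
            self.cached_lines.add(line)
            self.bus_fills.append(line * LINE_SIZE)   # fill appears on the pins
        if address in self.breakpoints:
            self.stops.append(address)                # checked on every access


mem = CachedMemory()
mem.breakpoints.add(0x108)
mem.read(0x100)   # miss: fills the line holding 0x108, but is not a stop
mem.read(0x108)   # hit: nothing on the bus, yet the breakpoint triggers
mem.read(0x108)   # hit again: triggers again
```

The bus shows a single fill at 0x100, while the simulator stops at both accesses to 0x108.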
Annotated tracing. Users of in-circuit emulators and logic analyzers
are familiar with the powerful trace features provided by these
instruments. Using these instruments is often the best way to
answer questions such as "How did I get here?" when an unexpected
condition occurs. But an instrument such as a logic analyzer is
limited to what it sees on the bus. If you are running from on-chip
cache, you don't always get the information you need. Equally
important, today's RISC processors do almost everything in
registers. Bus cycles are not sufficient; you need to see the operands
of the instruction, even though they exist only inside the processor.
This situation is, of course, another where simulators are the ideal
tool to solve the problem. Some simulators provide an instruction
stream trace that looks much like the one produced by a logic
analyzer or in-circuit emulator. The simulator can annotate the
trace with the contents of the registers involved in the operation,
giving you more information to isolate your problem.
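
As a sketch of what such an annotated trace could contain, the toy interpreter below records each instruction along with the register values it consumed and produced. The two-opcode instruction set, the register names, and the trace format are hypothetical.

```python
def run_with_trace(program, regs):
    """program is a list of (op, dest, src1, src2) tuples; regs is a dict."""
    trace = []
    for pc, (op, dest, a, b) in enumerate(program):
        before = (regs[a], regs[b])          # operand values before execution
        if op == "add":
            regs[dest] = regs[a] + regs[b]
        elif op == "sub":
            regs[dest] = regs[a] - regs[b]
        else:
            raise ValueError(f"unknown opcode {op}")
        trace.append(f"{pc:04d} {op} {dest},{a},{b}   "
                     f"; {a}={before[0]} {b}={before[1]} -> {dest}={regs[dest]}")
    return trace


regs = {"r1": 5, "r2": 7, "r3": 0}
program = [("add", "r3", "r1", "r2"), ("sub", "r1", "r3", "r1")]
trace = run_with_trace(program, regs)
for line in trace:
    print(line)
```

None of the values after each semicolon would appear on the external bus, which is why a logic analyzer cannot produce this kind of listing.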
SIMULATOR SHOPPING
Does the perfect simulator for your embedded system project exist?
Probably not. First, no one knows the peculiarities of the system you
are designing except your own design team. Unless you've written
your own simulator, you'll have to limit your choices to the
CPU/memory system simulators that are readily available. These
simulators go a long way toward solving most of
the problems you'll run into. Save the tough hardware specific
problems for in-circuit emulators; the rest can be readily addressed
with an in-circuit monitor or similar solution. It's best not to try to
write the CPU/memory simulator yourself. I've designed and
implemented such simulators, and I've learned that accurately
modeling the behavior of even simple CPUs is much harder than it
sounds.
A hybrid solution would be the next best thing to the perfect
simulator. With a hybrid, you would use an off-the-shelf simulator
for the CPU and memory, and hook it up to a simulator that you write
yourself. The latter simulator would address the specifics of the
system you are designing. You could then tell the CPU simulator that
a particular address was memory mapped to the external simulator.
Every time you wrote data to that address, it would signal your
simulator that it was time to go into action. Similarly, when your
simulator had data available (say, the output of a simulated A-to-D
converter), the data value could be read by the CPU simulator and
processed by your application software. Such hybrid simulators don't
seem to be available yet. One of the problems is that these hybrid
simulators require good inter-process communications facilities. A
great deal of development is still done on personal computers, which
don't provide good facilities for such communications. However,
more powerful PC operating systems are now available, and some of
these will allow for the development of hybrid simulators if there is
enough demand for them.
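
The memory-mapping idea behind a hybrid simulator can be sketched in a few lines. This example makes two loud assumptions: the device model lives in the same process as the memory model (a real hybrid would need the inter-process communication discussed above), and the SimMemory and ToyADC classes, the register address, and the sample values are all hypothetical.

```python
class SimMemory:
    def __init__(self):
        self.ram = {}
        self.devices = {}      # address -> user-written device model

    def map_device(self, address, device):
        self.devices[address] = device

    def write(self, address, value):
        if address in self.devices:
            self.devices[address].on_write(value)   # signal the device model
        else:
            self.ram[address] = value

    def read(self, address):
        if address in self.devices:
            return self.devices[address].on_read()  # data comes from the model
        return self.ram.get(address, 0)


class ToyADC:
    """Hypothetical converter: a write selects a channel, a read samples it."""

    def __init__(self, samples):
        self.samples = samples
        self.channel = 0

    def on_write(self, value):
        self.channel = value

    def on_read(self):
        return self.samples[self.channel]


mem = SimMemory()
mem.map_device(0xFFFF0000, ToyADC(samples=[100, 512, 1023]))
mem.write(0xFFFF0000, 1)           # application selects channel 1
reading = mem.read(0xFFFF0000)     # application reads the converted value
mem.write(0x2000, 42)              # ordinary RAM is unaffected
```

The point of the design is that the device model can change what it returns based on what the application wrote, which a prepared input file cannot do.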
In the meantime, it is common practice to map I/O data to the file
system. The file of input data must be prepared in advance. You can
look at the output file later to see what was written. Clearly, this
process isn't as useful as writing to another simulator program that
can modify the values returned to the CPU simulator based on what
the CPU wrote. Still, mapping I/O to files can be used to
address a subset of the problems that a hybrid simulator would
address. Some simulator debuggers also have powerful macro
languages that can be used to build simple models of parts of your
system.
What else should be on your feature shopping list? If you're a typical
embedded system developer (whatever that means), you'll probably
need to address most of the debugging problems described earlier in
this article. Here are some of the features you'll want to look at
when comparing simulator alternatives.
Breakpoints. You should be able to set breakpoints on reading or
writing a given memory location as well as on instruction fetches. If
your processor supports a cache model, setting a breakpoint should
check to see if that location is in cache. If so, it should set the
breakpoint in both simulated physical memory and in the cache.
Suppose you fetch an instruction from a location that has a
breakpoint set, and that fetch comes from cache. This action should
stop the simulator when the instruction is fetched for execution, not
during the cache fill (which might be for some nearby location).
Interrupts. Because debugging ISRs is one of the most powerful
features of a simulator, make sure the simulator has the ability to
generate an interrupt when you want it. Many simulators can
generate repeated exceptions. For example, you could ask for an
interrupt to occur every 100,000 cycles. Of course, the simulator
must accurately model the processor's exception handling behavior.
Exception handling. Aside from interrupts, there are other types of
exceptions that can occur at run time in your embedded system.
Simulators are a particularly good way to ensure that you handle
these conditions properly. You certainly want to know if you try to
execute an invalid opcode. You can experience arithmetic overflow in
an integer calculation, and you may want to have a handler for such
exceptions. There are many more types of floating-point exceptions
that can occur--the IEEE 754 standard specifies invalid operation,
divide by zero, overflow, underflow, and inexact operation
exceptions. If your chosen CPU includes a floating-point
coprocessor, you need to simulate not only the arithmetic operations
but also the exceptions that can arise. A "native" floating-point
solution (one that uses the host computer's floating-point
capabilities) may not handle exceptions in the same way as your
target system. Worse, a native floating-point solution could just
dump you out of the simulator if a divide-by-zero exception occurs.
A better solution is to have the simulator itself implement the
floating-point arithmetic. This approach is slower, but it is a better
representation of the processor's behavior. It also guarantees that
exceptions are detected by the simulator, so that your exception
handler (not the host computer's) is invoked.
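
The sketch below illustrates the principle for one operation only, division. It is an assumption-laden toy: the SimFPU class and its handler table are hypothetical, and a real simulator would implement the target's arithmetic bit for bit rather than lean on host floats for the ordinary case. What it shows is that the simulator checks the operands itself, records the exception, and dispatches to a target-side handler instead of letting the host raise its own error.

```python
import math


class SimFPU:
    def __init__(self):
        self.flags = set()     # exceptions signaled so far
        self.handlers = {}     # exception name -> target handler (hypothetical)

    def _signal(self, name, default_result):
        self.flags.add(name)
        handler = self.handlers.get(name)
        return handler(default_result) if handler else default_result

    def divide(self, a, b):
        if b == 0.0:
            if math.isnan(a):
                return a                               # NaN propagates quietly
            sign = math.copysign(1.0, a) * math.copysign(1.0, b)
            if a == 0.0:
                return self._signal("invalid_operation", math.nan)   # 0/0
            if math.isinf(a):
                return sign * math.inf                 # exact, no exception
            return self._signal("divide_by_zero", sign * math.inf)
        return a / b           # host arithmetic stands in for the ordinary case


fpu = SimFPU()
seen = []
fpu.handlers["divide_by_zero"] = lambda result: seen.append(result) or result

quotient = fpu.divide(1.0, 0.0)        # host Python alone would raise here
negative = fpu.divide(-1.0, 0.0)
undefined = fpu.divide(0.0, 0.0)
```

Dividing by zero directly in the host language would abort the run with a host error, which is the "dumped out of the simulator" failure described above.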
Memory usage. Tag bits allow the simulator to analyze which
memory locations were used and how. This facility allows you to do
code coverage (branch flow analysis also works here) for checking
out your QA test suites, as well as analyze your stack usage for
more efficient allocation of memory.
Cache simulation. If your processor has on-chip cache, your
simulator needs to provide a good cache model. Off-chip secondary
caches vary in design too much for simulators to model them. Unless
you are planning to run with caches disabled, using a simulator that
doesn't model cache isn't going to give you the information you need.
Some processors support variable cache line sizes; make sure your
simulator does too if you're using one of these CPUs. Make sure the
simulator gives you feedback on cache hits and misses so you can
tune your memory layout for optimum performance.
MMU simulation. Most embedded designs don't use memory
management for much more than protection. However, if there's an
MMU on the chip, there should be one in the simulator as well. The
importance of this depends on which CPU you are using. Some CPUs
always run through at least parts of the memory management
system. The first level of "decoding" tells whether the virtual
address is (or can be simply transformed into) the physical address.
Memory regions can be mapped or unmapped and can also be cached or
uncached. Even if you're not using the translation lookaside buffer,
you'll probably need to write some startup code that sets up the CPU
for future accesses. It's helpful to have tested this code on the
simulator before the hardware is ready.
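
As one concrete example of this first level of decoding, the sketch below uses the fixed 32-bit segment layout of the MIPS32 architecture. The choice of MIPS32 is an assumption made for illustration, since no particular CPU is named here, and the Decoded type and decode function are hypothetical names. Addresses in the unmapped segments translate by simple subtraction, while the mapped segments need the translation lookaside buffer.

```python
from typing import NamedTuple, Optional


class Decoded(NamedTuple):
    segment: str
    mapped: bool               # True if the TLB must translate the address
    cached: Optional[bool]     # None when the TLB entry decides
    physical: Optional[int]    # None when the TLB entry decides


def decode(vaddr):
    """First-level decode of a 32-bit virtual address (MIPS32-style layout)."""
    if not 0 <= vaddr <= 0xFFFFFFFF:
        raise ValueError("address out of 32-bit range")
    if vaddr < 0x80000000:
        return Decoded("kuseg", True, None, None)
    if vaddr < 0xA0000000:
        # Unmapped and normally cached (subject to CPU configuration).
        return Decoded("kseg0", False, True, vaddr - 0x80000000)
    if vaddr < 0xC0000000:
        # Unmapped and uncached; typical home of boot code and device registers.
        return Decoded("kseg1", False, False, vaddr - 0xA0000000)
    return Decoded("kseg2/kseg3", True, None, None)


boot = decode(0xBFC00000)      # uncached view of the boot region
kernel = decode(0x80001000)    # cached view of low physical memory
user = decode(0x00400000)      # must go through the TLB
```

Startup code that runs before the caches and TLB are initialized typically executes from the uncached, unmapped segment, which is why it helps to test it on a simulator that models these regions.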
Trace. Because debugging often involves finding the path through
your code to the point of failure, it's useful to have a trace of
instructions leading up to a breakpoint. Logic analyzers and in-
circuit emulators have always provided such facilities. Simulators
can do better by annotating the trace with the contents of any
registers used in the instructions.
Host access. Because CPU simulators that communicate with in-house,
project-specific simulators aren't commonly available yet,
make sure you have the next best thing. Your simulator should be
able to open, close, read, and write files on the development host
computer. This feature can be provided by using a special run time
library. Calls to the standard low-level file functions trigger the
simulator to map these function calls onto the host system. One way
to achieve this function is to trap the system call instruction on
processors that have such an instruction. The simulator then
examines the arguments (placed there by the special run time
library) and calls on the host system to fulfill the request. An
alternative would be to map a simulated memory location to a file.
Subsequent reads and writes of the memory location cause reads and
writes of the file.
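
The system-call approach can be sketched as follows. Everything here is an assumption for illustration: the call numbers, the argument convention (a0 through a2), and the HostBridge class are hypothetical, and a bytearray stands in for simulated target memory. The sketch shows the simulator reading arguments out of target memory and fulfilling the request with host file operations.

```python
import os
import tempfile

SYS_OPEN, SYS_WRITE, SYS_READ, SYS_CLOSE = 1, 2, 3, 4   # hypothetical numbers


class HostBridge:
    def __init__(self, target_memory):
        self.mem = target_memory       # bytearray standing in for target RAM
        self.files = {}
        self.next_fd = 3

    def syscall(self, number, a0=0, a1=0, a2=0):
        if number == SYS_OPEN:         # a0 = address of NUL-terminated path
            end = self.mem.index(0, a0)
            path = self.mem[a0:end].decode()
            fd = self.next_fd
            self.next_fd += 1
            self.files[fd] = open(path, "wb" if a1 else "rb")   # a1 = 1: write
            return fd
        if number == SYS_WRITE:        # a0 = fd, a1 = buffer address, a2 = length
            return self.files[a0].write(self.mem[a1:a1 + a2])
        if number == SYS_READ:         # a0 = fd, a1 = buffer address, a2 = length
            data = self.files[a0].read(a2)
            self.mem[a1:a1 + len(data)] = data
            return len(data)
        if number == SYS_CLOSE:        # a0 = fd
            self.files.pop(a0).close()
            return 0
        raise ValueError(f"unknown system call {number}")


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "log.txt").encode()
mem = bytearray(4096)
mem[0:len(path)] = path                # path string at target address 0
mem[2048:2053] = b"hello"              # data buffer at target address 2048

bridge = HostBridge(mem)
fd = bridge.syscall(SYS_OPEN, 0, 1)
written = bridge.syscall(SYS_WRITE, fd, 2048, 5)
bridge.syscall(SYS_CLOSE, fd)

fd = bridge.syscall(SYS_OPEN, 0, 0)
count = bridge.syscall(SYS_READ, fd, 3000, 5)
bridge.syscall(SYS_CLOSE, fd)
```

On the target side, the special run time library would place the arguments and execute the system call instruction. Only the simulator's half of the exchange is modeled here.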
User interface. The primary criterion in choosing a user interface is
to select one with which you can be comfortable. If you are using
simulation as one tool in a more extensive set of debugging
execution environments (emulation, monitors, and so on), you will be
much more productive if you have the same user interface on all your
tools. This way, you can move back and forth between tools without
having to mentally "change gears." If your in-circuit emulator
vendor also offers a simulator, it will probably share the same user
interface as the software that controls the emulator itself.
Finally, the question of performance needs to be addressed. At the
beginning of this article, I pointed out that many developers avoid
simulators because they are slow. There's no getting around this
problem, other than to keep buying faster host computers. However,
the design of a simulator has a lot to do with providing reasonable
performance. There are no hard and fast rules about how fast a
simulator ought to run, because hosts and simulated CPUs vary so
widely. Just keep in mind that some problems will require you to run
a lot of code just to get to the point where you want to begin single
stepping and looking at register contents. You shouldn't expect one
million instructions per second out of your simulator, but 100,000
is certainly a reasonable number to expect on a Pentium-based PC or
a mid-range Unix workstation. Before buying a simulator, be sure to
ask the vendor how fast it is.
A FINAL WORD
While a simulator alone is probably insufficient for debugging every
embedded system problem you will encounter, it makes sense to
have one around as part of your software toolbox. It should be
considered complementary (rather than as an alternative) to in-circuit
emulators and debug monitors. A simulator can be used
effectively in the early stages of software bringup, reducing the
length of time spent on system integration. Because simulators are
less expensive than instrumentation, you can provide simulator
access to all members of your software team at a relatively low
cost. Then, you can save the emulator for the tough hardware-
intensive problems. Simulators also provide a real advantage if
hardware prototypes are in short supply, as they nearly always are.
There are also a number of problems that are best left to simulation
environments. These problems generally involve collecting data
about the run time behavior of your embedded application. This area
is one where simulators excel, because their data collection
capabilities are almost unlimited. Simulators are also great for
creating test conditions that can't be easily duplicated on real
hardware. In this way, you can check out how your system will
respond to boundary conditions once it is in the field. If you haven't
worked with simulators before, you should consider how they can
help you get your product to market faster and with fewer defects.
If you decide to try simulation, the guidelines on what features to
look for can help you choose the right tool for the job.
Kenneth F. Greenberg is president of California Advanced Software
Tools, Inc. He can be reached via e-mail at ken@rahul.net.