Instruction Set Simulation in C

When developing software for small microcontrollers, it is common to use assembly language in the final product. But there is still much value in prototyping the software and its algorithms in C. Here's a helpful way to migrate from C to assembly smoothly, while still working in the prototype environment.

The focus of this article is a development technique specifically targeted at very small microcontrollers. Typical of this group are some members of the PIC family from Microchip. These devices are characterized by their very small memories (1-2K words of ROM plus a few dozen bytes of RAM), limited stack facilities, and limited instruction sets.

Application software for these microcontrollers is frequently written in assembly language. Of course, much of the software targeted at these systems is small and simple enough that developing in assembly is manageable. However, there are some applications with complex algorithms or control behavior, for which it is much easier and quicker to arrive at working code in a high-level language. I have recently worked on such a project.

Though decent cross-compilers are available, small microcontrollers do not always support high-level languages well. With such small ROMs and limited non-traditional call stacks, assembly language is often unavoidable. It is very difficult to implement high-level languages on a device without a large and flexible stack. Also, simple things C programmers take for granted (like adding two 32-bit integers) often require a call to a large or slow library routine, since support for those operations is not available in the instruction set. Finally, these devices often have other architectural quirks that can add to the hurdles facing a compiler. For instance, paged memories are common due to the limited number of address bits affordable within the opcodes.

Having said that, nothing prevents us from developing prototype code in C on a host workstation. Such a prototype offers many advantages:

  • Implementing complex algorithms is much easier in C than assembly.
  • Restructuring the program as the design evolves is also easier and less likely to break the existing design.
  • It is easier to drive algorithm testing by writing test harness code to exercise a prototype than to exercise the embedded target.
  • Viewing the internal state of the algorithm in a convenient form is possible using a source-level debugger.

But a prototype is just that. Once the design of the prototype is finalized, the entire code must be rewritten in assembly. The reimplementation is typically done by hand and is a big risk. What if the prototype code works just fine, but the assembly reimplementation running on the target does not? What went wrong? Where do you start to make the reimplementation work?

Another concern is maintenance. Suppose that, once the assembly code is working properly, the design of the algorithms must be changed. Before you can go back to the C prototype and begin making changes and debugging them, you will first need to duplicate any and all changes made in the assembly code after the original reimplementation. Otherwise, the prototype and the assembly will be out of sync and the second reimplementation process will be even more complicated.

My last project based on a small microcontroller was sufficiently complex to prompt me to start with a C prototype. After I got the application working in C, I wasn't comfortable leaving my IDE and source-level debugger behind for good. Many of the more difficult parts of the implementation hadn't yet emerged. I really wanted to stay with C and keep my IDE and interactive test harness while I worked through those parts in assembly language.

Expanding on this thinking, I realized that what I needed was a way to gradually migrate the prototype piece by piece from C to assembly. If done properly, I'd be able to retest the program on the host at any stage in between. Each newly translated portion could be exercised in a known working framework. And the workstation's keyboard and display could be used to generate stimuli and present the results in a user-friendly format.

Instruction Set Simulation
In fact, I soon realized, there is a simple way to achieve this. For the small micros we've been considering, the solution is straightforward and quite easy to implement in a short space of time.

For the record, my project was based around a PIC 16F84. This microcontroller provides a 1K-word program store, 68 bytes of RAM, and 13 I/O lines. The instruction set consists of 35 fixed-size opcodes.

What I needed was a simulation of my microcontroller's instruction set. But this would be a simulation with a difference. Typical instruction set simulators are programs that load the development code in the form of a binary executable. There is no way to fuse that executable with a C test harness like mine.

An alternative approach is to simulate the instruction set as a collection of C functions and macros. Each opcode corresponds to one function or macro. For example, the PIC's movlw opcode, which loads an immediate value into the 8-bit accumulator, W, could be simulated in C by a macro called MOVLW() that takes a literal argument and stores it in a global variable Wreg.

A sequence of assembly instructions becomes a list of calls to the corresponding functions or macros. These calls can coexist alongside any other C code: unconverted prototype statements, test harness code, debug support code, and so on. Of course, the C prototype should be realistic in its use of high-level language features. Features to favor are small integer variables (8 or 16-bit), bitfields, simple arrays (one dimensional, preferably 8-bit elements), and perhaps simple structs. Features to avoid are floating point, multidimensional arrays, complex structs, and, where possible, division and modulus.

The first step towards translating part of the prototype into assembly language is to convert the variables it uses into 8-bit quantities. For example, a 32-bit integer becomes four contiguous 8-bit locations. Some of the C code will need to be modified accordingly. The C code will become ugly. This is okay as long as it still works (retest it before going any further). Remember, the C code is gradually disappearing, so we can tolerate ugly C in the meantime. Having modified the variables, we can then translate lines of C into the equivalent assembly language instructions.

In a simple example, the following C code:

uint32 x;
:
:
x = 0;

might first become:

// 32-bit X uses registers 12-15
#define X 12
uint8 regfile[128];
:
:
*(uint32 *)&regfile[X] = 0;

which then becomes:

#define X 12
uint8 regfile[128];
:
:
MOVLW( 0 );
MOVWF( X );
MOVWF( X+1 );
MOVWF( X+2 );
MOVWF( X+3 );

where MOVLW() and MOVWF() are C functions that simulate the PIC instructions of the same names.
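A minimal sketch of how these two instructions might be simulated follows. It is not the author's actual code: the `uint8` typedef, the choice of a macro for MOVLW and a function for MOVWF, and the `clear_x` and `clear_x_demo` names are assumptions made for illustration, and the skip test discussed later in the article is left out to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t uint8;      /* assumed typedef matching the article's uint8 */

#define X 12                /* 32-bit X uses registers 12-15 */

uint8 regfile[128];         /* simulated register file (RAM) */
uint8 Wreg;                 /* simulated W accumulator */

/* movlw k: load the literal k into W */
#define MOVLW(k)  (Wreg = (uint8)(k))

/* movwf f: copy W into file register f */
void MOVWF(uint8 f)
{
    regfile[f] = Wreg;
}

/* The article's example: clear the four bytes that hold X. */
void clear_x(void)
{
    MOVLW( 0 );
    MOVWF( X );
    MOVWF( X+1 );
    MOVWF( X+2 );
    MOVWF( X+3 );
}

/* Hypothetical host-side check that only registers 12-15 are cleared. */
void clear_x_demo(void)
{
    int i;
    for (i = 0; i < 5; i++)
        regfile[X + i] = 0xAA;
    clear_x();
    for (i = 0; i < 4; i++)
        assert(regfile[X + i] == 0);
    assert(regfile[X + 4] == 0xAA);     /* neighbouring register untouched */
}
```

Because the body of `clear_x` is ordinary C, it can be single-stepped in a source-level debugger alongside the rest of the prototype.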

Early in the development, when substantial blocks of C code remain, it may be best to keep some of the original C variables active in parallel with their assembly memory locations. This can be achieved by having two utility functions, assign_files and assign_var, for copying values between the two representations. These functions should be called at the boundaries of the simulated assembly language portions. For example:

#define X 12
#define TEMP 16

uint32 x;
uint8 regfile[128];

x = 0x12345678;
y = x & mask;

// Switch to assembly
assign_files( X, &x );


// Switch back to C
assign_var( &x, X );

if (x > LIMIT) ...
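The article does not show the two utility functions themselves. One possible sketch is given below, assuming they handle 32-bit variables only and store the least significant byte at the lowest register address; both the byte order and the exact signatures are assumptions, as is the `assign_demo` function.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  uint8;     /* assumed typedefs matching the article's names */
typedef uint32_t uint32;

#define X 12                /* 32-bit X uses registers 12-15 */

uint8 regfile[128];         /* simulated register file (RAM) */

/* Copy a 32-bit C variable into four consecutive file registers,
   least significant byte first (an assumed byte order). */
void assign_files(uint8 f, const uint32 *var)
{
    int i;
    for (i = 0; i < 4; i++)
        regfile[f + i] = (uint8)(*var >> (8 * i));
}

/* Copy four consecutive file registers back into a 32-bit C variable. */
void assign_var(uint32 *var, uint8 f)
{
    int i;
    *var = 0;
    for (i = 0; i < 4; i++)
        *var |= (uint32)regfile[f + i] << (8 * i);
}

/* Hypothetical round trip across a C/assembly boundary. */
void assign_demo(void)
{
    uint32 x = 0x12345678;

    assign_files( X, &x );              /* switch to assembly */
    assert(regfile[X] == 0x78 && regfile[X+3] == 0x12);

    regfile[X] = 0x79;                  /* stands in for simulated assembly */

    assign_var( &x, X );                /* switch back to C */
    assert(x == 0x12345679);
}
```

Whatever byte order is chosen, it should match the layout the final assembly code will use, so that the simulated instructions see the same bytes in the same registers.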

Using these guidelines we can translate from C to assembly language in step sizes as large or small as we are comfortable with. At each stage, we have the option of using a source-level debugger, an interactive test harness, or an automated test driver.

Eventually all of the code will have been translated into (simulated) assembly language. Even at this stage, all of the productivity benefits of working in the IDE are still available.

Final Translation
Having converted the entire program into (simulated) assembly language and tested it to your satisfaction, the next step is to reformat the source into genuine assembly source for your target:

  • The simulated assembly statements are actually function and macro calls. Operands are coded as arguments, and thus are enclosed in parentheses. These parentheses must be removed.
  • Subroutines are currently C functions. The curly braces surrounding each must be removed. We must also ensure there is an explicit return opcode at the end of each subroutine, since C provided one for us implicitly at the closing brace.
  • Comment delimiters must be changed from C style to assembly style. If your C compiler accepts C++ style comments, use them; those are easier to convert.
  • Variable declarations must be converted from C style to assembly directive style.

Much of this conversion can be automated by judicious use of the editor's find-and-replace facility. I suggest making a copy of the prototype source file and reformatting that. It is worth keeping the original for future development. The workload can be reduced by planning this activity and selecting the replacement order carefully.
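To make the first and third reformatting steps concrete, here is a rough sketch of a per-line converter. It is written in C only to stay in the same language as the simulation; it is not the author's script. The function name `convert_line` is hypothetical, the mnemonic is lower-cased on the assumption that the target assembler expects that, and only the simple `NAME( operands );  // comment` form is handled. Braces, declarations, and operands containing parentheses are not.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Convert one simulated-assembly line to assembler syntax.
   The output is never longer than the input, so `out` must hold
   at least strlen(in) + 1 bytes. */
void convert_line(const char *in, char *out)
{
    const char *p = in;
    char *q = out;

    while (*p == ' ' || *p == '\t')             /* keep indentation */
        *q++ = *p++;

    while (isalnum((unsigned char)*p) || *p == '_')
        *q++ = (char)tolower((unsigned char)*p++);   /* mnemonic */

    if (*p == '(') {
        const char *end;
        p++;
        while (*p == ' ')                       /* trim space after '(' */
            p++;
        end = strchr(p, ')');
        if (end != NULL) {
            const char *last = end;
            while (last > p && last[-1] == ' ') /* trim space before ')' */
                last--;
            if (last > p) {
                *q++ = ' ';
                while (p < last)
                    *q++ = *p++;
            }
            p = end + 1;
            if (*p == ';')                      /* drop the C terminator */
                p++;
        }
    }

    while (*p != '\0') {                        /* copy rest, fix comments */
        if (p[0] == '/' && p[1] == '/') {
            *q++ = ';';
            p += 2;
        } else {
            *q++ = *p++;
        }
    }
    *q = '\0';
}

/* Hypothetical check of the conversion on one line. */
void convert_demo(void)
{
    char out[64];
    convert_line("    MOVWF( X+1 );   // high byte", out);
    assert(strcmp(out, "    movwf X+1   ; high byte") == 0);
}
```

A real converter would also have to deal with function headers, braces, labels, and variable declarations, which is why a scripting language with regular expressions is a more natural fit for the whole job.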

Alternatively, this conversion can be largely automated by using Awk or Perl scripts. I have developed an Awk script that will perform the basic conversions mentioned above. This script is available, along with my PIC 16F84 instruction set simulation, at

Of course, additional assembler directives may be required to select options and to map the software into memory as required. Some reordering of subroutines may also be necessary to accommodate the microcontroller's paged memory layout.

If any changes are made to code that originated in the prototype, serious consideration should be given to mirroring these changes in the prototype simultaneously. If it is kept up to date with the target code, the simulation can remain useful throughout the project's development and maintenance lifetime. New application features and bug fixes can be tackled first in the prototype, then the prototype retranslated.

Costs and Benefits
As I hinted earlier, this simulation technique is best suited to applications based on microcontrollers with the following characteristics:

  • Small simple instruction sets with very few addressing modes. A single programmer can quickly and easily create a C simulation of such an instruction set.
  • Small memory spaces. Simulating one or two kilobytes of software in this way is not likely to overtax any reasonable host machine.
  • Slow clock speeds. A host-to-target speed ratio of at least 40:1 seems reasonable. This matters for simulation speed: assuming the host executes 20 to 50 instructions to simulate each target instruction (including function call overhead), the simulation will then run at about the same speed as the target.

In my experience simulating the PIC, I found that it is best to model the register file (that is, the RAM) quite closely. This makes for an easier transition from simulated assembly to actual assembly. Including in this model some of the microcontroller's own registers (like PCL and STATUS) at their correct locations makes final translation even easier.

The PIC's conditional branching consists of using the condition flags to decide whether or not the following instruction should be executed or skipped. To facilitate this, each of the simulation functions and macros must include a test to see if the instruction it is simulating should be skipped. If so, it must return without doing anything.
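The skip test might look something like the sketch below. The `skip_next` flag is a hypothetical stand-in used here for simplicity, and the SKIPTEST macro body is a guess at what the author's macro of that name does. STATUS at address 3 with Z as bit 2 follows the 16F84 register layout.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t uint8;      /* assumed typedef matching the article's uint8 */

#define STATUS 3            /* STATUS register address on the 16F84 */
#define Z      2            /* zero flag is bit 2 of STATUS */

uint8 regfile[128];         /* simulated register file (RAM) */
uint8 Wreg;                 /* simulated W accumulator */
uint8 skip_next;            /* hypothetical: nonzero means skip the next instruction */

/* Placed first in every simulated instruction (void functions only). */
#define SKIPTEST  do { if (skip_next) { skip_next = 0; return; } } while (0)

/* movlw k: load the literal k into W */
void MOVLW(uint8 k)
{
    SKIPTEST;
    Wreg = k;
}

/* btfss f,b: skip the next instruction if bit b of register f is set */
void BTFSS(uint8 f, uint8 b)
{
    SKIPTEST;
    if (regfile[f] & (1u << b))
        skip_next = 1;
}

/* Hypothetical demonstration of a skipped and an executed instruction. */
void skip_demo(void)
{
    Wreg = 0;
    regfile[STATUS] = 1u << Z;      /* Z flag set */
    BTFSS( STATUS, Z );
    MOVLW( 5 );                     /* skipped */
    assert(Wreg == 0);

    regfile[STATUS] = 0;            /* Z flag clear */
    BTFSS( STATUS, Z );
    MOVLW( 5 );                     /* executed */
    assert(Wreg == 5);
}
```

Note that a skipped instruction still costs a function call on the host, which is part of the per-instruction overhead mentioned earlier.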

On the PIC, computed jumps are accomplished by adding a value to the PCL register (the low 8 bits of the program counter). To simulate this, I found the easiest solution was to extend the skip mechanism to allow multiple instructions to be skipped. In the simulation, the PCL register is normally kept at zero. When a skip condition becomes true, though, PCL is set to one. This causes the following instruction to “skip” (and PCL to be decremented). If an instruction adds a value N to PCL, PCL will be set to that value (since it is normally zero), which causes the next N instructions to be skipped. This achieves the same effect as the intended computed jump.
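The classic use of a computed jump on the PIC is a lookup table built from retlw instructions. The sketch below shows how such a table could run under the PCL-as-skip-counter scheme. The macro bodies, the DEST_W and DEST_F names, and the `square_table` routine are all assumptions for illustration, and status flag updates are omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t uint8;      /* assumed typedef matching the article's uint8 */

#define PCL    2            /* PCL register address on the 16F84 */
#define DEST_W 0            /* hypothetical names for the destination operand */
#define DEST_F 1

uint8 regfile[128];         /* simulated register file; regfile[PCL] counts skips */
uint8 Wreg;                 /* simulated W accumulator */

/* Consume one pending skip, if any (void functions only). */
#define SKIPTEST  do { if (regfile[PCL]) { regfile[PCL]--; return; } } while (0)

/* addwf f,d: add W to register f (flag updates omitted) */
void ADDWF(uint8 f, uint8 d)
{
    uint8 sum;
    SKIPTEST;
    sum = (uint8)(regfile[f] + Wreg);
    if (d)
        regfile[f] = sum;
    else
        Wreg = sum;
}

/* retlw k: a macro, because it must return from the enclosing C function */
#define RETLW(k) \
    do { if (regfile[PCL]) regfile[PCL]--; else { Wreg = (k); return; } } while (0)

/* Lookup table: entered with an index 0-3 in W, returns the entry in W. */
void square_table(void)
{
    ADDWF( PCL, DEST_F );   /* skip the next W instructions */
    RETLW( 0 );
    RETLW( 1 );
    RETLW( 4 );
    RETLW( 9 );
}

/* Hypothetical demonstration of two table lookups. */
void jump_demo(void)
{
    Wreg = 3;  square_table();  assert(Wreg == 9);
    Wreg = 0;  square_table();  assert(Wreg == 0);
    assert(regfile[PCL] == 0);  /* skip counter is back to zero */
}
```

As on the real device, nothing protects against an index beyond the end of the table; in this sketch the leftover skip count would spill into whatever instruction is simulated next.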

I have implemented the PIC goto instruction (unconditional jump) using C's goto statement. This provides a close syntactic match to the equivalent assembly language, and is very easy to implement. The limitation of this approach is that the jump destination cannot be outside the subroutine where the goto appears. In practice, this has not proven to be a problem for me. In fact, the assembly program's structure will benefit from adhering to this restriction. The actual goto is embedded in a simulation macro, since, like other instructions, goto might be skipped. The macro checks the skip condition and decides whether to jump or continue.
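A sketch of such a macro, used in a small countdown loop, is shown below. The macro bodies are assumptions rather than the author's code; COUNT, TOTAL, and `count_loop` are hypothetical names. INCF and DECFSZ are simplified to always write back to the file register, and flag updates are omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t uint8;      /* assumed typedef matching the article's uint8 */

#define PCL   2             /* PCL register address; doubles as skip counter */
#define COUNT 12            /* hypothetical loop counter register */
#define TOTAL 13            /* hypothetical result register */

uint8 regfile[128];         /* simulated register file (RAM) */

/* Consume one pending skip, if any (void functions only). */
#define SKIPTEST  do { if (regfile[PCL]) { regfile[PCL]--; return; } } while (0)

/* goto label: jump unless this instruction is being skipped */
#define GOTO(label) \
    do { if (regfile[PCL]) regfile[PCL]--; else goto label; } while (0)

/* return: also a macro, since it must leave the enclosing C function */
#define RETURN() \
    do { if (regfile[PCL]) regfile[PCL]--; else return; } while (0)

/* incf f,F: increment register f */
void INCF(uint8 f)
{
    SKIPTEST;
    regfile[f]++;
}

/* decfsz f,F: decrement f, skip the next instruction if the result is zero */
void DECFSZ(uint8 f)
{
    SKIPTEST;
    if (--regfile[f] == 0)
        regfile[PCL] = 1;
}

/* Subroutine: adds one to TOTAL for each count in COUNT. */
void count_loop(void)
{
loop:
    INCF( TOTAL );
    DECFSZ( COUNT );
    GOTO( loop );           /* skipped once COUNT reaches zero */
    RETURN();
}

/* Hypothetical demonstration: three passes through the loop. */
void goto_demo(void)
{
    regfile[COUNT] = 3;
    regfile[TOTAL] = 0;
    count_loop();
    assert(regfile[TOTAL] == 3);
    assert(regfile[COUNT] == 0);
}
```

The label and the GOTO must sit in the same C function, which is exactly the restriction described above.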

The PIC microcontrollers have a paged instruction address space. This has implications for instructions like call, goto, and addwf PCL,F, which modify the program counter. My simulation has no awareness of paged memory restrictions. I couldn't think of a way to model this in the simulation. These issues will have to be addressed using the actual PIC assembler and the real target source.

I have named all of the simulation functions and macros with the name of the corresponding PIC instruction, in upper case. Using upper case avoids problems with those instructions, like goto and return, whose names are also C keywords.

Some PIC instructions don't update status flags, some update all status flags, and some update only a subset of the flags. The appropriate support function will be called from each simulation function or macro to update the relevant flags. For example, the following simulation of the andwf instruction makes use of a support function to set the Z (zero) flag.

void ANDWF( file f, boolean fr )
{
    SKIPTEST;
    Wtemp = Wreg & regfile[f];
    update_Zflag( Wtemp );
    update_dest( f, fr );
}

Many PIC instructions can write their result either to the W register or to the register file (RAM). The destination is selected by a boolean operand called d. In the simulation function above, this operand is represented by the fr argument. Since destination selection is such a common instruction feature, I have provided a support function, update_dest, to handle it.
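The two support functions are not listed in the article. A plausible sketch, consistent with how ANDWF calls them, follows. The `file` and `boolean` typedefs, the function bodies, and the SKIPTEST definition are assumptions; STATUS at address 3 with Z as bit 2 follows the 16F84 register layout.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t uint8;      /* assumed typedefs matching the article's names */
typedef uint8_t file;       /* a register file address */
typedef int     boolean;

#define PCL    2            /* PCL register address; doubles as skip counter */
#define STATUS 3            /* STATUS register address on the 16F84 */
#define Z_BIT  2            /* zero flag is bit 2 of STATUS */

uint8 regfile[128];         /* simulated register file (RAM) */
uint8 Wreg;                 /* simulated W accumulator */
uint8 Wtemp;                /* result of the current instruction, not yet stored */

/* Consume one pending skip, if any (void functions only). */
#define SKIPTEST  do { if (regfile[PCL]) { regfile[PCL]--; return; } } while (0)

/* Set Z when the result is zero, clear it otherwise. */
void update_Zflag(uint8 result)
{
    if (result == 0)
        regfile[STATUS] |= (uint8)(1u << Z_BIT);
    else
        regfile[STATUS] &= (uint8)~(1u << Z_BIT);
}

/* Store Wtemp in register f when fr is true, otherwise in W. */
void update_dest(file f, boolean fr)
{
    if (fr)
        regfile[f] = Wtemp;
    else
        Wreg = Wtemp;
}

/* The article's ANDWF, built on the two support functions. */
void ANDWF( file f, boolean fr )
{
    SKIPTEST;
    Wtemp = Wreg & regfile[f];
    update_Zflag( Wtemp );
    update_dest( f, fr );
}

/* Hypothetical demonstration of both destinations. */
void andwf_demo(void)
{
    Wreg = 0xF0;
    regfile[20] = 0x0F;
    ANDWF( 20, 0 );                         /* result 0 goes to W */
    assert(Wreg == 0 && regfile[20] == 0x0F);
    assert(regfile[STATUS] & (1u << Z_BIT));

    Wreg = 0xFF;
    ANDWF( 20, 1 );                         /* result 0x0F goes to register 20 */
    assert(Wreg == 0xFF && regfile[20] == 0x0F);
    assert(!(regfile[STATUS] & (1u << Z_BIT)));
}
```

Keeping the flag and destination logic in shared helpers means each simulated instruction reduces to a skip test, one line of arithmetic, and two calls.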

All of my PIC instruction simulation macros and functions, and the support functions for the 16F84 microcontroller, are available for download. The Awk script for automatically converting a PIC simulation to actual PIC assembly is available alongside them.

Having tried it once, I'll be developing this way every time now. I suspect you'll do the same.

Robert Gordon is currently a technical advisor to Nortel Networks. He has 15 years of experience in software development, mostly embedded, and holds a BSc in computer science from Queens University.
