Using your C-compiler to minimize code size.
A C compiler is a basic tool for most embedded systems programmers. It
is the tool by which the ideas and algorithms in your application
(expressed as C source code) are transformed into machine code
executable by your target processor.
To a large extent, the C compiler determines how large the
executable code for the application will be. A compiler performs many
transformations on a program in order to generate the best possible
code.
Examples of such transformations are storing values in registers
instead of memory, removing code that does nothing useful, reordering
computations in a more efficient order, and replacing arithmetic
operations by cheaper operations.
The C language provides the compiler with significant freedom
regarding how to precisely implement each C operation on the target
system. This freedom is an important factor in why C can usually be
compiled very efficiently, but a programmer needs to be aware of the
compiler's freedom in order to write robust code.
To most programmers of embedded systems, the case that a program
does not quite fit into the available memory is a familiar
phenomenon. Recoding parts of an application in assembly language or
throwing out functionality may seem to be the only alternatives, while
the solution could be as simple as rewriting the C code in a more
compiler-friendly manner.
In order to write code that is compiler friendly, you need to have a
working understanding of compilers. Some simple changes to a program,
like changing the data type of a frequently-accessed variable, can have
a big impact on code size while other changes have no effect at all.
Having an idea of what a compiler can and cannot do makes such
optimization work much easier.
Assembly programs specify both what, how, and the precise order in
which calculations should be carried out. A C program, on the other
hand, only specifies the calculations that should be performed. With
some restrictions, the order and the technique used to realize the
calculations are up to the compiler.
The compiler will look at the code and try to understand what is
being calculated. It will then generate the best possible code, given
the information it has managed to obtain, locally within a single
statement and also across entire functions and sometimes even whole
programs.
The Structure of a Compiler
In general, a program is processed in six main steps in a modern
compiler (not all compilers follow this blueprint completely, but as a
conceptual guide it is sufficient):
Parser. The conversion from C source code to an
intermediate language.
High-level optimization. Optimizations on the
intermediate code.
Code generation. Generation of target machine code from
the intermediate code.
Low-level optimization. Optimizations on the machine
code.
Assembly. Generation of an object file that can be linked
from the target machine code.
Linking. Linking of all the code for a program into an
executable or downloadable file.
The parser parses the C source code, checking the syntax and
generating error messages if syntactical errors are found in the
source. If no errors are found, the parser then generates intermediate
code (an internal representation of the parsed code), and compilation
proceeds with the first optimization pass.
The high-level optimizer transforms the code to make it better. The
optimizer has a large number of transformations available that can
improve the code, and will perform those that it deems relevant to the
program at hand. Note that we use the word "transformation" and not
"optimization."
"Optimization" is a bit of a misnomer. It conveys the intuition that
a change always improves a program and that we actually find optimal
solutions, while in fact optimal solutions are very expensive or even
impossible to find (undecidable, in computer science lingo). To ensure
reasonable compilation times and the termination of the compilation
process, the compiler has to use heuristic methods ("good guesses").
Transforming a program is a highly nonlinear activity, where
different orderings of transformations will yield different results,
and some transformations may actually make the code worse. Piling on
more "optimizations" will not necessarily yield better code.
When the high-level optimizer is done, the code generator transforms
the intermediate code to the target processor instruction set. This
stage is performed, piece-by-piece, on the intermediate code from the
optimizer, and the compiler will try to do smart things on the level of
a single expression or statement, but not across several statements.
The code generator will also have to account for any differences
between the C language and the target processor. For example, 32-bit
arithmetic will have to be broken down to 8-bit arithmetic for a
small-embedded target (like an Intel 8051, Motorola 68HC08, Samsung
SAM8, or Microchip PIC).
A very important part of code generation is allocating registers to
variables. The goal is to keep as many values as possible in registers,
since register-based operations are typically faster and smaller than
memory-based operations.
After the code generator is done, another phase of optimization
takes place, where transformations are performed on the target code.
The low-level optimizer will clean up after the code generator (which
sometimes makes suboptimal coding decisions), and perform more
transformations.
There are many transformations that can only be applied on the
target code, and some that are repeated from the high-level phase, but
on a lower level. For example, transformations like removing a "clear
carry" instruction if we already know that the carry flag is zero are
only possible at the target code level, since the flags are not visible
before code generation.
After the low-level optimizer is finished, the code is sent to an
assembler and output to an object file. All the object files of a
program are then linked to produce a final binary executable ROM image
(in some format appropriate for the target). The linker may also
perform some optimizations, for example, by discarding unused
functions.
Thus, one can see that the seemingly simple task of compiling a C
program is actually a rather long and winding road through a highly
complex system. Different transformations may interact, and a local
improvement may be worse for the whole program. For example, an
expression can typically be evaluated more efficiently if given more
temporary registers.
Taking a local view, it thus seems to be a good idea to provide as
many registers as necessary. A global effect, however, is that
variables in registers may have to be spilled to memory, which could be
more expensive than evaluating the expression with fewer registers.