Using your C compiler to minimize code size
The Meaning of a Program
Before the compiler can apply transformations to a program, it must
analyze the code to determine which transformations are legal and
likely to result in improvements. The legality of transformations is
determined by the semantics laid down by the C language standard.
The most basic interpretation of a C program is that only statements
that have side effects or compute values used for performing side
effects are relevant to the meaning of a program.
A side effect is any operation that changes the global state of the
program or its environment. Operations that are generally considered
to have side effects include writing to a screen, changing a global
variable, reading a volatile variable, and calling unknown functions.
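As a rough illustration of this rule, the sketch below mixes a computation with no side effect with two operations that do have side effects. The names `event_count`, `sensor_value`, and `record_event` are made up for this example, and `sensor_value` merely stands in for a hardware register.

```c
#include <stdint.h>

unsigned int event_count;        /* global state, visible to other functions */
volatile uint8_t sensor_value;   /* hypothetical stand-in for a hardware register */

unsigned int record_event(unsigned int x)
{
    /* The result is never used and computing it has no side effect,
       so the compiler is free to remove this calculation entirely. */
    unsigned int scratch = x * 37u + 5u;
    uint8_t sample;

    (void)scratch;

    event_count++;           /* side effect: changes a global variable */
    sample = sensor_value;   /* side effect: reads a volatile variable */

    return x + sample;       /* the returned value is used by the caller */
}
```

An optimizing compiler would typically keep the increment, the volatile read, and the addition, and drop the multiplication.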
The calculations between the side effects are carried out according
to the principle of "do what I mean, not what I say." The compiler will
try to rewrite each expression into the most efficient form possible,
but a rewrite is only possible if the result of the rewritten code is
the same as the original expression. The C standard defines what is
considered "the same," and sets the limits of allowable optimizations.
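To make the limits concrete, here is a small sketch with two hypothetical functions. In the first, the tempting simplification to plain `x` is not allowed, because the standard defines unsigned arithmetic to wrap around and the multiplication can lose the top bit. In the second, the cheaper form always gives the same result, so the rewrite is legal.

```c
/* May NOT be rewritten as "return x;": for large x the product wraps
   around, so the result differs from x. */
unsigned int scale_and_back(unsigned int x)
{
    return (x * 2u) / 2u;
}

/* May be rewritten as "return x & 7u;": for unsigned x the two forms
   give the same result for every possible input. */
unsigned int mask_low_bits(unsigned int x)
{
    return x % 8u;
}
```

For example, passing the largest `unsigned int` value to `scale_and_back` returns half of that value rather than the value itself.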
Basic Transformations. A modern compiler performs a
large number of basic transformations that act locally, like folding
constant expressions, replacing expensive operations by cheaper ones
("strength reduction"), removing redundant calculations, and moving
invariant calculations outside of loops. The compiler can do most
mechanical improvements just as well as a human programmer, but without
tiring or making mistakes.
The table below shows (in C form for readability) some
typical basic transformations performed by a modern C compiler. Note
that an important implication of this basic cleanup is that you can
write code in a readable way and let the compiler calculate constant
expressions and worry about using the most efficient operations.
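The following sketch illustrates the idea with two hypothetical functions that compute the same result. The first is written for readability; the second shows, in C form, roughly what a compiler might turn it into. What a particular compiler actually does depends on the compiler and its settings.

```c
#define SECONDS_PER_HOUR (60u * 60u)

/* Written for readability. */
unsigned int readable(const unsigned char *buf, unsigned int n,
                      unsigned int scale)
{
    unsigned int i;
    unsigned int sum = 0;

    for (i = 0; i < n; i++) {
        sum += buf[i] * (scale * 4u);   /* scale * 4u never changes in the loop */
    }
    return sum + SECONDS_PER_HOUR / 60u; /* constant expression */
}

/* Roughly equivalent code after basic transformations. */
unsigned int transformed(const unsigned char *buf, unsigned int n,
                         unsigned int scale)
{
    unsigned int i;
    unsigned int sum = 0;
    unsigned int factor = scale << 2;   /* multiply replaced by shift, moved out of the loop */

    for (i = 0; i < n; i++) {
        sum += buf[i] * factor;
    }
    return sum + 60u;                   /* constant expression folded */
}
```

Since the compiler can do this kind of cleanup itself, the readable version is usually the better one to keep in the source.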
All code that is not considered useful—according to the definition
in the previous section—is removed. This removal of unreachable or
useless computations can cause some unexpected effects. An important
example is that empty loops are completely discarded, making "empty
delay loops" useless. The code shown below stopped working
properly when upgrading to a modern compiler that removed useless
computations:
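A minimal sketch of the kind of delay loop this affects is given here; it is an illustration, not the original listing, and the function names are made up. The first function contains no side effects, so the whole loop may be removed. The second makes the counter volatile, which forces every read and write of it to be performed.

```c
/* Fragile: nothing in this loop is a side effect, so an optimizing
   compiler may delete the loop and return immediately. */
void delay_fragile(unsigned int count)
{
    unsigned int i;

    for (i = 0; i < count; i++) {
        /* empty body */
    }
}

/* The volatile counter must be read and written on every iteration,
   so the loop cannot be removed. Returns the final counter value. */
unsigned int delay_volatile(unsigned int count)
{
    volatile unsigned int i;

    for (i = 0; i < count; i++) {
        /* empty body */
    }
    return i;
}
```

Even the volatile version gives a delay that depends on the clock speed and on the code the compiler generates, so a hardware timer is usually the more reliable choice when one is available.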
Register Allocation. Processors usually give better
performance and require smaller code when calculations are performed
using registers instead of memory. Therefore, the compiler will try to
assign the variables in a function to registers.
A local variable or parameter will not need any RAM allocated at all
if the variable can be kept in registers for the duration of the
function. If there are more variables than registers available, the
compiler needs to decide which of the variables to keep in registers,
and which to put in memory.
This is the problem of register allocation, and in general it cannot
be solved optimally in a reasonable amount of time. Instead, heuristic
techniques are used. These heuristics can be quite sensitive, and even
small changes to a function may considerably alter the register
allocation.
Note that a variable only needs a register when it is being used. If
a variable is used only in a small part of a function, it will be
register allocated in that part, but it will not exist in the rest of
the function. This explains why a debugger sometimes tells you that a
variable is "optimized away at this point."
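As a small sketch of this behavior, consider the hypothetical function below. The variables `i` and `folded` are never needed at the same time, so a compiler could place them in the same register, and a debugger stopped after the loop might report that `i` no longer exists.

```c
unsigned int checksum(const unsigned char *data, unsigned int len)
{
    unsigned int sum = 0;
    unsigned int i;
    unsigned int folded;

    /* i is needed only in this loop; afterwards its register is free. */
    for (i = 0; i < len; i++) {
        sum += data[i];
    }

    /* folded is first needed here and may reuse the register that held i. */
    folded = (sum & 0xFFu) + ((sum >> 8) & 0xFFu);
    return folded & 0xFFu;
}
```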
The register allocator is limited by the language rules of C—for
example, global variables have to be written back to memory when
calling other functions, since they can be accessed by the called
function, and all changes to global variables must be visible to all
functions. Between function calls, global variables can be kept in
registers.
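One way to work with this rule is sketched below, using made-up names. The helper `note_progress` is defined here only so that the example is complete; imagine it living in another file where the compiler cannot see what it touches. The first function updates the global inside the loop, so the value has to be in memory before every call. The second accumulates in a local variable and writes the global once.

```c
unsigned int total;            /* global variable */
unsigned int progress_calls;   /* counts calls, for illustration only */

/* Hypothetical helper standing in for an unknown external function. */
void note_progress(void)
{
    progress_calls++;
}

/* total must be written back to memory before each call. */
void sum_global(const unsigned char *data, unsigned int len)
{
    unsigned int i;

    total = 0;
    for (i = 0; i < len; i++) {
        total += data[i];
        note_progress();
    }
}

/* acc is local, so it can stay in a register across the calls. */
void sum_local(const unsigned char *data, unsigned int len)
{
    unsigned int i;
    unsigned int acc = 0;

    for (i = 0; i < len; i++) {
        acc += data[i];
        note_progress();
    }
    total = acc;   /* single write to the global */
}
```

The two versions are interchangeable only if the called function does not read `total` while the loop is running, which is something the programmer may know but the compiler cannot assume.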
Note that there are times when you do not want variables to be
register allocated. For example, when reading an I/O port or spinning
on a lock, you want each read in the source code to be made from memory,
since the variable can be changed outside the control of your program.
This is what the volatile keyword is for. It tells the compiler that
the variable must never be kept in a register, but must be read from
(or written to) memory each time it is accessed.
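A minimal sketch of such a polling loop is given here. The variable `status_reg` and the bit `READY_BIT` are assumptions made for the example; on real hardware the register would be at a fixed address defined by the device. The poll limit is also an addition, so that the loop cannot hang forever.

```c
#include <stdint.h>
#include <stdbool.h>

#define READY_BIT 0x01u

/* Hypothetical stand-in for a hardware status register. Without
   volatile, the compiler could read it once and reuse that value. */
volatile uint8_t status_reg;

/* Polls the register until the ready bit is set or the limit is hit.
   Returns true if the bit was seen, false on timeout. */
bool wait_until_ready(uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if (status_reg & READY_BIT) {   /* fresh read on every iteration */
            return true;
        }
    }
    return false;
}
```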
In general, only simple values like integers, floats, and pointers
are considered for register allocation. Arrays have to reside in memory
since they are designed to be accessed through pointers, and structures
are usually too large. In addition, on small processors, large values
like 32-bit integers and floats may be hard to allocate to registers,
and perhaps only 8-bit and 16-bit variables will be given registers.
To help register allocation, you should strive to keep the number of
simultaneously live variables low. Also, try to use the smallest
possible data types for your variables, as this will reduce the number
of required registers on 8-bit and 16-bit processors.
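The sketch below applies this advice to a hypothetical 16-byte buffer. The loop counter and the sum each fit in 8 bits, so on an 8-bit processor each can occupy a single register. The fixed-width types come from the standard header `<stdint.h>`; the buffer length and function name are made up for the example.

```c
#include <stdint.h>

#define BUF_LEN 16u

/* Returns the sum of the buffer modulo 256. */
uint8_t buffer_sum(const uint8_t *buf)
{
    uint8_t i;         /* counts 0..15, so 8 bits are enough */
    uint8_t sum = 0;   /* intentionally wraps modulo 256 */

    for (i = 0; i < BUF_LEN; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}
```

Only three values are needed at any one time here: the pointer, the counter, and the sum.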