Using your C-compiler to minimize code size.
Function Calls. As assembly programmers well know,
calling a function written in a high-level language can be rather
complicated and costly. The calling function must save global variables
back to memory, move local variables to registers that survive the call
(or save them on the stack), and possibly push parameters on the
stack.
Inside the called function, registers will have to be saved,
parameters taken off the stack, and space allocated on the stack for
local variables. For large functions with many parameters and
variables, the effort required for a call can be quite large.
Modern compilers do their best, however, to reduce the cost of a
function call, especially the use of stack space. A number of registers
will be designated for parameters, so that short parameter lists will
most likely be passed entirely in registers. Likewise, the return value
will be put in a register, and local variables will only be put on the
stack if they cannot be allocated to registers.
The number of register parameters varies widely between compilers
and architectures. In most cases, at least four registers are
made available for parameters. Note also that, just as for register
allocation, only small parameter types will be passed in registers.
Arrays are always passed as pointers to their first element (C semantics
dictate that), and structures are usually copied to the stack, with the
structure parameter changed to a pointer to that copy. That pointer
might be passed in a register, however. To save stack space, it is thus
a good idea to always use pointers to structures as parameters and not
the structures themselves.
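The difference between the two parameter styles can be sketched as
follows. The structure sensor_record and both function names are
hypothetical, chosen only for illustration; how much is actually copied
depends on the compiler and the target.

```c
#include <stdint.h>

/* Hypothetical record type, used only for this illustration. */
typedef struct {
    int16_t  samples[16];
    uint16_t count;
} sensor_record;

/* By value: the caller typically copies the whole structure for the call. */
int32_t sum_by_value(sensor_record r)
{
    int32_t sum = 0;
    for (uint16_t i = 0; i < r.count; i++) {
        sum += r.samples[i];
    }
    return sum;
}

/* By pointer: only an address is passed, which usually fits in a register.
   The const qualifier documents that the callee does not modify the data. */
int32_t sum_by_pointer(const sensor_record *r)
{
    int32_t sum = 0;
    for (uint16_t i = 0; i < r->count; i++) {
        sum += r->samples[i];
    }
    return sum;
}
```

Both functions compute the same result; only the calling cost differs.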
C supports functions with variable numbers of arguments. This is
used in standard library functions like printf() and scanf() to provide
a convenient interface. However, the implementation of variable numbers
of arguments to a function incurs significant overhead.
All arguments have to be put on the stack, since the function must
be able to step through the parameter list using pointers to arguments,
and the code accessing the arguments is much less efficient than for
fixed parameter lists. There is no type-checking on the arguments,
which increases the risk of bugs. Variable numbers of arguments should
not be used in embedded systems!
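As a rough illustration, here is the same computation written once with
a variable argument list and once with a fixed parameter list. The
function names are hypothetical. In the variadic version the compiler
cannot check that count matches the number of values passed, or that
every value is an int.

```c
#include <stdarg.h>

/* Variadic version: arguments are fetched one by one through va_arg.
   Passing the wrong count or a non-int argument is not diagnosed. */
int sum_variadic(int count, ...)
{
    va_list args;
    int sum = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        sum += va_arg(args, int);
    }
    va_end(args);
    return sum;
}

/* Fixed-parameter alternative: a pointer and a length, both of which
   the compiler type-checks at every call. */
int sum_array(const int *values, int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum;
}
```

A call such as sum_variadic(3, 1, 2, 3) can be replaced by filling a
small array and calling sum_array on it.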
Function Inlining. It is good programming practice to
break out common pieces of computation and accesses to shared data
structures into (small) functions. This, however, brings with it the
cost of calling a function each time something should be done. In order
to mitigate this cost, the compiler transformation of function inlining
has been developed. Inlining a function means that a copy of the code
for the function is placed in the calling function, and the call is
removed.
Inlining is a very efficient way to speed up the code, since the
function call overhead is avoided while the same computations are still
carried out. Many programmers do this manually by using preprocessor
macros for common pieces of code instead of functions, but macros lack
the type checking of functions and produce harder-to-find bugs.
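One classic macro pitfall is sketched below, assuming a compiler that
accepts the C99 inline keyword. All names (SQUARE_MACRO, square,
next_value, call_count) are made up for this example. Because a macro
substitutes its argument textually, an argument with side effects is
evaluated twice.

```c
/* Macro version: the argument text is pasted in twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Function version: the argument is evaluated once and checked as int.
   The compiler is free to inline it at the call site. */
static inline int square(int x)
{
    return x * x;
}

/* Helper with a visible side effect: counts how often it is called. */
static int call_count = 0;

static int next_value(void)
{
    call_count++;
    return 3;
}
```

With these definitions, SQUARE_MACRO(next_value()) calls next_value
twice, while square(next_value()) calls it once; both yield 9.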
The executable code will often grow as a result of inlining, since
code is being copied into several places. Inlining may also help shrink
the code: for small functions, the code size cost of a function call
might be bigger than the code for the function. In this case, inlining
a function will actually save code size (as well as speed up the
program).
The main problem when inlining for size is to estimate the gains in
code size (when optimizing for speed, the gain is almost guaranteed).
Since inlining in general increases the code size, the inliner has to
be quite conservative. The effect of inlining on code size cannot be
determined exactly in advance, since the inlined code disturbs the
code generated for the calling function, with nonlinear effects.
To reduce the code size, the ideal would be to inline all calls to a
function, which allows us to remove the function from the program
altogether. This is only possible if all calls are known, i.e., are
placed in the same source file as the function, and the function is
marked static, so that it cannot be seen from other files.
Otherwise, the function will have to be kept (even though it might
still be inlined at some calls), and we rely on the linker to remove it
if it is not called. Since this decreases the likely gain from
inlining, we are less likely to inline such a function.
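The distinction can be sketched as follows. The names are hypothetical,
and status_reg merely stands in for a hardware register. Whether the
compiler actually inlines and drops the static function depends on the
compiler and its optimization settings.

```c
#include <stdint.h>

/* Stand-in for a hardware status register (hypothetical). */
static volatile uint8_t status_reg = 0x01;

/* static: visible only in this file, so the compiler sees every call.
   If all calls are inlined, no separate copy needs to be emitted. */
static uint8_t flag_is_set(uint8_t mask)
{
    return (uint8_t)((status_reg & mask) != 0);
}

/* External linkage: other files might call this function, so a callable
   copy must be kept even if calls in this file are inlined. */
uint8_t flag_is_set_extern(uint8_t mask)
{
    return (uint8_t)((status_reg & mask) != 0);
}

/* The only callers of flag_is_set; both are candidates for inlining. */
uint8_t is_ready(void)
{
    return flag_is_set(0x01);
}

uint8_t has_error(void)
{
    return flag_is_set(0x80);
}
```

Marking small file-local helpers static costs nothing and gives the
inliner the information it needs.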