Two important aspects of the code compilation phase in embedded
program development are register allocation and schedule.
Given a block of code, we
want to choose assignments of
variables (both declared and temporary) to registers to minimize the
total number of required registers. Example 5-5 illustrates the
importance of proper register allocation.
Example 5-5 Register
Allocation
To keep the example small, we assume that we can use only four of the
ARM's registers. In fact, such a restriction is not
unthinkable - programming conventions can reserve certain registers for
special purposes and significantly reduce the number of general-purpose
registers available.
Consider the following C code:
w = a + b; /* statement 1 */
x = c + w; /* statement 2 */
y = c + d; /* statement 3 */
A naive register allocation, assigning each variable to
a separate register, would require seven registers for the seven
variables in the above code. However, we can do much better by reusing
a register once the value stored in the register is no longer needed.
To understand how to do this, we can draw a lifetime
graph that shows
the statements on which each statement is used. Appearing below is a
lifetime graph in which the x axis is the statement number in the C
code and the y axis shows the variables.

A horizontal line stretches from the first statement
where the variable is used to the last use of the variable; a variable
is said to be live during this interval. At each statement, we can
determine every variable currently in use. The maximum number of
variables in use at any statement determines the maximum number of
registers required.
In this case, statement two requires three
registers: c, w, and x. This fits within the four registers limitation.
By reusing registers once their current values are no longer needed, we
can write code that requires no more than four registers. Appearing
below is one register assignment.
a r0
b r1
c r2
d r0
w r3
x r0
y r3
The ARM assembly code that uses the above register
assignment follows:

If a section of code requires more registers than are
available, we must spill
some of the values out to
memory temporarily. After computing some values, we write the values to
temporary memory locations, reuse those registers in other
computations, and then reread the old values from the temporary
locations to resume work.
Spilling registers is problematic in several
respects. For example, it requires extra CPU time and uses up both
instruction and data memory. Putting effort into register allocation to
avoid unnecessary register spills is worth your time.
We can solve register allocation problems by building
a conflict graph and solving a graph coloring
problem. As shown in Figure 5-19
below, each variable in the high-level
language code is represented by a node. An edge is added between two
nodes if they are both live at the same time.
The graph coloring
problem is to use the smallest number of distinct colors to color all
the nodes such that no two nodes are directly connected by an edge of
the same color. The figure shows a satisfying coloring that uses three
colors. Graph coloring is NP-complete, but there are efficient
heuristic algorithms that can give good results on typical register
allocation problems.
 |
| Figure
5-19. Using graph coloring to solve the problem of Example 5-5. |
Lifetime analysis assumes that we have already
determined the order in which we will evaluate operations. In many
cases, we have freedom in the order in which we do things. Consider the
following expression:
(a + b) * (c - d)
We have to do the multiplication last, but we can do
either the addition or the subtraction first. Different orders of
loads, stores, and arithmetic operations may also result in different
execution times on pipelined machines.
If we can keep values in
registers without having to reread them from main memory, we can save
execution time and reduce code size as well. Example 5-6 illustrates
how proper operator scheduling can improve register allocation.
Example 5-6.Operator Scheduling
for Register Allocation
Here is sample C code fragment:
w = a + b; /* statement 1 */
x = c + d; /* statement 2 */
y = x + e; /* statement 3 */
z = a ' b; /* statement 4 */
If we compile the statements in the order in which
they were written, we get the register graph below.

Since w is needed until the last statement, we need
five registers at statement 3, even though only three registers are
needed for the statement at line 3. If we swap statements 3 and 4
(renumbering them)we reduce our requirements to three registers. The
modified C code follows:
w =
a + b; /* statement 1 */
w
= a - b; /* statement 2' */
w
= c + d; /* statement 3' */
w
= x + e; /* statement 4' */
The lifetime graph for the new code appears below.

Compare the ARM assembly code for the two code
fragments. We have written both assuming that we have only four free
registers. In the before version, we do not have to write out
any values, but we must read a and b twice. The after version
allows us to retain all values in registers as long as we need them.
Before version After version
LDR r0,a LDR r0,a
LDR r1,b LDR r1,b
ADD r2,r0,r1 ADD r2,r1,r0
STR r2,w ; w = a + b STR r2,w ; w = a + b
LDRr r0,c SUB r2,r0,r1
LDR r1,d STR r2,z ; z = a ' b
ADD r2,r0,r1 LDR r0,c
STR r2,x ; x = c + d LDR r1,d
LDR r1,e ADD r2,r1,r0
ADD r0,r1,r2 STR r2,x ; x = c + d
STR r0,y ; y = x + e LDR r1,e
LDR r0,a ; reload a ADD r0,r1,r2
LDR r1,b ; reload b STR r0,y ; y = x + e
SUB r2,r1,r0
STR r2,z ; z = a " b
Scheduling
We have some freedom to choose the order in which operations will be
performed. We can use this to our advantage—for example, we may be able
to improve the register allocation by changing the order in which
operations are performed, thereby changing the lifetimes of the
variables.
We can solve scheduling problems by keeping track of
resource utilization over time. We do not have to know the exact
microarchitecture of the CPU - all we have to know is that, for
example,
instruction types 1 and 2 both use resource A while instruction types 3
and 4 use resource B.
CPU manufacturers generally disclose enough
information about the microarchitecture to allow us to schedule
instructions even when they do not provide a detailed description of
the CPU's internals.
We can keep track of CPU resources during instruction
scheduling using a reservation table. As
illustrated in Figure 5-20 below,
rows in the table represent instruction
execution time slots and columns represent resources that must be
scheduled.
Before scheduling an instruction to be executed at a
particular time, we check the reservation table to determine whether
all resources needed by the instruction are available at that time.
Upon scheduling the instruction, we update the table to
note all
resources used by that instruction. Various algorithms can be used for
the scheduling itself, depending on the types of resources and
instructions involved, but the reservation table provides a good
summary of the state of an instruction scheduling problem in progress.
We can also schedule instructions to maximize
performance. When an instruction that takes more cycles than
normal to finish is in the pipeline, pipeline bubbles appear that
reduce performance. Software pipelining is a technique for reordering
instructions across several loop iterations to reduce pipeline bubbles.
Software pipelining is illustrated in Example 5-7.
 |
| Figure
5-20. A reservation table for instruction scheduling. |