The basics of programming embedded processors: Part 4Another major code generation problem is the creation of procedures. Generating code for procedures is relatively straightforward once we know the procedure linkage appropriate for the CPU. At the procedure definition, we generate the code to handle the procedure call and return. At each call of the procedure, we set up the procedure parameters and make the call.
The CPU's subroutine call
mechanism is usually not
sufficient to directly support procedures in modern programming
languages. The procedure linkage mechanism provides a way for the
pass parameters into the program and for the procedure to return a
It also provides help in restoring the values of registers that the procedure has modified. All procedures in a given programming language use the same linkage mechanism (although different languages may use different linkages). The mechanism can also be used to call handwritten assembly language routines from compiled code.
Procedure stacks are typically built to grow down from
high addresses. A stack pointer (sp) defines the end
of the current frame, while a frame pointer (fp)
defines the end of the last frame. (The fp is technically necessary
only if the stack frame can be grown by the procedure during
The procedure can refer to an element in the frame by addressing relative to sp. When a new procedure is called, the sp and fp are modified to push another frame onto the stack.
The ARM Procedure Call Standard (APCS) is a good illustration of a typical procedure linkage mechanism. Although the stack frames are in main memory, understanding how registers are used is key to understanding the mechanism, as explained below.
- r0"r3 are used to pass parameters into the procedure. r0 is also used to hold the return value. If more than four parameters are required, they are put on the stack frame.
- r4"r7 hold register variables.
- r11 is the frame pointer and r13 is the stack pointer.
- r10 holds the limiting address on stack size, which is used to check for stack overflows.
Other registers have additional uses in the protocol.
|Figure 5-14. Layout of a one-dimensional array in memory|
The compiler must also translate references to data structures into references to raw memories. In general, this requires address computations. Some of these computations can be done at compile time while others must be done at run time. arrays Arrays are interesting because the address of an array element must in general be computed at run time, since the array index may change. Let us first consider one-dimensional arrays:
The layout of the array in memory is shown in Figure 5-14 above. The zeroth element is stored as the first element of the array, the first element directly below, and so on. We can create a pointer for the array that points to the array's head, namely, a. If we call that pointer aptr for convenience, then we can rewrite the reading of a [i] as
*(aptr + i)
Two-dimensional arrays are more challenging. There are
multiple possible ways to lay out a two-dimensional array in memory, as
shown in Figure 5-15 below. In
this form, which is known as row major,
the inner variable of the array (j in a[i,j]) varies most quickly.
(Fortran uses a different organization known as column major.)
Two-dimensional arrays also require more sophisticated addressing - in particular, we must know the size of the array. Let us consider the row-major form. If the a array is of size N × M, then we can turn the two-dimensional array access into a one-dimensional array access. Thus,
a[i*M + j]
where the maximum value for j is M ' 1.
|Figure 5-15. Memory layout for two-dimensional arrays|
A C struct is easier to address. As shown in Figure 5-16 below, a structure is implemented as a contiguous block of memory. Fields in the structure can be accessed using constant offsets to the base address of the structure. In this example, if field1 is four bytes long, then field2 can be accessed as
*(aptr + 4)
This addition can usually be done at compile time, requiring only the indirection itself to fetch the memory location during execution.
Expression simplification is a useful area for machine-independent transformations. We can use the laws of algebra to simplify expressions. Consider the following expression:
a*b + a*c
We can use the distributive law to rewrite the expression as
a*(b + c)
|Figure 5-16. C structure layout and access.|
Since the new expression has only two operations rather
than three for the original form, it is almost certainly cheaper,
because it is both faster and smaller. Such transformations make some
broad assumptions about the relative cost of operations.
In some cases, simple generalizations about the cost of operations may be misleading. For example, a CPU with a multiply-andaccumulate instruction may be able to do a multiply and addition as cheaply as it can do an addition. However, such situations can often be taken care of in code generation.
We can also use the laws of arithmetic to further simplify expressions on constants. Consider the following C statement:
for (i = 0; i < 8 + 1; i++)
We can simplify 8 + 1 to 9 at compile time - there is no need to perform that arithmetic while the program is executing. Why would a program ever contain expressions that evaluate to constants? Using named constants rather than numbers is good programming practice and often leads to constant expression. The original form of the for statement could have been
for (i = 0; i < NOPS + 1; i++)
where, for example, the added 1 takes care of a trailing null character.