CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS

The basics of programming embedded processors: Part 4
The creation of procedures



Embedded.com

Dead Code Elimination
Code that will never be executed can be safely removed from the program. The general problem of identifying code that will never be executed is difficult, but there are some important special cases where it can be done. Programmers will intentionally introduce dead code in certain situations.

Consider this C code fragment:

#define DEBUG 0 ... if (DEBUG) print_debug_stuff();

In the above case, the print_debug_stuff() function is never executed, but the code allows the programmer to override the preprocessor variable definition (perhaps with a compile-time flag) to enable the debugging code.

This case is easy to analyze because the condition is the constant 0, which C uses for the false condition. Since there is no else clause in the if statement, the compiler can totally eliminate the if statement, rewriting the CDFG to provide a direct edge between the statements before and after the if.

Some dead code may be introduced by the compiler. For example, certain optimizations introduce copy statements that copy one variable to another. If uses of the first variable can be replaced by references to the second one, then the copy statement becomes dead code that can be eliminated.

Procedure Inlining
Another machine-independent transformation that requires a little more evaluation is procedure inlining. An inlined procedure does not have a separate procedure body and procedure linkage; rather, the body of the procedure is substituted in place for the procedure call. Figure 5-17 below shows an example of function inlining in C.

The C++ programming language provides an inline construct that tells the compiler to generate inline code for a function. In this case, an inlined procedure is generated in expanded form whenever possible. However, inlining is not always the best thing to do. It does eliminate the procedure linkage instructions. However, in a cached system, having multiple copies of the function body may actually slow down the fetches of these instructions. Inlining also increases code size, and memory may be precious.

Function definition:
int foo(a,b,c) { return a + b " c; }

Function call:
z = foo(w,x,y,);

Inlining result:
z = w + x " y;

Figure 5-17. Function inlining in C. 

Loop Transformations
Loops are important program structures - although they are compactly described in the source code, they often use a large fraction of the computation time. Many techniques have been designed to optimize loops.

A simple but useful transformation is known as loop unrolling, which is illustrated in Example 5-4 below. Loop unrolling is important because it helps expose parallelism that can be used by later stages of the compiler.

Example 5-4 Loop Unrolling
A simple loop in C follows:

 for (i = 0; i < N; i++) {
a[i]=b[i]*c[i];
}

This loop is executed a fixed number of times, namely, N. A straightforward implementation of the loop would create and initialize the loop variable i, update its value on every iteration, and test it to see whether to exit the loop. However, since the loop is executed a fixed number of times, we can generate more direct code.

If we let N = 4, then we can substitute the above C code for the following loop:

 a[0] = b[0]*c[0]; 
a[1] = b[1]*c[1];
a[2] = b[2]*c[2];
a[3] = b[3]*c[3];

This unrolled code has no loop overhead code at all, that is, no iteration variable and no tests. But the unrolled loop has the same problems as the inlined procedure—it may interfere with the cache and expands the amount of code required.

We do not, of course, have to fully unroll loops. Rather than unroll the above loop four times, we could unroll it twice. The following code results:

 for (i = 0; i < 2; i++) { 
a[i*2] = b[i*2]*c[i*2];
a[i*2 + 1] = b[i*2 + 1]*c[i*2 + 1];
}

In this case, since all operations in the two lines of the loop body are independent, later stages of the compiler may be able to generate code that allows them to be executed efficiently on the CPU's pipeline.

Loop fusion combines two or more loops into a single loop. For this transformation to be legal, two conditions must be satisfied. First, the loops must iterate over the same values. Second, the loop bodies must not have dependencies that would be violated if they are executed together - for example, if the second loop's ith iteration depends on the results of the i + 1th iteration of the first loop, the two loops cannot be combined. Loop distribution is the opposite of loop fusion, that is, decomposing a single loop into multiple loops.

Loop tiling breaks up a loop into a set of nested loops, with each inner loop performing the operations on a subset of the data. An example is shown in Figure 5-18 below. Here, each loop is broken up into tiles of size two. Each loop is split into two loops - for example, the inner ii loop iterates within the tile and the outer i loop iterates across the tiles.

The result is that the pattern of accesses across the a array is drastically different - rather than walking across one row in its entirety, the code walks through rows and columns following the tile structure. Loop tiling changes the order in which array elements are accessed, thereby allowing us to better control the behavior of the cache during loop execution.

Figure 5-18. Loop tiling.

We can also modify the arrays being indexed in loops. Array padding adds dummy data elements to a loop in order to change the layout of the array in the cache. Although these array locations will not be used, they do change how the useful array elements fall into cache lines. Judicious padding can in some cases significantly reduce the number of cache conflicts during loop execution.

Next in Part 5: Register allocation and scheduling
To read Part 1, go to "Program design and analysis."
To read Part 2 , go to " Models of program, assemblers and linkers."
To read Part 3, go to  "Basic compilation techniques"

Used with the permission of the publisher, Newnes/Elsevier, this series of articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased on line.

Wayne Wolf  is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing and of Design Automation for Embedded Systems.

1 | 2

Rate this article: Low High
Current rating
  • .
Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :