The fundamentals of software performance analysis: Part 2 - Path timing
Editor’s Note: In the second of a two part series from High Performance Embedded Computing, author Wayne Wolf discusses the best means by which to measure worst-case, best-case, and average-case execution time in evaluating software performance. Part 1: Path analysis
Several techniques are used to analyze path timing at different levels of abstraction. Abstract interpretation analyzes the behavior of the program in detail to more accurately understand the execution states of the program. Data flow analysis provides a more detailed view of how the program behaves. The most concrete technique is simulation, which is used to determine the details of how the processor responds to the program.
Bounding the number of iterations taken by a loop is particularly important for WCET analysis given that much of the execution time of most programs is spent in loops. Some loops, such as the FIR filter, are easy to analyze.
for (i = 0; i < N; i+)
y [i ] = c [i ] * x [i ];
But loops with conditionals create problems.
for (i = 0; i < N; i++)
for ( j = 0; j < M; j ++)
if (i ! = j )
c[ i ][j ] = a[i ]*b[ i ][j ];
c[i ][j ] = 0;
The algorithm of Healy et al [Hea99a] bound loop iterations in four phases:
- It first uses an iterative algorithm to identify branches that affect the number of iterations.
- It then identifies the loop iteration on which a loop-dependent branch changes direction.
- It determines when the branches found in step 1 are reached.
- Finally, it uses these results to calculate the iteration bound.
Figure 3-30 shows an example of loop iteration bounding from Healy et al [Hea99a]. The loop can be exited by a complex condition.
In the first step, it identifies the four control flow boxes with jumps as the points at which loop-dependent branches can change direction. It then determines that the first conditional branch, j > 75, occurs on iteration 26 (since j is incremented by 3 on each iteration). The condition j > 300 is taken on iteration 101, as is the condition j < 100. This gives a bound on the number of iterations as a minimum of 26 and a maximum of 101.
Vivancos et al. [Viv01] proposed a parametric timing analysis method to determine the timing behavior of programs with indefinite loops. They iteratively test the worst-case execution time of the program, starting with zero iterations and increasing the number of iterations by one at each step. Their testing terminates when the WCET stabilizes. Figure 3-31 shows a simple conditional, its associated CDFG, and the flow variables assigned to the edges in the CDFG.
Ermedahl et al. [Erm05] use a clustering technique to determine which parts of a program must be handled as a unit. They annotate the control flow graph of the program with flow facts, which include the defining scope, a context specifier, and a constraint expression. The constraints may involve execution count variables and constants. As illustrated in Figure 3-32, they cluster facts together to find the maximum scopes over which flow facts apply, such as in nested loops.
Click on image to enlarge.
Worst-case categories include the following:
- Always miss—Not guaranteed to be in the cache when referenced.
- Always hit—Guaranteed to be in the cache when it is referenced.
- First miss—Not guaranteed to be in the cache on the first loop iteration but guaranteed to be in the cache on subsequent iterations.
- First hit—Guaranteed to be in the cache on the first iteration but not guaranteed on subsequent iterations.
Best-case categories include the following.
- Always miss—Guaranteed not to be in the cache when first referenced.
- Always hit—May be in the cache every time it is referenced.
- First miss—Guaranteed not to be in the cache on the first loop iteration but may be on later iterations.
- First hit—Instruction may be in the cache on the first loop iteration but is guaranteed not to be in the cache for later iterations.
To analyze the behavior of the instructions in the pipeline, Healy et al. used a table to describe, for each type of instruction, worst-case and best-case number of cycles required at every stage of the pipeline.
They analyzed the iterations in a loop iteration or single invocation of a function, which could consist of multiple basic blocks. Their algorithm for determining the worst-case timing along a path is shown in Figure 3-33.