By Nigel Paver, Bradley Aldrich and Moinul Khan, Intel Corp.
Optimizing Condition Checks
Core instructions can selectively modify the state of the condition
codes. When generating code for if...else and loop conditions, it is
often beneficial to make use of this feature to set condition codes,
thereby eliminating the need for a subsequent compare instruction.
Consider the following C statement.
    if ((a + b) != 0)
        c = c + 1;
Code generated for the if condition without using an add instruction
to set condition codes is:
    add    r2, r0, r1
    cmp    r2, #0
    addne  r3, r3, #1
However, the code can be optimized by using an adds instruction, which
sets the condition codes as part of the addition:
    adds   r2, r0, r1
    addne  r3, r3, #1
Condition checking for coprocessor registers can also be performed.
SIMD flags in the wCASF register are updated during execution of
Wireless MMX instructions. Then, using one of the three flag extraction
operations - TANDC, TORC, or TEXTRC - flags for the XScale core can be
updated.
This method allows checking of all or one of the SIMD fields for
conditional execution. Called group conditional execution, this method
is shown in the following example:
    wsubhus wR1, wR2, wR3   @ Saturating subtraction; the minimum
                            @ value in wR1 is zero
    torch   R15             @ Update core flags with ORed
                            @ coprocessor flag values
    addeq   r2, r2, #1      @ Now executes conditionally on the
                            @ coprocessor flags
All preceding techniques of effectively using conditional execution
can also be applied to the group conditional execution. For cases such
as peak detection or finding a match in a vector, you can use group
conditional techniques.
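The effect of the wsubhus/torch/addeq sequence can be modeled in scalar C. The sketch below is only an illustration of the semantics described above, not Wireless MMX intrinsics: the function names (sat_sub_u16x4, any_lane_zero, count_if_any_at_or_below) are hypothetical, and it assumes four unsigned 16-bit fields per register and that the OR-type flag extraction sets the zero flag when any field's result is zero.

```c
#include <stdint.h>
#include <stddef.h>

#define LANES 4  /* assumed: four 16-bit fields in one 64-bit register */

/* Lane-by-lane unsigned saturating subtract: a scalar stand-in for
   wsubhus. Results that would go below zero clamp to zero. */
static void sat_sub_u16x4(const uint16_t a[LANES], const uint16_t b[LANES],
                          uint16_t out[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = (a[i] > b[i]) ? (uint16_t)(a[i] - b[i]) : 0;
}

/* OR of the per-lane zero flags: models the Z flag seen by the core
   after an OR-type flag extraction. */
static int any_lane_zero(const uint16_t v[LANES])
{
    int z = 0;
    for (size_t i = 0; i < LANES; i++)
        z |= (v[i] == 0);
    return z;
}

/* Increment *count when at least one sample does not exceed its
   threshold; the if corresponds to the conditional addeq. */
static void count_if_any_at_or_below(const uint16_t samples[LANES],
                                     const uint16_t threshold[LANES],
                                     unsigned *count)
{
    uint16_t diff[LANES];
    sat_sub_u16x4(samples, threshold, diff);
    if (any_lane_zero(diff))
        *count += 1;
}
```

In the assembly version the four comparisons and the OR reduction cost one SIMD instruction and one flag extraction, which is where the benefit for peak detection or vector matching comes from.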
The instructions that increment or decrement the loop counter can
also be used to modify the condition codes. Modifying the codes
eliminates the need for a subsequent compare instruction. A conditional
branch instruction can then be used to exit or continue with the next
loop iteration.
Consider the following C code segment:
    for (i = 10; i != 0; i--) {
        perform inner_kernel;
    }
The optimized code generated for the preceding code segment would
look like:
    .L6:
        @ equivalent to inner_kernel
        subs   r3, r3, #1
        bne    .L6
Using the above argument, it is also beneficial to rewrite loops
whenever possible to make the loop exit conditions check against the
value 0. For example, the code generated for the following code segment
needs a compare instruction to check for the loop exit condition.
    for (i = 0; i < 10; i++) {
        perform inner_kernel;
    }
If the loop is rewritten as follows, the code generated avoids using
a compare instruction to check for the loop exit condition.
    for (i = 9; i >= 0; i--) {
        perform inner_kernel;
    }
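As a concrete sketch of the rewrite, the two hypothetical functions below (sum_up and sum_down are illustrative names, not from the original text) compute the same result with a count-up and a count-down loop. The rewrite is only valid when the loop body does not depend on iteration order, and the counter must be a signed type for the i >= 0 test to terminate. An optimizing compiler may perform this transformation itself, so the generated code is worth inspecting before rewriting by hand.

```c
/* Count-up form: the exit test compares i against n on every
   iteration, which typically needs an explicit compare. */
static long sum_up(const int *v, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Count-down form: the exit test is against zero, so the decrement
   can supply the condition codes. i is signed so that i >= 0 can
   become false. */
static long sum_down(const int *v, int n)
{
    long s = 0;
    for (int i = n - 1; i >= 0; i--)
        s += v[i];
    return s;
}
```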
However, the use of conditional instructions should be considered
carefully to ensure it improves performance. To decide when to use
conditional instructions over branches, consider this hypothetical code
segment:
    if (cond)
        if_stmt
    else
        else_stmt
Using the following data:
    N1_B = number of cycles to execute the if_stmt, assuming the use
           of branch instructions
    N2_B = number of cycles to execute the else_stmt, assuming the use
           of branch instructions
    P1   = percentage of times the if_stmt is likely to be executed
    P2   = percentage of times the code is likely to incur a branch
           misprediction penalty
    N1_c = number of cycles to execute the if...else portion using
           conditional instructions, assuming the if condition is true
    N2_c = number of cycles to execute the if...else portion using
           conditional instructions, assuming the if condition is false
Use conditional instructions when:
$$N1_c \cdot \frac{P1}{100} + N2_c \cdot \frac{100 - P1}{100} \le N1_B \cdot \frac{P1}{100} + N2_B \cdot \frac{100 - P1}{100} + \frac{P2}{100} \cdot 4$$
where the constant 4 is the branch misprediction penalty in cycles.
The following example illustrates a situation in which it is better
to use branches instead of conditional instructions.
        cmp    r0, #0
        bne    L1
        add    r0, r0, #1
        add    r1, r1, #1
        add    r2, r2, #1
        add    r3, r3, #1
        add    r4, r4, #1
        b      L2
    L1:
        sub    r0, r0, #1
        sub    r1, r1, #1
        sub    r2, r2, #1
        sub    r3, r3, #1
        sub    r4, r4, #1
    L2:
The CMP instruction takes one cycle to execute, the if statement
takes seven cycles to execute, and the else statement takes six cycles
to execute. If the code were changed to eliminate the branch
instructions by using conditional instructions, the if...else statement
would take 10 cycles to complete.
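Assuming an equal probability of both paths being taken and that
branch mispredictions occur 50 percent of the time, the cost of using
conditional instructions is 11 cycles (1 for the CMP plus 10 for the
conditional sequence), and the cost of branches is 9.5 cycles
(1 + 0.5 x 7 + 0.5 x 6 + 0.5 x 4, taking the branch misprediction
penalty as 4 cycles).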
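The comparison can also be evaluated with a small helper, which is convenient when trying different probabilities. This is a minimal sketch: the function names are hypothetical, percentages are passed as values from 0 to 100, and the 4-cycle misprediction penalty is an assumption chosen to match the 9.5-cycle figure above. The one-cycle CMP is common to both versions, so it is left out of the helpers and added by the caller.

```c
/* Assumed branch misprediction penalty in cycles (matches the
   example figures; adjust for a different core). */
#define MISPREDICT_PENALTY 4.0

/* Expected cycles for the branch version.
   n1b, n2b: cycles for the if and else paths using branches.
   p1: percent of times the if path runs; p2: percent mispredicted. */
static double branch_cost(double n1b, double n2b, double p1, double p2)
{
    return n1b * p1 / 100.0
         + n2b * (100.0 - p1) / 100.0
         + MISPREDICT_PENALTY * p2 / 100.0;
}

/* Expected cycles for the conditional-instruction version.
   n1c, n2c: cycles when the condition is true and false. */
static double conditional_cost(double n1c, double n2c, double p1)
{
    return n1c * p1 / 100.0 + n2c * (100.0 - p1) / 100.0;
}

/* Nonzero when conditional instructions are expected to cost no
   more than branches. */
static int prefer_conditional(double n1c, double n2c,
                              double n1b, double n2b,
                              double p1, double p2)
{
    return conditional_cost(n1c, n2c, p1) <= branch_cost(n1b, n2b, p1, p2);
}
```

With the numbers from the example, prefer_conditional(10, 10, 7, 6, 50, 50) is false, so the branch version is the better choice.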