CMP EMBEDDED.COM


Code techniques for processor pipeline optimization: Part 3
Optimization for Control-Oriented Operations



Embedded.com

Optimizing Complex Expressions Using Conditional Execution
Using conditional instructions helps improve the code generated for complex expressions, such as those that rely on C's short-circuit evaluation.

The use of conditional instructions in this fashion helps improve performance by minimizing the number of branches, thereby minimizing the penalties caused by branch mispredictions.

int foo(int a, int b) {
    if (a != 0 && b != 0)
        return 0;
    else
        return 1;
}

The optimized code for the if condition is:

    cmp         r0,#0
    cmpne     r1,#0

This approach also reduces the utilization of branch prediction resources. With Wireless MMX technology, the flag registers can be set based on data values in the coprocessor registers or SIMD flag registers.
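
As a rough C-level picture of what the CMP/CMPNE pair computes, the sketch below evaluates both comparisons and combines them without a branch in the source. The name foo_branchless is hypothetical and does not come from the original text, and whether a compiler actually emits conditional instructions for either form depends on the compiler and its options.

```c
/* Reference version: short-circuit evaluation, as in the text. */
int foo(int a, int b)
{
    if (a != 0 && b != 0)
        return 0;
    else
        return 1;
}

/* Hypothetical branch-free formulation (illustrative only).
 * both_nonzero is 1 exactly when the CMP/CMPNE pair would leave
 * the Z flag clear, that is, when a != 0 and b != 0. */
int foo_branchless(int a, int b)
{
    int both_nonzero = (a != 0) & (b != 0);
    return !both_nonzero;
}
```

Both functions return the same value for every input; only the control flow in the source differs.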

Use Addressing Modes Efficiently
XScale and Wireless MMX provide a variety of addressing modes that make indexing an array of objects highly efficient. The following code samples illustrate how various kinds of array operations can be optimized to make use of these addressing modes:

@ Set the contents of the word pointed to
@ by r0 to the value contained in wR1 and
@ make r0 point to the next word

        wstrw wR1, [r0], #4

@ Increment the contents of r0 to make it
@ point to the next word and set the
@ contents of the word pointed to to the
@ value contained in wR1

        wstrw wR1, [r0, #4]!

@ Set the contents of the word pointed to
@ by r0 to the value contained in wR1 and
@ make r0 point to the previous word

        wstrw wR1, [r0], #-4

@ Decrement the contents of r0 to make it
@ point to the previous word and set the
@ contents of the word pointed to to the
@ value contained in wR1

        wstrw wR1, [r0, #-4]!

These addressing modes save you from spending a separate instruction on updating the pointer.
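
At the C level, the four store forms correspond roughly to the familiar pointer idioms below. This is a sketch with hypothetical helper names; it illustrates the order of the store and the pointer update, and makes no claim about which instructions a given compiler will select.

```c
#include <stdint.h>

/* Post-increment: store, then advance (like [r0], #4). */
uint32_t *store_post_inc(uint32_t *p, uint32_t v) { *p++ = v; return p; }

/* Pre-increment: advance, then store (like [r0, #4]!). */
uint32_t *store_pre_inc(uint32_t *p, uint32_t v) { *++p = v; return p; }

/* Post-decrement: store, then step back (like [r0], #-4). */
uint32_t *store_post_dec(uint32_t *p, uint32_t v) { *p-- = v; return p; }

/* Pre-decrement: step back, then store (like [r0, #-4]!). */
uint32_t *store_pre_dec(uint32_t *p, uint32_t v) { *--p = v; return p; }
```

Each helper returns the updated pointer, which plays the role of r0 after the instruction has executed.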

Miscellaneous Approaches
Apart from the techniques mentioned earlier, you might consider tricks that put individual instructions to less obvious use. Consider the following two cases.

Optimizing the Use of Immediate Values. Programs often need constant values, for example as masks or as known coefficients in calculations.

The MOV or MVN instruction should be used when loading an immediate, or constant, value into a register. However, the immediate operand is encoded in only 12 bits (an 8-bit value rotated right by an even number of bits), so not every constant can be expressed this way. Other 32-bit or 64-bit constant values would otherwise have to be loaded from memory.
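
The 12-bit immediate field holds an 8-bit value plus a 4-bit rotation that is applied in steps of two bits. The host-side sketch below tests whether a constant fits that encoding; the function names are hypothetical and the code is meant only as a quick check when choosing constants by hand.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return n == 0 ? x : (x << n) | (x >> (32u - n));
}

/* Returns 1 if value can be written as an 8-bit constant
 * rotated right by an even amount (0, 2, ..., 30). */
int is_arm_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot; the result must fit in 8 bits. */
        if (rotl32(value, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}

/* Returns 1 if a single MOV or a single MVN can produce value. */
int fits_mov_or_mvn(uint32_t value)
{
    return is_arm_immediate(value) || is_arm_immediate((uint32_t)~value);
}
```

For example, 260 (0x41 shifted left by 2) passes the check, while 257 does not because its set bits span nine bit positions.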

The compiler typically places all the constants in a literal pool close to the instructions. Literal pools are not likely to be in the data cache, which makes loading constants expensive: each load can cost a main memory access. Also, the LDR instruction has the potential to pollute the data cache.

It is possible to generate a whole set of constant values using a combination of MOV, MVN, ORR, BIC, and ADD instructions, as these code samples show.

@ Set the value of r0 to 127
        mov r0, #127
@ Set the value of r0 to 0xfffffefb
        mvn r0, #260
@ Set the value of r0 to 257
        mov r0, #1
        orr r0, r0, #256
@ Set the value of r0 to 0x51f
        mov r0, #0x1f
        orr r0, r0, #0x500
@ Set the value of r0 to 0xf100ffff
        mvn r0, #0x00ff0000
        bic r0, r0, #0x0e000000
@ Set the value of r0 to 0x12341234
        mov r0, #0x234
        orr r0, r0, #0x1000
        add r0, r0, r0, LSL #16   @ shifter delay of 1 cycle
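
The multi-instruction sequences can be checked on a host machine by modelling each instruction as a C expression. The sketch below does this for three of the targets, with every immediate written out as its final 32-bit value; the helper and function names are hypothetical.

```c
#include <stdint.h>

/* Minimal models of the data-processing instructions used above. */
static uint32_t mov(uint32_t imm)              { return imm; }
static uint32_t mvn(uint32_t imm)              { return ~imm; }
static uint32_t orr(uint32_t rn, uint32_t op2) { return rn | op2; }
static uint32_t bic(uint32_t rn, uint32_t op2) { return rn & ~op2; }

uint32_t build_0x51f(void)
{
    uint32_t r0 = mov(0x1f);
    return orr(r0, 0x500);
}

uint32_t build_0xf100ffff(void)
{
    uint32_t r0 = mvn(0x00ff0000u);   /* r0 = 0xff00ffff */
    return bic(r0, 0x0e000000u);      /* clear bits 25..27 */
}

uint32_t build_0x12341234(void)
{
    uint32_t r0 = mov(0x234);
    r0 = orr(r0, 0x1000);             /* r0 = 0x1234 */
    return r0 + (r0 << 16);           /* add r0, r0, r0, LSL #16 */
}
```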

It is possible to load any 32-bit value into a register using a sequence of four instructions. With Wireless MMX technology, two such 32-bit values can be generated in core registers, and then transferred to coprocessor registers using TMCR, TMCRR, and TBCST instructions.
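
One straightforward way to reach the four-instruction bound, not necessarily the one the authors had in mind, is to split the value into its four bytes. Each byte, kept in place, is an 8-bit field at an even bit offset and is therefore a legal immediate, so one MOV followed by three ORR instructions rebuilds the value. The sketch below models this in C with hypothetical function names.

```c
#include <stdint.h>

/* Split value into four byte-aligned chunks; each chunk is a
 * legal immediate (8 bits at bit offset 0, 8, 16, or 24). */
void split_into_immediates(uint32_t value, uint32_t chunks[4])
{
    for (int i = 0; i < 4; i++)
        chunks[i] = value & (UINT32_C(0xFF) << (8 * i));
}

/* Rebuild the value the way the instruction sequence would. */
uint32_t rebuild_from_immediates(const uint32_t chunks[4])
{
    uint32_t r0 = chunks[0];   /* mov r0, #chunk0     */
    r0 |= chunks[1];           /* orr r0, r0, #chunk1 */
    r0 |= chunks[2];           /* orr r0, r0, #chunk2 */
    r0 |= chunks[3];           /* orr r0, r0, #chunk3 */
    return r0;
}
```

Many constants need fewer instructions than this worst case, as the earlier samples show.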

Bit Field Manipulation.
Encryption algorithms such as the Data Encryption Standard (DES) and Triple DES (T-DES), and hashing functions such as SHA, perform many bit-manipulation operations.

The shift and logical operations of the XScale provide a useful way of manipulating bit fields. Bit field operations can be optimized using regular instructions:

@ Set the bit number specified by
@ r1 in register r0

mov     r2, #1
orr     r0, r0, r2, asl r1

@ Clear the bit number specified by
@ r1 in register r0

mov     r2, #1
bic     r0, r0, r2, asl r1

@ Extract the bit value of the bit
@ number specified by r1 of the
@ value in r0 storing the value in r0

mov     r1, r0, asr r1
and     r0, r1, #1

@ Extract the higher order 8 bits of the
@ value in r0 storing
@ the result in r1

mov     r1, r0, lsr #24
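
For comparison, the same four operations can be written in C as below. The function names are hypothetical, and the sketch assumes the bit number n lies in the range 0 to 31.

```c
#include <stdint.h>

/* Set bit n of value (mov + orr with a shifted operand). */
uint32_t set_bit(uint32_t value, unsigned n)
{
    return value | (UINT32_C(1) << n);
}

/* Clear bit n of value (mov + bic with a shifted operand). */
uint32_t clear_bit(uint32_t value, unsigned n)
{
    return value & ~(UINT32_C(1) << n);
}

/* Extract bit n of value as 0 or 1 (shift right, then and). */
uint32_t extract_bit(uint32_t value, unsigned n)
{
    return (value >> n) & 1u;
}

/* Extract the high-order 8 bits (logical shift right by 24). */
uint32_t top_byte(uint32_t value)
{
    return value >> 24;
}
```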

This approach also helps other applications such as video stream parsing. Wireless MMX supports 64-bit-wide bit-wise manipulation (for instance shift, AND, and OR), which can be used effectively in different bit-wise algorithms.

Conclusion
The methods described in this series of articles are intended for assembly language development but can also be applied during development using intrinsic functions and in-line assembly.

High-level language programming styles based on these techniques have also been presented. These programming styles demonstrate how best to use different instructions and, more specifically, how the sequence of instructions should be scheduled to reduce stalls. However, the list of methods described here is not exhaustive.

Finally, a few points to remember are:

1) Use the correct precision for the algorithm, and choose instructions accordingly.
2) Interleave instructions across the pipelines to hide result and issue latency.
3) Schedule loads and stores with the correct data-addressing mode.
4) Watch out for load-to-use penalty and shifter-processing latency.
5) Count down on loops to reduce loop control overhead.
6) Use conditional instructions to avoid branch costs.

To read Part 1, go to "Microarchitectural optimization philosophy."
To read Part 2, go to "Optimization for data processing-oriented operations."

This series of articles was excerpted from "Programming with Intel Wireless MMX Technology," by Nigel Paver, Bradley Aldrich and Moinul Khan. Copyright © 2004 Intel Corporation. All rights reserved.

Nigel Paver is an architect and design manager for Wireless MMX technology at Intel Corporation. Bradley Aldrich is a leading authority at Intel Corporation on image and video processing. Moinul Khan is a multimedia architect at Intel Corporation.
