CMP EMBEDDED.COM


Code techniques for processor pipeline optimization: Part 2
Optimization for data processing operations



Embedded.com

Scheduling Load and Store Multiple (LDM/STM)
Load multiple (LDM) and store multiple (STM) are two instructions that load or store a set of core registers in a single operation. They are often used to save and restore the state of the processor.

LDM and STM instructions have an issue latency of 2 to 20 cycles, depending on the number of registers being loaded or stored. The issue latency is typically two cycles plus an additional cycle for each of the registers loaded or stored, assuming a data cache hit.
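As a rough illustration of the rule of thumb above, the following C sketch estimates the issue latency from a register list. The function names and the bitmask representation of the register list are hypothetical, chosen only for this example, and the estimate assumes every access hits the data cache. It is not a cycle-accurate model.

```c
#include <stdint.h>

/* Count the registers named in an LDM/STM register list.
 * Assumption for this sketch: bit i of reg_mask stands for core register ri. */
static unsigned reg_list_count(uint16_t reg_mask)
{
    unsigned count = 0;
    while (reg_mask) {
        reg_mask &= (uint16_t)(reg_mask - 1u); /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Typical issue latency in cycles: two cycles plus one per register,
 * assuming a data cache hit, as stated in the text. */
static unsigned ldm_stm_issue_cycles(unsigned num_regs)
{
    return 2u + num_regs;
}
```

For a list of four registers, the estimate is six cycles, which is why short register lists are cheaper to schedule around than long ones.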

The instruction following an LDM stalls whether or not it depends on the results of the load. While these instructions ease code development, they have two drawbacks: their issue latency is at least two cycles, and they cannot be used to load or store the Wireless MMX registers.

Optimizing Align and Shift
The auxiliary registers (wCGRn) are designed to hold constants that are invariant across the lifetime of an inner loop calculation. For this reason, values loaded into the auxiliary registers are not forwarded to data operations.

The intended use of these registers is that the shift or alignment offset is loaded into a wCGRn register before the main loop is entered, and then the same shift or alignment offset is used repeatedly inside the loop without change.

If the value in a wCGRn register is changed and the instruction immediately following tries to use the new value, the coprocessor stalls until that value has reached the control register file.

For most kernels, the alignment and shift amounts do not change during the execution of the kernel. For example, consider an algorithm that accesses a large data array in which each element has 16-bit precision and the elements are stored packed in memory.

Using Wireless MMX technology, four elements of this data array can be processed concurrently. If the data structure is aligned on a 64-bit boundary, the data can be accessed with a simple WLDRD instruction. For instance:

wldrd     wR0, [r1],#8
.. use   wR0   now ..

However, if the data is not aligned to a 64-bit boundary, it must be aligned after loading. In the unaligned case, the data segment can be offset from a 64-bit boundary by one to seven bytes.

The low three bits of the pointer's address determine the exact offset. Be aware that the misalignment of successive double words does not change throughout the array. You can therefore keep the misalignment as a constant in a control register and use it to align each successive access.

bic      r1, r2, #7        @ r1 gets the aligned address
eor      r0, r2, r1        @ r0 now contains the misalignment (0 to 7)

tmcr     wCGR0, r0         @ wCGR0 now holds the misalignment
wldrd    wR0, [r1], #8
wldrd    wR1, [r1], #8
..
..
walignr0 wR2, wR0, wR1     @ align using the offset held in wCGR0
.. use  wR2  now ..
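The data flow of this sequence can be modeled in portable C. The sketch below is only an illustration: the function names are hypothetical, memory is modeled as an array of 64-bit words with little-endian byte numbering inside each word, and the extraction step is assumed to behave like a register-controlled align that takes eight bytes from two adjacent double words.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract 64 bits starting `offset` bytes into the 128-bit value hi:lo.
 * Models the align step; offset plays the role of the value kept in
 * the control register. */
static uint64_t align_extract(uint64_t lo, uint64_t hi, unsigned offset)
{
    offset &= 7u;
    if (offset == 0) {
        return lo;                      /* avoid an undefined 64-bit shift */
    }
    return (lo >> (8u * offset)) | (hi << (64u - 8u * offset));
}

/* Read 64 bits at an arbitrary byte address using two aligned word reads.
 * The caller must guarantee that the word after the aligned one exists,
 * because it is read even when the address is already aligned. */
static uint64_t load_unaligned(const uint64_t *words, size_t byte_addr)
{
    size_t aligned = byte_addr & ~(size_t)7;             /* clear low three bits */
    unsigned misalign = (unsigned)(byte_addr ^ aligned); /* XOR leaves 0 to 7    */
    size_t idx = aligned / 8;
    return align_extract(words[idx], words[idx + 1], misalign);
}
```

In a real loop, the misalignment would be computed once before the loop, and only the two loads and the extraction would repeat for each double word.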

Similarly, control registers can be used to hold a shift amount. Some algorithms require a certain level of accuracy (range and precision) during the computation.

Following any multiplication or accumulation, you need to right-shift the resulting value. This correction is easy to maintain with a shift operation that takes its shift amount from a control register.
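A scalar C sketch of the same idea follows. It is a model of the pattern rather than Wireless MMX code: the function name and the 16-bit fixed-point layout are assumptions made for this example, and the shift amount is fixed once before the loop, much as it would be loaded into a control register before the loop is entered.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two arrays of 16-bit fixed-point values and right-shift each
 * 32-bit product by a loop-invariant amount.
 * Assumptions: the compiler uses an arithmetic right shift for negative
 * values (common, but implementation-defined in C), and the shifted result
 * fits in 16 bits, since this sketch does not saturate. */
static void scale_products(const int16_t *a, const int16_t *b,
                           int16_t *out, size_t n, unsigned shift)
{
    const unsigned s = shift & 31u;   /* set once, never changed in the loop */
    for (size_t i = 0; i < n; i++) {
        int32_t product = (int32_t)a[i] * (int32_t)b[i];
        out[i] = (int16_t)(product >> s);
    }
}
```

With a shift of 15, two inputs of 16384 (0.5 in a format with 15 fractional bits) produce 8192, which is 0.25 in the same format.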

Next in Part 3: "Optimization for Control-oriented operations."
To read Part 1, go to "Microarchitectural optimization philosophy."

This series of articles was excerpted from "Programming with Intel Wireless MMX Technology," by Nigel Paver, Bradley Aldrich and Moinul Khan. Copyright © 2004 Intel Corporation. All rights reserved.

Nigel Paver is an architect and design manager for Wireless MMX technology at Intel Corporation. Bradley Aldrich is a leading authority at Intel Corporation on image and video processing. Moinul Khan is a multimedia architect at Intel Corporation.
