Optimizing for execution speed
If you need to improve the speed of your program, you can attack the
problem at several levels of abstraction. First, make sure that the
code really needs to be accelerated. If you are dealing with a large
program, the part of the program using the most time may not be
obvious.
Profiling the program will help you find hot spots. Many systems are
built from multiple programs running as communicating processes; a slow
program may not actually limit the overall system performance.
You may be able to redesign your algorithm to improve efficiency.
Examining asymptotic performance is often a good guide to efficiency.
Doing fewer operations is usually the key to performance. In a few
cases, however, brute force may provide a better implementation. A
seemingly simple high-level language statement may in fact hide a very
long sequence of operations that slows down the algorithm.
Using dynamically allocated memory is one example, since managing
the heap takes time but is hidden from the programmer. For example, a
sophisticated algorithm that uses dynamic storage may be slower in
practice than an algorithm that performs more operations on statically
allocated memory.
Finally, you can look at the implementation of the program itself. A
few hints on program implementation are summarized below.
- Try to use registers efficiently. Group accesses to a value
together so that the value can be brought into a register and kept
there.
- Make use of page mode accesses in the memory system whenever
possible. Page mode reads and writes eliminate one step in the memory
access. You can increase use of page mode by rearranging your variables
so that more can be referenced contiguously.
- Analyze cache behavior to find major cache conflicts. Restructure
the code to eliminate as many of these as you can as follows:
- For instruction conflicts, if the offending code segment is
small, try to rewrite the segment to make it as small as possible so
that it better fits into the cache. Writing in assembly language may be
necessary. For conflicts across larger spans of code, try moving the
instructions or padding with NOPs.
- For scalar data conflicts, move the data values to different
locations to reduce conflicts.
- For array data conflicts, consider either moving the arrays or
changing your array access patterns to reduce conflicts.
Next in Part 8: Analysis and
optimization of energy, power amd program design
To read Part 1, go to "Program
design and analysis."
To read Part 2 , go to " Models
of program, assemblers and linkers."
To read Part 3, go to "Basic
Compilation Techniques"
To read Part 4, go to "The creation of
procedures"
To read Part 5, go to "Register
allocation and scheduling"
To read Part 6, go to "Analysis and
optimization of execution time"
Used with the permission of the
publisher, Newnes/Elsevier, this series of six articles is based on
copyrighted material from "Computers
as Components: Principles of Embedded Computer System Design" by
Wayne Wolf. The book can be purchased on line.
Wayne Wolf is currently the Georgia Research Alliance Eminent
Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in
Embedded Computer Systems at Georgia Tech's School of Electrical and
Computer Engineering (ECE). Previously a professor of electrical
engineering at Princeton University, he worked at AT&T Bell
Laboratories. He has served as editor in chief of the ACM Transactions
on Embedded Computing and of Design
Automation for Embedded Systems.
References:
[Gho97] Somnath Ghosh,
Margaret Martonosi and Sharad Malik "
Cache
miss equations: an analytical representation of cache misses."
Proceedings of the 11th ACM International Conference on Supercomputing,
ACM Press, 1997
[Col97] Robet Collins,
"In-circuit emulation," Dr. Dobb's Journal, September, 1997. pp.
111-113.