In many cases, a microprocessor that supports the dense instruction set also supports the normal instruction set, although it is possible to build a microprocessor that executes only the dense instruction set. Special compilation modes produce the program in terms of the dense instruction set.
Program size of course varies with the type of program, but programs using the dense instruction set are often 70 to 80% of the size of the standard instruction set equivalents.
Another possibility is to use statistical compression algorithms to compress the program after assembly and then decompress it on the fly. Statistical compression views the data to be compressed as a stream of symbols.
It relies on the fact that not all symbols or sequences of symbols are equally probable, so that more probable sections of the data are encoded in a small number of bits while less probable sections are encoded with more bits. File compression programs such as gzip can often reduce a file to 50% of its uncompressed size.
However, file compression algorithms cannot be directly applied to dynamic code decompression. First, the algorithm must not force us to decompress the entire program before executing - that would defeat the purpose of code compression since the entire program would reside in memory. We must be able to decompress small sections of the program during execution.
This creates particular problems for branches, since the address in a branch refers to an uncompressed location, not the location of the branch target in the compressed program. Second, we must be able to decompress the block of code quickly to avoid slowing down the CPU excessively.
There are several techniques for code compression, but a common one [Wol92A] is illustrated in Figure 5-28 below. In this scheme, a block of compressed code is fetched from main memory and decompressed into the cache to satisfy a cache miss. Because decompression occurs only on cache misses, it usually occurs relatively infrequently, and the cache is already organized to handle blocks of code.
The code is compressed at the end of compilation after the file has been linked to form an executable; the compression process yields tables that program the decompression unit. Various compression algorithms can be used to generate the compressed code that can also yield fast hardware decompression logic. Application Example 5-1 below describes the code compression technique used in PowerPC.
 |
| Figure 5-28. Cache-based instruction decompression |
Application Example 5-1: Code Compression for PowerPC
IBM has developed a code decompression module for PowerPC processors [Kem98]. Their decompression unit decompresses instructions after they have been fetched from main memory.
They found that they could achieve much higher compression ratios by breaking each 32-bit instruction into two 16-bit halves. IBM uses Huffman compression to compress each half, compressing instructions in blocks of 64 bytes.
An index table is used to translate uncompressed addresses, such as addresses found in branch instructions, to the new location of the instruction in the compressed code. They also add the K bit to the TLB to indicate whether a page in memory holds compressed instructions. This method compresses benchmark programs to 55 to 65% of their original size.
Next in Part 9: Program validation
and testing.
To read Part 1, go to "Program
design and analysis."
To read Part 2 , go to " Models
of program, assemblers and linkers."
To read Part 3, go to "Basic
Compilation Techniques"
To read Part 4, go to "The creation of
procedures"
To read Part 5, go to "Register
allocation and scheduling"
To read Part 6, go to "Analysis and
optimization of execution time"
To read Part 7, go to "Trace-Driven
Performance Analysis"
Used with the permission of the
publisher, Newnes/Elsevier, this series of six articles is based on
copyrighted material from "Computers
as Components: Principles of Embedded Computer System Design" by
Wayne Wolf. The book can be purchased on line.
Wayne Wolf is currently the Georgia Research Alliance Eminent
Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in
Embedded Computer Systems at Georgia Tech's School of Electrical and
Computer Engineering (ECE). Previously a professor of electrical
engineering at Princeton University, he worked at AT&T Bell
Laboratories. He has served as editor in chief of the ACM
Transactions
on Embedded Computing and of
Design
Automation for Embedded Systems.
References:
[Cat98] Francky Catthoor, et.
al. "Custom Memory Management Methodology: Exploration of Memory
Organization for Embedded Multimedia Systems Design." Kluwer Academic
Publishers, 1998.
[Kem98] T.M. Kemp et. al. " A
decompression core for PowerPC "IBM Journal of Research and
Development, November. 1998.
[Li98] Yanbing Li and Joerg
Henkel, "A frame work for estimating and minimizing energy dissipation
of embedded HW/SW systems." Proceedings, DAC 1998.
[Tiw94] Vivek Tiwari, Sharad Maalik and Andrfew Wolfe, "Power
analysis of embedded software: A first stept toward software power
minimization," IEEE Transations on VLSI Systems, December, 1994
[Wol92A]
Andrew Wolfe and Alex Chanim, "Ececuting compression programs in
an embedded RISC architecture, in Proceedins, 25th Annual International
Simposium on Microarchitecture, 1992.