CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS

The basics of programming embedded processors: Part 8
Analysis and Optimization of Energy, Power and program size



Embedded.com
In many cases, a microprocessor that supports the dense instruction set also supports the normal instruction set, although it is possible to build a microprocessor that executes only the dense instruction set. Special compilation modes produce the program in terms of the dense instruction set.

Program size of course varies with the type of program, but programs using the dense instruction set are often 70 to 80% of the size of the standard instruction set equivalents.

Another possibility is to use statistical compression algorithms to compress the program after assembly and then decompress it on the fly. Statistical compression views the data to be compressed as a stream of symbols.

It relies on the fact that not all symbols or sequences of symbols are equally probable, so that more probable sections of the data are encoded in a small number of bits while less probable sections are encoded with more bits. File compression programs such as gzip can often reduce a file to 50% of its uncompressed size.

However, file compression algorithms cannot be directly applied to dynamic code decompression. First, the algorithm must not force us to decompress the entire program before executing - that would defeat the purpose of code compression since the entire program would reside in memory. We must be able to decompress small sections of the program during execution.

This creates particular problems for branches, since the address in a branch refers to an uncompressed location, not the location of the branch target in the compressed program. Second, we must be able to decompress the block of code quickly to avoid slowing down the CPU excessively.

There are several techniques for code compression, but a common one [Wol92A] is illustrated in Figure 5-28 below. In this scheme, a block of compressed code is fetched from main memory and decompressed into the cache to satisfy a cache miss. Because decompression occurs only on cache misses, it usually occurs relatively infrequently, and the cache is already organized to handle blocks of code.

The code is compressed at the end of compilation after the file has been linked to form an executable; the compression process yields tables that program the decompression unit. Various compression algorithms can be used to generate the compressed code that can also yield fast hardware decompression logic. Application Example 5-1 below describes the code compression technique used in PowerPC.

Figure 5-28. Cache-based instruction decompression

Application Example 5-1: Code Compression for PowerPC
IBM has developed a code decompression module for PowerPC processors [Kem98]. Their decompression unit decompresses instructions after they have been fetched from main memory.

They found that they could achieve much higher compression ratios by breaking each 32-bit instruction into two 16-bit halves. IBM uses Huffman compression to compress each half, compressing instructions in blocks of 64 bytes.

An index table is used to translate uncompressed addresses, such as addresses found in branch instructions, to the new location of the instruction in the compressed code. They also add the K bit to the TLB to indicate whether a page in memory holds compressed instructions. This method compresses benchmark programs to 55 to 65% of their original size.

Next in Part 9: Program validation and testing.
To read Part 1, go to "Program design and analysis."
To read Part 2 , go to " Models of program, assemblers and linkers."
To read Part 3, go to "Basic Compilation Techniques"
To read Part 4, go to "The creation of procedures"
To read Part 5, go to  "Register allocation and scheduling"
To read Part 6, go to "Analysis and optimization of execution time"
To read Part 7, go to "Trace-Driven Performance Analysis"

Used with the permission of the publisher, Newnes/Elsevier, this series of six articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased on line.

Wayne Wolf  is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing and of Design Automation for Embedded Systems.

References:
[Cat98] Francky Catthoor, et. al. "Custom Memory Management Methodology: Exploration of Memory Organization for Embedded Multimedia Systems Design." Kluwer Academic Publishers, 1998.
[Kem98] T.M. Kemp et. al. " A decompression core for PowerPC "IBM Journal of Research and Development, November. 1998.
[Li98] Yanbing Li and Joerg Henkel, "A frame work for estimating and minimizing energy dissipation of embedded HW/SW systems." Proceedings, DAC 1998.
[Tiw94] Vivek Tiwari, Sharad Maalik and Andrfew Wolfe, "Power analysis of embedded software: A first stept toward software power minimization," IEEE Transations on VLSI Systems, December, 1994
[Wol92A] Andrew Wolfe and Alex Chanim, "Ececuting compression programs in an embedded RISC architecture, in Proceedins, 25th Annual International Simposium on Microarchitecture, 1992.
1 | 2 | 3 | 4

Rate this article: Low High
Current rating
  • .
Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :