LZW Data Compression
Lempel"Ziv"Welch (LZW) is a lossless data compression technique that was created back in 1984 by Terry Welsh as an improvement to the popular LZ77 compression algorithm.
LZW is one of the most popular compression programs available and is the easiest dictionary-based algorithm to implement into an embedded design. It is the compression technique used to generate TIFF and GIF files.
It is also the compression technique used for UNIX's Compress program. LZW was also the compression technique used by computer programs that claimed to "double your hard drive's capacity!"
At the heart of the LZW compression program is the dictionary (Figure 1, below), which translates string patterns from the original data into a single token. The goal of the dictionary is to replace redundant data strings with single value tokens.
 |
| Figure 1. A dictionary is at the heart of any LZW compression program. |
Unlike other dictionary-based programs, the real elegance of the LZW compression algorithm lies in the fact that the dictionary does not have to be attached and transmitted with the compressed data itself. The dictionary, which is dynamically constructed in RAM memory, is simply discarded after the compression process has completed.
The same elegance can be found in the LZW decompression algorithm, which dynamically re-constructs the dictionary as it converts the data back to its original form.
The LZW compression algorithm is designed to input data, accumulate it into a string, generate a dictionary that dynamically assigns tokens, and to output those tokens into a compressed format.
As each data value is input and concatenated to the string, the dictionary is searched. As long as the string is found in the dictionary, the process of concatenating another data value into the string will continue. Eventually, the next data value added to the string causes the search in the dictionary to fail.
At this point, the compress program outputs the dictionary token from the previous string to the compressed data output and generates a new token value for the newly created string. Figure 2 below illustrates this process in detail.
 |
| Figure 2. As shown in this flowchart, the LZW compression algorithm is designed to input data, accumulate it, generate a dictionary that assigns tokens and outputs them into a compressed format. |
The compression algorithm contains two variables, STRING and CHAR. The variable CHAR contains a single character from the input data stream. The variable STRING is an array of variable length strings from the input data. The STRING array is stored in the dictionary and used by the algorithm to search for recurring strings in the input data stream.
At the top of the flowchart, (box #1) reads in the first byte from the input data stream and places it into the STRING array. The next byte in the data stream is input and placed in CHAR (box #2).
The string array containing (STRING + CHAR) is searched in the dictionary. If it is not found, the token for STRING is output to the compressed data buffer (box #5). Next, (box #6) adds (STRING + CHAR) to the dictionary and assigned a token to it. Then (box #7) places CHAR into STRING. If (STRING + CHAR) is found in the dictionary, CHAR is concatenated into STRING (box #4) and the process begins at (box #2).
This process continues until all the data has been brought into the compression program. The dictionary is then flushed of all the strings that were constructed during the compression process.