You can implement the cyclic redundancy check function in an embedded systems design with minimal impact on memory or performance by using linear feedback shift register instructions more intelligently.
Implementing CRC
When implementing CRC in software, a common way of accelerating the calculations is through the use of lookup tables. This requires allocation of memory space that grows exponentially with increasing performance (generally, we need 2p rows of n bits each for p look ahead acceleration).
However, if you are building a system based on the Blackfin processor, you can take advantage of some instructions included in the basic Instruction Set Architecture (ISA) specifically defined for efficient LFSR implementation. These instructions--BXOR and BXORSHIFT--offer excellent performance, without a requirement for lookup tables. This fact makes it ideal for use when available memory is in shortage. Using these instructions, one can calculate an External CRC at a rate of two cycles per input bit. The example program in Listing 1 shows how to use these instructions.
This program consists of three functional parts. The first part is the initialization code. Here, the LFSR state (A0) and the mask (A1), which contains the coefficients of the generator polynomial, are initialized (Make sure to initialize the high byte of A1 as well). According to the definitions of an External LFSR as in Figure 2, A1's lsb is populated with the generator polynomial's high order coefficient. In the above example program, the 32-bit message fits into one register and is fed into the LFSR starting with the msb.
View the full-size image
The second part of the program is a loop of k iterations. In each iteration, one bit from the message is extracted and put in CC. Then, it is fed in and the LFSR state is updated according to the BXORSHIFT functional description (see the Blackfin instruction set documentation). That is, A0 is masked by A1, then the result is XOR-reduced and added to the input bit CC. The resulting bit is pushed into the lsb of A0 while the whole register is shifted to the left. In case the message is longer than 32-bits, a loop wrapping this part may be required, where a new data word is read into R2 from the message memory buffer.
In the third part of the program, we use the second variant of BXORSHIFT, this time without feedback. However, because in this variant the shift operation takes place prior to the XOR-reduction, we start by calculating the first output bit with a BXOR instruction operating on the last state. Then we continue with loop of n-1 iterations. In each iteration, A0 is updated and a new output is put in CC. Then, the CRC register (R0) is shifted left with the CC bit put into its lsb. At the end of this part, R0 contains the n bits of the CRC string.
Yaniv Sapir is a project leader at the Digital Media Technology Center in Analog Devices, developing embedded media algorithms and software for ADI's Blackfin processor. Yaniv holds a B.Sc. in aerospace engineering from the Technion, Israel Institute of Technology. He may be reached at yaniv.sapir@analog.com.
Yosef Stein serves as DSP principal system architect/advanced technologies manager in the Digital Media Technology Center working on the development of broadband communication and image compression enhanced instructions for Analog Devices' fixed point DSPs. Yosef holds a BSc in electrical engineering, Technion. He may be reached at yosi.stein@analog.com.