With an efficient turbo-encoder algorithm, a multichannel 14.4-Mbps femtocell base station can be done on a single digital signal processor.
Simple encoding. In simple encoding, we code bit-by-bit. For each input bit, we output three bits. A sample C-code that simulates one RSC encoder is shown in Listing 1. Usually, the input data bits are accessed in terms of 8-bit bytes from memory as the minimum size of the data that the processor can access from memory is a byte (or an 8-bit quantity). Once we get a byte of data, to encode bit-by-bit, we have to unpack the bits, which takes about 1.125 cycle per bit (or totally nine cycles with first bit unpack requiring two cycles and rest of the bits requiring one cycle per bit using the Blackfin rot instruction). Then, an additional seven cycles are required to code this bit as coding involves only sequential operations. After coding, we have to pack the coded bits and store them in a memory for other processing and transmission. Packing of data takes the same number of cycles as unpack, in other words, 1.125 cycles per bit. Thus, total of 10 cycles are required for encoding one bit of data and output one parity bit (including overhead).
View the full-size image
Computational complexity. As we are using a look-up table for interleaver addresses instead of computing on the fly, we only spend cycles for look-up table accesses (which may come for free with compute operations). To interleave one data bit, it takes about three cycles (one cycle for loading offset, two cycles for computing absolute address). Since turbo encoding involves two RSC encoders and one interleave operation, which in sum we consume 25 cycles (including overhead) for encoding one data bit. For femtocell applications, with 14.4-Mbps bit rate, we require about 360 MIPS, about 60% MIPS of Blackfin processor.
Dealing with the turbo code
As we discussed earlier, a turbo encoder is a costly module at higher bit rates if we aren't implementing it properly. Next, we'll discuss techniques for efficient implementation of turbo encoder. We split the turbo encoder into two parts. In the first part, we deal with the encoding of bits; in the second part, we handle the interleaving of the data bits.
Encoding using look-up table. The turbo encoding with two RSC encoders consume about 20 cycles per input bit, as we explained earlier. We'll now describe a different approach using a look-up table that consumes only 2.5 cycles (for both encoders) per input bit. For this, we need 256 bytes of extra memory for storing look-up table data. Given the present state of RSC encoder, it's possible to encode more than one bit at a time using look-up table. By precomputing the look-up table for all possible combinations of input bits of length L and for all three combinations of state bits, we can encode L bits at a time. In this encoding, we use a look-up table that has 2L+3 entries. As the value of L increases, the size of look-up table also increases. With L = 4 (in other words, encoding of four bits at a time), we will have 27 or 128 entries in the look-up table as shown in Figure 2. Each entry contains four encoded bits and three bits of updated state information. In other words, a byte (or 8-bits) is sufficient to represent each entry of look-up table.
View the full-size image
In closely exploring the details of the 8-bit look-up table design, you can see that, to compute a 7-bit offset to the 128-entry look-up table from four input bits (say in register r0) and three current state bits (say in register r1), we have to extract (1-cycle) four data bits (say to register r2) from input byte (or from r0), we have to extract (1-cycle) current state (say to register r3) from look-up table output (or from r1) of previous encoding, we have to shift (1-cycle) three state bits by four (r3 = r3<<4) and OR (1-cycle) the extracted four input data bits to state bits (in other words, r4 = r2|r3). We can avoid the extract and shift operations for state bits by properly designing the look-up table. If we use two bytes for each look-up table entry and place the state bits in the shifted position as shown in the Figure 3, we can avoid two (saving 50%) of the offset calculation cycles.
View the full-size image