Data Encryption Standard (DES and
Triple DES)
The structures of DES and Triple DES are similar. At the heart of both
is the function depicted in Figure 2
below. The E transformation expands the 32-bit input R
producing a 48-bit value E(R).
 |
| Figure
2 - The Critical Function in DES and Triple DES |
Figure 3 below shows the
bit-selection table, according to which, E expands the bits of R into
E(R). If, for example, the input R was equal to 0x00001000, then E(R)
would be equal to 0x0000000A0000. (Notice that the bit-selection table
refers to the leftmost bit in the input as bit 1 and refers to the
rightmost input bit as bit 32).
The 48-bit value E(R) is confused through an XOR operation with a
48-bit round-key K. The result, which is 48-bits long, is subdivided
into eight 6-bit long elements. Each of these eight elements is used to
index one of eight lookup tables, producing eight 4-bit long elements.
These eight 4-bit long elements are concatenated to form a 32-bit
value, which is then permuted to produce the 32-bit output.
 |
| Figure
3 - E Bit-Selection Table |
A trivial way of implementing the E transformation may be through
bit manipulations (AND, shift/rotate, and OR instructions). Figure 4 below shows such an
implementation.
 |
| Figure
4 - A Trivial Implementation of the E Transformation |
The implementation of E shown in Figure 4 is simple. For the common
case, it takes 3 Blackfin instructions (3 cycles) to produce 6 bits of
E(R). However, a better implementation may be one that makes use of the
Blackfin's EXTRACT instruction. The second argument of the EXTRACT
instruction is a 16-bit value.
The least significant byte of that value is the number of bits to
extract from the target register. The most significant byte of that
value is the offset within the target register to extract from.
For example, if the second argument to the EXTRACT instruction was
0x1A06, then the operation will extract 6 bits starting from offset
0x1A (i.e. the most significant 6 bits of a 32-bit register). An
implementation of E based on Blackfin's EXTRACT instruction is shown in
Figure 5 below.
 |
| Figure
5 - A Trivial Implementation of the E Transformation |
The last two lines in Figure 5 demonstrate that only two
instructions are necessary to produce 6 bits of E(R). Furthermore, the
two instructions may be issued in parallel, as shown in Figure 6 below, bringing the number
of cycles necessary to produce 6 bits of E(R) to 1.
 |
| Figure
6 - Parallel Execution of Instructions in the Implementation of E(R) |
Figure 2 earlier, which illustrates the critical function at the
heart of DES and Triple DES, shows that the 4-bit elements at the
output of the eight substitution boxes (S1 " S8) are concatenated
together to produce a 32-bit value that is then permuted to produce the
output.
Typically, the bit manipulations involved in concatenation and
permutation operations are not efficiently implemented in general
purpose embedded processors and DSPs. Concatenations are typically
performed using a shift and an OR. Permutations may either be performed
one bit at a time using shift and OR instructions or several bits at a
time using lookup tables.
A much more efficient implementation of DES may be one where the
substitution tables (S1 " S8) are replaced with (P(S1) " P(S8)). In
other words, the output of each lookup table P(Sx) produces the 32-bit
value that corresponds to the permutation of the 4-bit output of the
original lookup table Sx.
 |
| Figure
7 - Combine the Permutation and the Substitution in a Single Step |
In addition to this method's advantage of performing both the
substitution and the permutation in a single step, this method
eliminates the need to concatenate the eight 4-bit outputs of the
substitution tables (S1 " S8).
Instead, the final result is obtained through addition of the eight
32-bit outputs of the modified tables (P(S1) " P(S8)), which combine
the substitution and permutation operations together. Pictorially, this
is shows in Figure 7 above.