CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS

Acceleration of Symmetric-Key Algorithms in Software



Embedded.com

Data Encryption Standard (DES and Triple DES)
The structures of DES and Triple DES are similar. At the heart of both is the function depicted in Figure 2 below. The E transformation expands the 32-bit input R producing a 48-bit value E(R).

Figure 2 - The Critical Function in DES and Triple DES

Figure 3 below shows the bit-selection table, according to which, E expands the bits of R into E(R). If, for example, the input R was equal to 0x00001000, then E(R) would be equal to 0x0000000A0000. (Notice that the bit-selection table refers to the leftmost bit in the input as bit 1 and refers to the rightmost input bit as bit 32).

The 48-bit value E(R) is confused through an XOR operation with a 48-bit round-key K. The result, which is 48-bits long, is subdivided into eight 6-bit long elements. Each of these eight elements is used to index one of eight lookup tables, producing eight 4-bit long elements. These eight 4-bit long elements are concatenated to form a 32-bit value, which is then permuted to produce the 32-bit output.

Figure 3 - E Bit-Selection Table

A trivial way of implementing the E transformation may be through bit manipulations (AND, shift/rotate, and OR instructions). Figure 4 below shows such an implementation.

Figure 4 - A Trivial Implementation of the E Transformation

The implementation of E shown in Figure 4 is simple. For the common case, it takes 3 Blackfin instructions (3 cycles) to produce 6 bits of E(R). However, a better implementation may be one that makes use of the Blackfin's EXTRACT instruction. The second argument of the EXTRACT instruction is a 16-bit value.

The least significant byte of that value is the number of bits to extract from the target register. The most significant byte of that value is the offset within the target register to extract from.

For example, if the second argument to the EXTRACT instruction was 0x1A06, then the operation will extract 6 bits starting from offset 0x1A (i.e. the most significant 6 bits of a 32-bit register). An implementation of E based on Blackfin's EXTRACT instruction is shown in Figure 5 below.

Figure 5 - A Trivial Implementation of the E Transformation

The last two lines in Figure 5 demonstrate that only two instructions are necessary to produce 6 bits of E(R). Furthermore, the two instructions may be issued in parallel, as shown in Figure 6 below, bringing the number of cycles necessary to produce 6 bits of E(R) to 1.

Figure 6 - Parallel Execution of Instructions in the Implementation of E(R)

Figure 2 earlier, which illustrates the critical function at the heart of DES and Triple DES, shows that the 4-bit elements at the output of the eight substitution boxes (S1 " S8) are concatenated together to produce a 32-bit value that is then permuted to produce the output.

Typically, the bit manipulations involved in concatenation and permutation operations are not efficiently implemented in general purpose embedded processors and DSPs. Concatenations are typically performed using a shift and an OR. Permutations may either be performed one bit at a time using shift and OR instructions or several bits at a time using lookup tables.

A much more efficient implementation of DES may be one where the substitution tables (S1 " S8) are replaced with (P(S1) " P(S8)). In other words, the output of each lookup table P(Sx) produces the 32-bit value that corresponds to the permutation of the 4-bit output of the original lookup table Sx.

Figure 7 - Combine the Permutation and the Substitution in a Single Step

In addition to this method's advantage of performing both the substitution and the permutation in a single step, this method eliminates the need to concatenate the eight 4-bit outputs of the substitution tables (S1 " S8).

Instead, the final result is obtained through addition of the eight 32-bit outputs of the modified tables (P(S1) " P(S8)), which combine the substitution and permutation operations together. Pictorially, this is shows in Figure 7 above.

1 | 2 | 3

Rate this article: Low High
Current rating
  • .
Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :