CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS

Acceleration of Symmetric-Key Algorithms in Software



Embedded.com

Advanced Encryption Standard (AES)
AES is a block cipher that operates on input blocks of size 128 bits. A block is stored in a 4x4 state array of 8-bit elements, and is processed iteratively a specified number of rounds Nr. The pseudo code in Figure 8 below shows the cipher stage that processes the state array.

Figure 8 - The AES Cipher Stage

SubBytes(state) is an operation in which the elements of the state array are replaced according to a substitution table as shown in Figure 9 below.

Figure 9 - The AES SubBytes() Operation

SubBytes() is typically implemented as a lookup from a 256-entry table. A sample implementation is shown in Figure 10 below.

Figure 10 - A Trivial Implementation of AES' SubBytes()

ShiftRows(state) is shown in Figure 11 below. If every four elements in a row were thought of as a 32-bit word, ShiftRows(state) can be implemented simply on 32-bit architectures as a word rotate instruction.

Figure 11 - The AES ShiftRows() Operation

In MixColumns(state), which is shown in Figure 12 below, every state column is transformed through a matrix multiplication. The operations (multiplications and additions) are performed in a finite field. Addition in the Galois field 28 is achieved through an XOR of the terms.

In AES' MixColumns(), only multiplications by {02} and by {03} take place. In GF(28), multiplication of Sxx by {02}, denoted as ({02}.Sxx), is achieved by a left shift of Sxx followed by a conditional XOR. Multiplication in GF(28) by {03}, denoted as ({03}.Sxx), is achieved as ({02}.Sxx) + Sxx. Hence, multiplication by {03} is achieved through a left shift of Sxx, a conditional XOR, and an XOR.

Figure 12 - The AES MixColumns() Operation

Instead of applying the described sequence of shifts and XORs in order to achieve the finite field multiplications, a better solution may be to statically generate a table that contains all the possible multiplication outcomes (no multiplication, multiplication by {02}, and multiplication by {03}). Hence the multiplications in GF(28) are achieved through table lookups.

Implementing the finite-field multiplications through table lookups enables further optimization. Instead of having two separate table lookups for SubBytes() and MixColumns(), one is sufficient.

A statically generated 1024-byte table, in which the four elements ({02}.SubBytes(Sxx)), SubBytes(Sxx), SubBytes(Sxx), ({03}.SubBytes(Sxx)) are indexed by each possible Sxx, may be used to combine the SubBytes() and MixColumns() operations into one.


Figure 13 - A Lookup Table Based Implementation of AES' SubBytes(), ShiftRows(), and MixColumns()

Figure 13 above shows a possible implementation based on table lookups. To make table indexing simpler, the table used in Figure 13 contains 4 entries for each possible byte in the state array. Those entries are ({02}.SubBytes(Sxx)), SubBytes(Sxx), SubBytes(Sxx), and ({03}.SubBytes(Sxx)).

Hence, the table used is 1024-bytes long and combines the SubBytes() with the finite-field multiplications of MixColumns(). Furthermore, the use of index with a small shift addressing mode enables the selection of the appropriate elements in the state array to implicitly perform the ShiftRows() operation.

Figure 14 below attempts to illustrate how the implementation presented in Figure 13 combines SubBytes(), ShiftRows(), and MixColumns() intelligently to achieve rapid execution.

Figure 14 - Combining SubBytes(), ShiftRows() and MixColumns() into a Single Operation

Summary
Software implementations of the AES and Triple-DES symmetric-key ciphers were developed for the Blackfin embedded processor. Implementations that took advantage of the above described optimizations and instruction selection techniques achieved, on average, 200% improvement in run-time.

Therefore, while hardware-acceleration of cryptographic algorithms has its applications, intelligent software "acceleration" can attain excellent results and has its place in the development portfolio.

Wassim Bassalee is a senior applications engineer at Analog Devices Inc. His interests include security, cryptography, and their applications in embedded systems. Yosef Stein serves as DSP principal system architect/advanced technologies manager in the Digital Media Technology Center working on the development of broadband communication and image compression enhanced instructions for Analog Devices' fixed point DSPs.

1 | 2 | 3

Rate this article: Low High
Current rating
  • .
Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :