Advanced Encryption Standard (AES)
AES is a block cipher that operates on input blocks of size 128 bits. A
block is stored in a 4x4 state array of 8-bit elements, and is
processed iteratively a specified number of rounds Nr. The pseudo code
in Figure 8 below shows the
cipher stage that processes the state array.
 |
| Figure
8 - The AES Cipher Stage |
SubBytes(state) is an operation in which the elements of the state
array are replaced according to a substitution table as shown in Figure 9 below.
 |
| Figure
9 - The AES SubBytes() Operation |
SubBytes() is typically implemented as a lookup from a 256-entry
table. A sample implementation is shown in Figure 10 below.
 |
| Figure
10 - A Trivial Implementation of AES' SubBytes() |
ShiftRows(state) is shown in Figure
11 below. If every four elements in a row were thought of as a
32-bit word, ShiftRows(state) can be implemented simply on 32-bit
architectures as a word rotate instruction.
 |
| Figure
11 - The AES ShiftRows() Operation |
In MixColumns(state), which is shown in Figure 12 below, every state column
is transformed through a matrix multiplication. The operations
(multiplications and additions) are performed in a finite field.
Addition in the Galois field 28 is achieved through an XOR of the
terms.
In AES' MixColumns(), only multiplications by {02} and by {03} take
place. In GF(28), multiplication of Sxx by {02}, denoted as ({02}.Sxx),
is achieved by a left shift of Sxx followed by a conditional XOR.
Multiplication in GF(28) by {03}, denoted as ({03}.Sxx), is achieved as
({02}.Sxx) + Sxx. Hence, multiplication by {03} is achieved through a
left shift of Sxx, a conditional XOR, and an XOR.
 |
| Figure
12 - The AES MixColumns() Operation |
Instead of applying the described sequence of shifts and XORs in
order to achieve the finite field multiplications, a better solution
may be to statically generate a table that contains all the possible
multiplication outcomes (no multiplication, multiplication by {02}, and
multiplication by {03}). Hence the multiplications in GF(28) are
achieved through table lookups.
Implementing the finite-field multiplications through table lookups
enables further optimization. Instead of having two separate table
lookups for SubBytes() and MixColumns(), one is sufficient.
A statically generated 1024-byte table, in which the four elements
({02}.SubBytes(Sxx)), SubBytes(Sxx), SubBytes(Sxx),
({03}.SubBytes(Sxx)) are indexed by each possible Sxx, may be used to
combine the SubBytes() and MixColumns() operations into one.
 |
| Figure
13 - A Lookup Table Based Implementation of AES' SubBytes(),
ShiftRows(), and MixColumns() |
Figure 13 above shows a
possible implementation based on table lookups. To make table indexing
simpler, the table used in Figure 13 contains 4 entries for each
possible byte in the state array. Those entries are
({02}.SubBytes(Sxx)), SubBytes(Sxx), SubBytes(Sxx), and
({03}.SubBytes(Sxx)).
Hence, the table used is 1024-bytes long and combines the SubBytes()
with the finite-field multiplications of MixColumns(). Furthermore, the
use of index with a small shift addressing mode enables the selection
of the appropriate elements in the state array to implicitly perform
the ShiftRows() operation.
Figure 14 below attempts to
illustrate how the implementation presented in Figure 13 combines
SubBytes(), ShiftRows(), and MixColumns() intelligently to achieve
rapid execution.
 |
| Figure
14 - Combining SubBytes(), ShiftRows() and MixColumns() into a Single
Operation |
Summary
Software implementations of the AES and Triple-DES symmetric-key
ciphers were developed for the Blackfin embedded processor.
Implementations that took advantage of the above described
optimizations and instruction selection techniques achieved, on
average, 200% improvement in run-time.
Therefore, while hardware-acceleration of cryptographic algorithms
has its applications, intelligent software "acceleration" can attain
excellent results and has its place in the development portfolio.
Wassim
Bassalee is a senior
applications engineer at Analog Devices Inc. His
interests include security, cryptography, and their applications in
embedded systems. Yosef Stein
serves as DSP principal system
architect/advanced technologies manager in the Digital Media Technology
Center working on the development of broadband communication and image
compression enhanced instructions for Analog Devices' fixed point DSPs.