AES is an open standard (see fips-197.pdf) that was selected through an open competition. The winner was the Rijndael algorithm, chosen because it combines an extremely high level of security with computational efficiency. The algorithm consists of byte substitutions and Exclusive-OR functions combined with matrix operations, and is a mathematically 'clean' design which avoids the risk of 'back doors' being left open to unauthorized users.
The elegance and efficiency of the system make it suitable for either hardware or software systems. Low data rates can be handled by software-only solutions. Hardware solutions are, of course, much faster and are often specified because implementing the critical security components in hardware isolates them from software threats such as 'viruses'. This avoids the need to carry out a detailed and costly security analysis of all the software components in the system.
To achieve higher data throughput, designers can use a SoC (ASIC) or FPGA platform to provide hardware acceleration. This is where another feature of AES comes into play: the scalability of the algorithm. Fig 1 gives a typical trade-off between throughput and equivalent ASIC gate count and exhibits a nearly linear relationship between complexity and data rate.

A low gate count design will use a narrow data width (down to 8 bits) and process each 128-bit block over multiple cycles. A classic cost/performance trade-off is achieved by increasing the hardware resources to give a wider data path, which needs fewer cycles per block and so delivers more throughput at a given clock rate. Implementations using 32-bit data paths often offer an optimal trade-off because of the way the AES algorithm has been defined.
FPGA applications can exploit a similar trade-off. For example, a wireless application requiring 100 Mbps can be realized using a small core with a 16-bit data width and a 70 MHz system clock. Optical networking needing 10 Gbps can be served by increasing the data width to 128 bits, adding pipelining, and raising the clock to 156 MHz. Trade-offs within this 100:1 range provide intermediate solutions that span the needs of military, broadcast, communications, and storage applications.
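As a rough illustration of this width-versus-cycles trade-off, the sketch below estimates the cycles needed per block under an idealized model in which every round is processed in 128/width slices. The model itself, the function name `cycles_per_block`, and the decision to ignore key expansion, I/O and control overhead are assumptions made for this example, not figures from any vendor; only the round counts (10, 12 and 14) come from FIPS-197.

```python
# Round counts per key size, as defined in FIPS-197.
ROUNDS = {128: 10, 192: 12, 256: 14}
BLOCK_BITS = 128  # AES always works on 128-bit blocks


def cycles_per_block(datapath_bits, key_bits=128):
    """Idealized cycle estimate for an iterative AES core.

    Assumption (not from the article): each round needs
    BLOCK_BITS / datapath_bits cycles and there is no overhead
    for key expansion, data loading or control.
    """
    if datapath_bits <= 0 or BLOCK_BITS % datapath_bits:
        raise ValueError("data path width must divide the 128-bit block")
    return ROUNDS[key_bits] * (BLOCK_BITS // datapath_bits)


if __name__ == "__main__":
    for width in (8, 16, 32, 128):
        print(f"{width:3d}-bit data path: {cycles_per_block(width)} cycles per block")
```

Real iterative cores usually need somewhat more cycles than this estimate suggests, while unrolled or pipelined designs can need fewer, so data sheet figures should always take precedence.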

Technical considerations
Until recently, the choice between a SoC/ASIC or an FPGA implementation was usually clear-cut. Compared to ASIC solutions, FPGAs carry overheads that have an impact on technical and commercial performance. The programmable interconnect on an FPGA adds RC delays that reduce the performance compared to the custom metal of an ASIC. An FPGA also needs additional transistors to provide the programmability, and this raises the cost.
Over recent years, however, FPGAs have reduced these disadvantages and closed the gap significantly to the point where they are routinely used for volume production. A programmable solution is, without question, the fastest way to market and for this reason has become ubiquitous.
Much has been written about the total SoC project costs of mask sets, design time and tool suites for a 65 nm chip, with estimates starting at $5M and going stratospheric. Staggering costs like these eliminate all but a handful of designs, as witnessed by the dwindling number of ASIC starts. Recent reports (see the Soc Schedules EE Times article #206905136, for example) also suggest that close to 9 out of 10 projects overrun their deadlines, making long development schedules even worse.
That said, if the choice is for a masked solution because of unit cost or extreme performance requirements, then there is a wide selection from over 30 IP vendors. From Fig 1, the cost of the silicon for the AES function will be extremely low, with the largest design occupying only 1 or 2 cents worth of silicon. AES cores can be highly optimized for ASIC. What will be more significant are the engineering costs of integrating the function into the design and, especially, of design verification. This is where comprehensive test benches that cover all the "corner cases" start to pay off. Some applications may require a FIPS validation of the AES implementation, with the attendant risk of a respin.
When the target implementation technology is an FPGA, the choice of IP appears to be equally wide. Many vendors offer netlists that can be input into the FPGA design flow. A faint warning bell should be ringing at this point unless the design has been specifically targeted at the FPGA architecture. The reason is that the ASIC world is different from the FPGA realm in subtle ways. An ASIC builds up the functions that the designer specifies from a rich cell library, and there is little need to consider how the design maps onto the underlying fabric. In contrast, FPGAs have fixed resources onto which the design must map efficiently.
Small details such as using asynchronous resets rather than synchronous resets can have disproportionate impacts. The details of the memory design can have a large influence, as the memory blocks are fixed and finite, and an unsympathetic design could double the resources consumed. This would not be significant if the costs per gate were similar to ASICs, but that is not the case (see the Commercial Considerations topic below). If the design is moderate performance and the production quantity is low, then any inefficiency can probably be accommodated. These architectural differences should have been taken into account if the design has been specifically built for FPGA implementation.
Even within designs built for FPGAs there can be large differences. You would expect vendor to vendor differences, but there can be surprises within vendor portfolios. For example, the premium to implement a data encrypter/decrypter over an encrypter-only design can range from a modest 10% with lots of resource sharing to around twice the size in the worst case.
Another significant variable relates to the key expansion system, which generates the round keys that the AES process uses to encrypt the plain text. The algorithm to calculate these keys can be implemented in either the FPGA hardware or in software on an external processor. A software approach may be suitable for low throughput schemes with spare processing cycles, but this will not be the case for high performance and is counter-intuitive if you have decided to use hardware acceleration. However, the key expander can be resource intensive and can double the design size, although it is normally included in the resource estimates.
Vendor to vendor differences can be even more surprising. A comparison of data sheets for a similar configuration can reveal significant throughput differences from similar resources. The explanation (to some degree, at least) lies in the data width used or the features provided.
Another variable that is worth considering is the clock frequency used to achieve the throughput. The relationship is given by:
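Throughput (Mbps) = 128 × f / number of cycles
... where f is the clock frequency in MHz, 128 is the AES block size in bits, and the number of cycles needed to process one block depends on the key size and the data width and can range from 1 to over 600.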
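The relationship can be turned around to show how many cycles per block a design can afford for a given target. The short sketch below applies the formula to the two operating points quoted earlier in this article (100 Mbps at 70 MHz and 10 Gbps at 156 MHz); the function names are hypothetical and chosen only for this illustration.

```python
AES_BLOCK_BITS = 128  # AES block size in bits


def throughput_mbps(clock_mhz, cycles_per_block):
    """Throughput in Mbps for a core that needs cycles_per_block cycles per block."""
    return AES_BLOCK_BITS * clock_mhz / cycles_per_block


def cycle_budget(clock_mhz, target_mbps):
    """Maximum cycles per block that still meets target_mbps at clock_mhz."""
    return AES_BLOCK_BITS * clock_mhz / target_mbps


if __name__ == "__main__":
    # Wireless example from the article: 100 Mbps from a 70 MHz clock.
    print("cycle budget, 100 Mbps at 70 MHz:", cycle_budget(70, 100))
    # Optical networking example from the article: 10 Gbps from a 156 MHz clock.
    print("cycle budget, 10 Gbps at 156 MHz:", cycle_budget(156, 10_000))
    # A core that completes a block every 2 cycles at 156 MHz.
    print("throughput at 156 MHz, 2 cycles:", throughput_mbps(156, 2), "Mbps")
```

On these numbers the 100 Mbps design can spend roughly 90 cycles on each block, whereas the 10 Gbps design has a budget of only about 2 cycles per block, which is why it needs a full-width, pipelined data path.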
The engineering costs of a high clock speed are higher power consumption and more difficult timing closure. FIFOs and flow control of data may be required if the core runs at a different speed to the rest of the design, adding both cost and complexity.
Power consumption in FPGAs has become a major criterion for users in recent years because it affects the overall system cooling regime and costs, as well as raising concerns about reliability. FPGA vendors have successfully adopted design techniques to minimize quiescent power in larger devices (see the relevant web pages from Altera or Xilinx, for example). Fortuitously, one consequence of Moore's Law is that it has driven operating voltages down to 1 volt, but dynamic power will always be proportional to clock frequency, so a lower clock is better.
Verification is the number one headache in system design. AES was standardized by NIST with a number of different operating modes. NIST also provides a large number of 'known answer' test patterns and a specification for tests to be used in implementation validation. For validation, the test vectors are generated by a NIST-approved test lab and are not known in advance. You can save quite a lot of work by selecting a vendor who offers a comprehensive test bench implementing all AESAVS tests, ideally with additional vectors as specified in FIPS-197 and NIST Special Publication 800-38A.
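To put the clock-frequency point in numbers, the sketch below uses the standard first-order CMOS approximation that dynamic power scales with switched capacitance, the square of the supply voltage, and clock frequency. The function name and the assumption that switched capacitance and activity stay the same between the two design points are simplifications introduced here; a wider data path that permits a slower clock will normally toggle more logic per cycle.

```python
def relative_dynamic_power(f_new_mhz, f_old_mhz, v_new=1.0, v_old=1.0):
    """Ratio of dynamic power between two operating points.

    Uses the first-order model P_dyn ~ C * V^2 * f and assumes
    (for illustration only) identical switched capacitance C.
    """
    return (f_new_mhz / f_old_mhz) * (v_new / v_old) ** 2


if __name__ == "__main__":
    # Same logic clocked at 70 MHz instead of 156 MHz, same 1 V supply.
    ratio = relative_dynamic_power(70, 156)
    print(f"dynamic power at 70 MHz relative to 156 MHz: {ratio:.2f}")
```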
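A known answer test simply feeds a published key and plaintext into the implementation and compares the result with the published ciphertext. The sketch below is a minimal software reference model of AES-128 encryption in pure Python, checked against the two example vectors printed in FIPS-197 (Appendix B and Appendix C.1). It is an illustration written for this article, not the AESAVS procedure and not any vendor's test bench; the function names are hypothetical, and the code is unoptimized and unsuitable for protecting real data.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) using the AES polynomial 0x11B."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return p


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """Build the S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key_128(key):
    """Expand a 16-byte key into 11 round keys (176 bytes in total)."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = list(words[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= rcon
            rcon = gmul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], t)])
    return [b for w in words for b in w]


def encrypt_block_128(key, block):
    """Encrypt one 16-byte block with a 16-byte key (single block, no mode)."""
    rk = expand_key_128(key)
    s = [b ^ k for b, k in zip(block, rk[:16])]            # initial AddRoundKey
    for rnd in range(1, 11):
        s = [SBOX[b] for b in s]                           # SubBytes
        s = [s[(i + 4 * (i % 4)) % 16] for i in range(16)] # ShiftRows
        if rnd < 10:                                       # MixColumns (not in last round)
            mixed = []
            for c in range(0, 16, 4):
                a = s[c:c + 4]
                mixed += [gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                          ^ a[(r + 2) % 4] ^ a[(r + 3) % 4] for r in range(4)]
            s = mixed
        s = [b ^ k for b, k in zip(s, rk[16 * rnd:16 * rnd + 16])]  # AddRoundKey
    return bytes(s)


# (key, plaintext, expected ciphertext) as printed in FIPS-197.
KNOWN_ANSWERS = [
    ("2b7e151628aed2a6abf7158809cf4f3c", "3243f6a8885a308d313198a2e0370734",
     "3925841d02dc09fbdc118597196a0b32"),  # Appendix B
    ("000102030405060708090a0b0c0d0e0f", "00112233445566778899aabbccddeeff",
     "69c4e0d86a7b0430d8cdb78070b4c55a"),  # Appendix C.1
]


def run_known_answer_tests(encrypt):
    """Return the vectors for which encrypt(key, block) gives the wrong answer."""
    failures = []
    for key, plaintext, expected in KNOWN_ANSWERS:
        got = encrypt(bytes.fromhex(key), bytes.fromhex(plaintext))
        if got != bytes.fromhex(expected):
            failures.append((key, plaintext, expected, got.hex()))
    return failures


if __name__ == "__main__":
    print("failing vectors:", run_known_answer_tests(encrypt_block_128))
```

The same `run_known_answer_tests` helper could be pointed at any function that wraps a simulation or hardware run of the core under evaluation, which is the kind of work a vendor-supplied test bench saves on a much larger scale.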
Commercial considerations
These are best illustrated by example. Factors to consider include core cost, silicon cost, core support and modification costs, license restrictions, and the learning curve.
For example, Algotronix offers a range of AES cores, including a flagship 10 Gbps AES-GCM design. The core is very competitively priced, and "ticks all the boxes" in terms of being a compact design that delivers 10 Gbps from only a 156 MHz clock. So far, so good...
The next cost to consider is the silicon. Unlike an ASIC, efficiency of implementation has an impact on the total cost. For most systems it is valid to assume that there will be a need for an FPGA for interfacing or custom logic, and that the AES core can co-exist in this device. The Algotronix G3 core will fit into a Spartan or Virtex series FPGA from Xilinx or a Cyclone or Stratix device from Altera.
You should select a core that can be targeted at multiple FPGA families or vendors so that you have more flexibility to reduce silicon costs. The list price of the Xilinx Spartan XC3S2000-4FG456C in quantities of 100 or more is $45.90. The resources for the core logic (32 bit datapath, ECB, Encrypt Only, 128 bit key, 'Offline' Hardware Key Expansion, 'Push Button' flow) provide a throughput of over 350 Mbps and require 1.3% of the FPGA. Therefore the cost of the silicon real-estate occupied by the core is only $0.57.
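The silicon cost attributed to the core is simply the device price multiplied by the fraction of the device that the core occupies. The sketch below repeats that calculation with the figures quoted above; the function name is hypothetical. Using the rounded 1.3% utilization gives a result of about $0.60, so the $0.57 quoted above presumably reflects an unrounded utilization figure.

```python
def silicon_cost_usd(device_price_usd, utilization_fraction):
    """Share of the device price attributable to the logic the core occupies."""
    if not 0.0 <= utilization_fraction <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    return device_price_usd * utilization_fraction


if __name__ == "__main__":
    # Figures quoted in the article: $45.90 device, core uses 1.3% of it.
    cost = silicon_cost_usd(45.90, 0.013)
    print(f"silicon cost of the core: ${cost:.2f}")
```

The same function makes it easy to compare candidate devices: a core that maps less efficiently onto a particular family shows up directly as a larger utilization fraction and therefore a higher cost.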
IP vendors typically provide customers with either HDL or netlist versions of their cores. It is not practical to trace how many FPGAs have been shipped with the IP, so expect no royalty payments. One "future proof" consideration is that at least one vendor allows the IP to be moved to ASIC at no additional cost. This provides an easy cost reduction path for successful products, or an easy way to prototype an ASIC solution.

To cover the range of options and modes supported in AES, customers can license and edit HDL code, but be aware that this is often at a steep price premium over a netlist. In addition to on-line support (ranging from 30 days to a year), most vendors offer a customization service where the exact functionality can be set.
The advantage of licensing HDL code becomes clear in two circumstances. The first is where marketing changes the specification late in the project. (What, this never happens in your company?) The second relates to learning curve and reuse issues. Engineers need to understand how things work, and it is much easier to read HDL source than a netlist.
HDL code allows engineers to play out "what if" scenarios to arrive at the optimum design. For example, users can change VHDL generic parameters and recompile to evaluate trade-offs such as data path widths. They will also be expected to verify the core and create test vectors for the final product.
The final task (and the least exciting) is to document how the design works. To put the cost of the learning curve into perspective, assume that the annual cost of salary, benefits and tool costs for an engineer runs at $110k. Our experience at Algotronix is that customers are actively working within a day on "plain vanilla" cores, but we also offer a safety net for more demanding designs with consultancy options. Better still, customers can sign up to get a free-of-charge evaluation copy to convince them that it is the right core.
One special consideration for encryption IP relates to confidence that the security has not been compromised. A concern in a high security design is to ensure that so called "back door" features have not been maliciously included. It is important, therefore, to select a reputable vendor.
Purchasing untraceable source code, whether it lacks provenance or its authors are anonymous, should be avoided. Doing so greatly reduces the risk that criminal hackers or a hostile intelligence agency have 'contributed' malware to an open source project that ends up in your design. It is also risky to purchase encryption IP from a vendor in a country with a less-developed legal framework or one which has political disagreements with your own. Owning source code gives users the option of analyzing the design and archiving it.
Finally, a very important consideration is reuse. In reality, most customers will continue to include the security systems they develop in future products; reuse of blocks is reported to have doubled over the last decade. The AES standards will not change, so choosing a vendor who offers multiple use or low cost extensions to the license can be a shrewd move. Larger companies will also look for either a site-wide license or one covering their whole company division so that they have the flexibility to operate efficiently.
Summary
A number of technical and commercial aspects have been discussed in this article, but there is no substitute for a full evaluation. The author hopes that this paper has helped to highlight some of the considerations when selecting an AES core.
Paul Dillien has worked in the semiconductor industry for over 30 years, including various Sales and Marketing roles working for Xilinx, Plessey and Ferranti.
More recently, Paul founded the high-technology marketing consultancy company High Tech Marketing. Paul can be contacted at paul@high-tech-marketing.co.uk.