Accelerator Data Flow
Whether internal or external, low-level or high-level, look-aside crypto accelerators share similar instruction and data flows, even if their performance and efficiency differ.
Figure 1 below shows a high-level block diagram of a PowerQUICC III processor to illustrate the position of a look-aside accelerator within an integrated communication processor.
Figure 1: PowerQUICC Processor with Look-Aside Crypto Accelerator
Figure 2 and Figure 3 illustrate, in sequence, a typical data flow for a high-level crypto accelerator such as the Freescale SEC. The processing steps are described below.
Step #1. A packet arrives at the Ethernet interface and is placed in a buffer in main memory. PowerQUICC-specific optimizations to this step include Ethernet interrupt coalescing and packet header stashing to the L2 cache.
Step #2. Upon notification (or discovery via polling) that a packet is available for processing, the CPU reads the packet header to perform classification. Classification involves software checking the header fields against various tables.
In this specific example, IPsec classification involves look-ups in two databases: a security policy database to determine whether the packet needs to be IPsec protected, and a security association database to determine the specific IPsec tunnel and parameters to use when encapsulating the packet.
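The two look-ups in Step #2 can be sketched roughly as follows. This is a minimal Python model, not PowerQUICC or SEC driver code: the table layouts, the field names, and the `classify` function are hypothetical, and a real security policy database matches on selectors such as address ranges, ports, and protocol rather than on an exact destination address.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SecurityAssociation:
    """Hypothetical security association entry: one IPsec tunnel's parameters."""
    spi: int      # Security Parameters Index identifying the tunnel
    cipher: str   # name of the cipher negotiated for this tunnel
    key: bytes    # key material (all zeros here, for illustration only)


# Hypothetical security policy database: destination address -> action.
SPD = {
    "10.1.0.5": "protect",
    "192.168.0.9": "bypass",
}

# Hypothetical security association database: destination address -> tunnel.
SAD = {
    "10.1.0.5": SecurityAssociation(spi=0x1001, cipher="aes-cbc", key=bytes(16)),
}


def classify(dst_ip: str) -> Tuple[str, Optional[SecurityAssociation]]:
    """Return the policy action and, for protected traffic, the tunnel to use."""
    # Assumption of this sketch: traffic matching no policy is discarded.
    action = SPD.get(dst_ip, "discard")
    if action != "protect":
        return action, None
    sa = SAD.get(dst_ip)
    if sa is None:
        raise LookupError("policy requires IPsec but no security association exists")
    return action, sa
```

Only packets for which the first look-up returns "protect" go on to the second look-up and, later, to the accelerator; everything else is handled by the CPU alone.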
Step #3. The CPU creates a descriptor for the security engine (SEC) that includes configuration information and pointers to the keys, context and data required for the cryptographic operation.
The amount of pre-processing the CPU performs on the packet before sending it to the crypto accelerator depends on the capabilities of the accelerator. Some accelerators perform crypto operations only. Other accelerators perform a level of protocol processing such as adding IPsec headers.
Step #4. The CPU writes a pointer to the descriptor to a SEC crypto-channel (DMA).
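Steps #3 and #4 can be modeled with a short sketch. The descriptor layout below is hypothetical and far simpler than a real SEC descriptor, and `main_memory`, `channel_fifo`, and `submit` are illustrative names. The point it tries to show is that the descriptor itself stays in main memory and only its address is handed to the crypto-channel.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Hypothetical descriptor: configuration plus pointers into main memory."""
    header: str    # operation selector, e.g. cipher and direction
    key_ptr: int   # address of the key
    ctx_ptr: int   # address of the context (for example, an IV)
    src_ptr: int   # address of the input data
    dst_ptr: int   # address where the accelerator should write its output


main_memory = {}        # toy main memory: address -> stored object
channel_fifo = deque()  # toy crypto-channel queue of descriptor addresses


def submit(desc_addr: int, desc: Descriptor) -> None:
    """Place a descriptor in memory and hand its address to the channel."""
    main_memory[desc_addr] = desc    # Step #3: descriptor is built in main memory
    channel_fifo.append(desc_addr)   # Step #4: only the pointer goes to the channel


# Usage: keys, context and packet data are already in memory at made-up addresses.
main_memory[0x1000] = bytes(16)           # key
main_memory[0x2000] = bytes(16)           # context
main_memory[0x3000] = b"packet payload"   # data to process
submit(0x4000, Descriptor("encrypt", 0x1000, 0x2000, 0x3000, 0x5000))
```

Because the CPU passes a single pointer rather than the data, the per-packet cost on the CPU side is dominated by building the descriptor, not by moving the payload.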
Figure 2: Look-Aside Security Architecture Steps 1-4
Step #5. The SEC fetches the descriptor from main memory.
Step #6. The SEC configures itself for single-pass processing per the descriptor and begins fetching keys, context and data from main memory. It writes the processed (encrypted or decrypted) data back to memory as it goes.
Step #7. The SEC notifies the CPU when the operation is complete. Notification is configurable: either by interrupt or by status bits that the CPU polls.
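The accelerator's side of the exchange (Steps #5 through #7) can be sketched in the same toy style. Everything here is an assumption made for illustration: the `Descriptor` fields, the `accelerator_run` function, and especially `toy_transform`, which is a repeating-key XOR standing in for the real cipher and provides no security. Context handling is omitted, and the model processes the data in one step, whereas the SEC writes output back while it is still fetching input.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Hypothetical descriptor with a completion flag the CPU can poll."""
    key_ptr: int
    src_ptr: int
    dst_ptr: int
    done: bool = False


def toy_transform(key: bytes, data: bytes) -> bytes:
    """Stand-in for the crypto operation: repeating-key XOR. NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def accelerator_run(memory, fifo, on_complete=None):
    """Process one queued descriptor, as the accelerator would."""
    desc = memory[fifo.popleft()]                    # Step #5: fetch the descriptor
    key = memory[desc.key_ptr]                       # Step #6: fetch key and data
    data = memory[desc.src_ptr]
    memory[desc.dst_ptr] = toy_transform(key, data)  # Step #6: write the result back
    desc.done = True                                 # Step #7: bit the CPU can poll
    if on_complete is not None:
        on_complete(desc)                            # Step #7: interrupt-style callback


# Usage: one descriptor at a made-up address, with both notification styles shown.
memory = {
    0x100: b"\x0f\xf0",                       # key
    0x200: b"\x01\x02\x03\x04",               # input data
    0x400: Descriptor(0x100, 0x200, 0x300),   # descriptor
}
fifo = deque([0x400])
completed = []
accelerator_run(memory, fifo, on_complete=completed.append)
```

A polling CPU would check `memory[0x400].done`; an interrupt-driven one would rely on the callback. In both cases the CPU never touches the payload while the accelerator is working, which is what frees it for other tasks.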
Step #8. The core performs touch-up formatting on the packet.
Step #9. The core creates a Tx buffer descriptor for the Ethernet interface.
Step #10. The Ethernet interface forwards the processed packet.
Figure 3: Look-Aside Security Architecture Steps 5-10
Look-aside architectures have become fairly prevalent in embedded processors for the following reasons:
They can be implemented cost-effectively because they leverage existing SoC platform resources including memory, classification resources and protocol state maintenance resources.
Software's ability to pre- and post-process the data, and to provide a wider range of crypto processing instructions, provides the flexibility required to support a variety of application and protocol use cases.
Although generally lower in performance than a flow-through architecture, a look-aside accelerator provides sufficient performance for a broad range of applications. (All current PowerQUICC communications processors implement a look-aside architecture.)
Although other communications processors co-locate security accelerators in their network interface modules, there are no unambiguous examples of flow-through processing in integrated communications processors. Consequently, the remainder of this article focuses on the variables that affect the performance of look-aside architectures.