Direct memory access (DMA) technology has been around for more than 20 years. DMA has been used principally to offload memory accesses (reading and/or writing) from the CPU in order to enable the processor to focus on computational tasks and increase the performance of embedded and other system designs.
Traditionally, many components have included a DMA engine, among them microprocessors (CPUs), disk-drive controllers, graphics processors, and various endpoints. In all of these devices, the DMA engine transfers data between memory and I/O devices without involving the core central processing unit.
DMA is also used for intra-chip data transfer in the increasingly popular multi-core processors, especially in multiprocessor system-on-chip applications. In such systems, each processing element is typically equipped with a local memory (often called a scratchpad memory), and DMA transfers data between this local memory and the main memory.
DMA is crucial for system input/output (I/O). Without it, the CPU communicates with peripheral devices using programmed input/output (PIO), or load/store instructions in the case of multi-core chips, and is typically fully occupied for the entire duration of a read or write operation; it is thus unavailable to perform other, more critical computational tasks.
With DMA, however, the CPU initiates the transfer, performs other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is complete. This is especially useful in real-time computing applications, where it is critical that the processor's primary job not stall behind concurrent transfer operations.
Another related application area can be found within various forms of stream processing where it is essential to have data processing and transfers in parallel, in order to achieve sufficient throughput.
DMA Engines Extend PCI Express Performance
Over the years, DMA has taken a broad range of forms in board- and system-level designs. DMA is used in almost all applications and markets – among them storage, servers, communications, embedded, and industrial.
Because the underlying concept, faster data transfer with less CPU overhead, is simple, all of these applications use DMA engines in their systems. Until now, the DMA engines have resided in the processors, the chipsets, or the endpoints.
Designers have made it clear to PLX and other leading component providers that DMA needs to play a more critical role in embedded-systems' interconnect schemes.
So, a new and revolutionary concept of adding performance-enhancing DMA to a system now can be found in PCI Express (PCIe) switching devices featuring built-in DMA engines. DMA engines in a PCIe Gen 2 (5.0GT/s) switch provide more options for embedded-systems designers:
1. Some processors do not have DMA, so the DMA in a PCIe switch covers the needs of system designers who use such processors.
2. Some processors have limitations in DMA implementation – the DMA engines can perform write functions but cannot perform read functions. Since the DMA engine in a PCIe switch performs both the write and read functions, it can cover the needs of designers who use such processors.
3. Designers are often forced to use expensive, higher-power processors just for the DMA function. A DMA engine built into a PCIe switch gives them the option of choosing a cheaper, lower-power processor and using the switch's DMA without compromising the cost or performance of their system.
4. DMA built into a PCIe switch complements the DMA in a processor and/or end-point, providing higher performance for those designers who wish to differentiate their systems.
How DMA Adds Value in a PCIe Switch System
Figure 1 below shows a PCIe switch with DMA in control planes. Unlike data planes, control planes are latency-sensitive rather than bandwidth-intensive. Since latency is of the utmost importance, control planes prefer components that help reduce latency as much as possible.
The particular PCIe switch shown in Figure 1 has 12 x1 downstream ports and one x4 upstream port. These downstream ports are connected to a large number of ASICs and FPGAs. The DMA engine is used to gather statistics from the ASICs and FPGAs and update them with the latest configuration details from the CPU.
Data in the control plane needs to be written as quickly as possible, since configuring the devices is critical: the endpoints must be told not only what action to take but also how to handle the data passing through them.
A DMA engine built into a PCIe switch helps achieve this objective; without it, the control processor has to update each and every end-point, one after the other, which delays the updating process and negatively impacts the performance of the system.
The usefulness of DMA in a PCIe switch is more visible when the upstream port of the PCIe switch is x1 wide and there are 15 downstream ports connecting to FPGAs and ASICs.
|Figure 1. PCIe Gen 2 switch with DMA function in a control plane|
Figure 2 below shows a PCIe switch with DMA in an intelligent PCIe adapter card. The DMA engine transfers data from the intelligent I/O adapter cards while the non-transparency (NT) function in the same switch provides host-isolation.
If a PCIe switch without DMA were used in this application, the CPU would have to take over this function, and system performance would decrease as the CPU managed the data transfers between the intelligent I/O adapter cards rather than focusing on the computational tasks of the entire system. Additionally, having both NT and DMA in the same switch gives designers the advantage of a single-chip solution for two very important functions in the adapter card.
|Figure 2. PCIe Gen 2 switch with DMA function in an intelligent PCIe adapter card.|
Figure 3 below shows a PCIe switch with DMA in a PCIe cluster. The DMA engine transfers data between servers, while the NT function in the same switch provides isolation between the servers. Again, if a PCIe switch without DMA were used in this example, the CPU on each server would have to take over this function for its own card, resulting in a performance drop as each CPU managed the data transfers between servers rather than focusing on the computational tasks of its system.
|Figure 3. PCIe Gen 2 switch with DMA function in a PCIe cluster|
The new generation of PCIe Gen 2 switches with built-in DMA engines can support four DMA channels per device and transfer small and large blocks of data between all switch ports. Each DMA channel can saturate a x8 link at Gen 2 speeds in one direction.
The DMA engine in these devices implements a descriptor ring approach. Each descriptor provides support for large transfer sizes (up to 128MB), giving the user the ability to perform very large data transfers in any direction (memory to device, device to device, memory to memory).
Descriptors can exist in host memory or, alternatively, inside the DMA switch. Up to 256 descriptors are supported internally in these PCIe switches, which also support 32-bit and 64-bit transfers as well as programmable quality of service (QoS).
These devices support multiple configurations and can be used in various applications, such as control planes in the networking and communications market, intelligent I/O adapter cards in the embedded market and in PCIe clusters.
DMA engines in a PCIe switch provide more options for designers when architecting a system. This enables embedded-systems designers to lower their costs, while at the same time increase performance and differentiate their products.
Krishna Mallampati is senior product marketing manager at PLX Technology, Sunnyvale, Calif. Previously, he worked as an applications engineer and product marketing manager at Lucent Technologies and Agere Systems, and was a senior product marketing engineer at Altera Corp. Mallampati holds a master of science degree from Florida International University and a master of business administration from Lehigh University.