Using Direct Memory Access effectively in media-based embedded applications: Part 2

Last time, in Part 1, we introduced some basics behind Direct Memory Access (DMA): why it's needed, and how it's structured and controlled. This time, we'll focus on the classifications of DMA transfers and the constructs associated with setting up these transactions.
Setting up a DMA
There are two main classes of DMA transfer configuration: Register Mode and Descriptor Mode. Regardless of the class of DMA, the same type of information depicted in Table 1 below makes its way into the DMA controller. When the DMA runs in Register Mode, the DMA controller simply uses the values contained in the registers. In the case of Descriptor Mode, the DMA controller looks in memory for its configuration values.
|Table 1: DMA registers|
In a register-based DMA, the processor directly programs DMA control registers to initiate a transfer. Register-based DMA provides the best DMA controller performance, because registers don't need to keep reloading from descriptors in memory, and the core does not have to maintain descriptors.
Register-based DMA consists of two sub-modes: "Autobuffer Mode" and "Stop Mode." In Autobuffer DMA, when one transfer block completes, the control registers automatically reload to their original setup values, and the same DMA process restarts, with zero overhead.
As we see in Figure 1 below, if we set up an Autobuffer DMA to transfer some number of words from a peripheral to a buffer in L1 data memory, the DMA controller would reload the initial parameters immediately upon completion of the last word transfer. This creates a "circular buffer," because after a value is written to the last location in the buffer, the next value will be written to the first location in the buffer.
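The wraparound behavior described above can be modeled in a few lines of host-side C. This is a software sketch of the circular-buffer effect only, not actual DMA controller code; the buffer size and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Software model of Autobuffer DMA wraparound: after the last word of the
 * buffer is written, the parameters "reload" and the next write lands at
 * index 0 again. BUF_WORDS is kept small purely for illustration. */
#define BUF_WORDS 8

static int buffer[BUF_WORDS];
static size_t write_idx = 0;

/* Simulates one DMA word transfer from the peripheral into the buffer. */
void dma_write_word(int sample)
{
    buffer[write_idx] = sample;
    write_idx = (write_idx + 1) % BUF_WORDS;  /* auto-reload: wrap to start */
}
```

Feeding in one more sample than the buffer holds overwrites the oldest entry at index 0, which is exactly the circular-buffer property Figure 1 depicts.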
|Figure 1: Implementing a circular buffer with Autobuffer DMA|
Autobuffer DMA especially suits performance-sensitive applications with continuous data streams. The DMA controller can read in the stream independent of other processor activities and then interrupt the core when each transfer completes. While it's possible to stop Autobuffer mode gracefully, if a DMA process needs to be started and stopped regularly, it doesn't make sense to use this mode.

An Autobuffer example
Consider an application where the processor operates on 512 audio samples at a time, and the codec (the audio A/D converter) sends new data at the audio clock rate. Autobuffer DMA is the perfect choice in this scenario, because the data transfer occurs at such periodic intervals.
Drawing on this same model, let's assume we want to "double-buffer" the incoming audio data. That is, we want the DMA controller to fill one buffer while we operate on the other. The processor must finish working on a particular data buffer before the DMA controller wraps around to the beginning of it, as shown in Figure 2, below. Using Autobuffer mode, configuration is simple.
|Figure 2: Double buffering|
With a 2D DMA, the total count of the Autobuffer transfer spans both data buffers. In this example, each data buffer's size corresponds to the size of the inner loop of the 2D DMA, and the number of buffers corresponds to the outer loop. Therefore, we keep XCOUNT=512. Assuming the audio data element size is 4 bytes, we program the word transfer size to 32 bits and set XMODIFY=4.
Since we want two buffers, we set YCOUNT=2. If we want the two buffers to be back-to-back in memory, we must set YMODIFY=1. However, it's often smarter to separate the buffers. This way, you can avoid conflicts between the processor and the DMA controller in accessing the same sub-banks of memory. To this end, YMODIFY can be increased to provide the proper separation between the buffers.
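To make the roles of the four parameters concrete, here is a small host-side model of the address sequence a 2D DMA walks through. It assumes the common convention that XMODIFY is applied between inner-loop elements and YMODIFY is applied (in place of XMODIFY) after each inner loop completes; the exact units and semantics on a given part should be checked against its hardware reference manual. The counts here are tiny for readability (the article's example uses XCOUNT=512).

```c
#include <assert.h>

/* Fills 'out' with the byte offsets visited by a 2D DMA, assuming
 * XMODIFY strides between inner-loop elements and YMODIFY strides
 * (instead of XMODIFY) after each inner loop ends. Illustrative model,
 * not actual controller behavior for any specific device. */
void dma_2d_offsets(int xcount, int xmod, int ycount, int ymod, long *out)
{
    long addr = 0;
    int k = 0;
    for (int y = 0; y < ycount; y++) {
        for (int x = 0; x < xcount; x++) {
            out[k++] = addr;
            if (x < xcount - 1)
                addr += xmod;   /* inner-loop stride between elements */
            else if (y < ycount - 1)
                addr += ymod;   /* outer-loop stride between buffers  */
        }
    }
}
```

With a back-to-back outer stride the two buffers are contiguous; enlarging YMODIFY opens a gap between them, which is how the sub-bank separation described above is achieved.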
In a 2D DMA transfer, we have the option of generating an interrupt when XCOUNT expires and/or when YCOUNT expires. Translated to this example, we can set the DMA interrupt to trigger every time XCOUNT decrements to 0 (i.e., at the end of each set of 512 transfers). It is easy to think of this in terms of receiving an interrupt at the end of each inner loop.
Stop Mode works identically to Autobuffer DMA, except registers don't reload after DMA completes, so the entire DMA transfer takes place only once. Stop Mode is most useful for one-time transfers that happen based on some event: for example, moving data blocks from one location to another in a non-periodic fashion. This mode is also useful when you need to synchronize events. For example, if one task has to complete before the next transfer is initiated, Stop Mode can guarantee this sequencing. Moreover, Stop Mode is useful for buffer initialization.
DMA transfers that are descriptor-based require a set of parameters stored within memory to initiate a DMA sequence. The descriptor contains all of the same parameters normally programmed into the DMA control register set. However, descriptors also allow the chaining together of multiple DMA sequences. In descriptor-based DMA operations, we can program a DMA channel to automatically set up and start another DMA transfer after the current sequence completes. The descriptor-based model provides the most flexibility in managing a system's DMA transfers.
On Analog Devices Blackfin processors, there are two main descriptor models -- a "Descriptor Array" scheme and a "Descriptor List" method. The goal of these two models is to allow a tradeoff between flexibility and performance. Let's take a look at how this is done.
In the Descriptor Array mode, descriptors reside in consecutive memory locations. The DMA controller still fetches descriptors from memory, but because the next descriptor immediately follows the current descriptor, the two words that describe where to look for the next descriptor (and their corresponding descriptor fetches) aren't necessary. Because the descriptor does not contain this Next Descriptor Pointer entry, the DMA controller expects a group of descriptors to follow one another in memory like an array.
A Descriptor List is used when the individual descriptors are not located "back-to-back" in memory. There are actually multiple sub-modes here, again to allow a tradeoff between performance and flexibility. In a "small descriptor" model, descriptors include a single 16-bit field that specifies the lower portion of the Next Descriptor Pointer field; the upper portion is programmed separately via a register and doesn't change. This, of course, confines descriptors to a specific 64K (2^16-byte) page in memory. When the descriptors need to be located across this boundary, a "large" model is available that provides 32 bits for the Next Descriptor Pointer entry.
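The small-descriptor address arithmetic is easy to show in C. This is a sketch of the concept only; the field and register names are illustrative, not the actual Blackfin names.

```c
#include <assert.h>
#include <stdint.h>

/* "Small descriptor" model: the descriptor stores only the low 16 bits of
 * the Next Descriptor Pointer; the upper 16 bits come from a separately
 * programmed register, confining the whole chain to one 64K page. */
uint32_t next_descriptor_addr(uint16_t ndp_lower, uint32_t page_upper16)
{
    return (page_upper16 << 16) | ndp_lower;  /* fixed page | 16-bit offset */
}
```

The "large" model simply stores the full 32-bit pointer in the descriptor instead, at the cost of one extra descriptor word to fetch.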
Regardless of the descriptor mode, using more descriptor values requires more descriptor fetches. This is why Blackfin processors specify a "flex descriptor model" that tailors the descriptor length to include only what's needed for a particular transfer, as shown in Figure 3, below. For example, if 2D DMA is not needed, the YMODIFY and YCOUNT registers do not need to be part of the descriptor block.
|Figure 3: DMA descriptor models.|
So what's the best way to manage a descriptor list? Well, the answer is application-dependent, but it is important to understand what alternatives exist.
The first option we will describe behaves very much like an Autobuffer DMA. It involves setting up multiple descriptors that are chained together as shown in Figure 4a, below. The term "chained" implies that one descriptor points to the next descriptor, which is loaded automatically. To complete the chain, the last descriptor points back to the first descriptor, and the process repeats. One reason to use this technique rather than the Autobuffer mode is that descriptors allow more flexibility in the size and direction of the transfers.
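A chained ring of descriptors can be sketched as a linked structure in memory. The struct layout below is simplified for illustration (a real descriptor also carries start address, counts, modifies, and a configuration word in a device-specific layout).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified descriptor for a chained (ring) DMA: each descriptor points
 * to the next, and the last points back to the first so the sequence
 * repeats, much like Autobuffer mode but with per-descriptor flexibility. */
struct dma_descriptor {
    struct dma_descriptor *next;
    void                  *start_addr;
    size_t                 length;
};

/* Links an array of descriptors into a ring. */
void link_ring(struct dma_descriptor *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i].next = &d[(i + 1) % n];  /* last descriptor wraps to the first */
}
```

Because each descriptor carries its own buffer address and length, consecutive transfers in the ring can differ in size or direction, which Autobuffer mode cannot do.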
The second option involves the processor manually managing the descriptor list. Recall that a descriptor is really a structure in memory. Each descriptor contains a configuration word. Each configuration word contains an "Enable" bit which can regulate when a transfer starts. Let's assume we have four buffers that have to move data over some given task interval. If we need to have the processor start each transfer specifically when the processor is ready, we can set up all of the descriptors in advance, but with the "Enable" bits cleared. When the processor determines the time is right to start a descriptor, it simply updates the descriptor in memory and then writes to a DMA register to start the stalled DMA channel. Figure 4b below shows an example of this flow.
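The enable-bit throttling described above can be modeled as follows. The bit position, field names, and "start" mechanics are illustrative; on real hardware the restart is a write to a channel register, and the controller itself checks the configuration word.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of processor-throttled descriptors: each configuration word
 * carries an Enable bit, and the (software-modeled) DMA controller only
 * runs a descriptor once the processor has set that bit. */
#define DMA_ENABLE (1u << 0)   /* illustrative bit position */

struct throttled_desc {
    uint32_t config;   /* configuration word, includes the Enable bit */
    int      started;  /* 1 once the transfer has been kicked off     */
};

/* Controller side: starts the descriptor only if its Enable bit is set;
 * otherwise the channel stalls until the processor enables it. */
int dma_try_start(struct throttled_desc *d)
{
    if (d->config & DMA_ENABLE) {
        d->started = 1;
        return 1;
    }
    return 0;
}
```

The processor's role is then just `d->config |= DMA_ENABLE;` followed by the register write that restarts the stalled channel, mirroring the flow of Figure 4b.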
|Figure 4: DMA Descriptor throttled by the processor|
When is this type of transfer useful?
Consider a multimedia application that involves synchronization of an input stream to an output stream. For example, we may receive video samples into memory at a rate that is different than the rate at which we display output video. This will happen in real systems even when you attempt to make the streams run at exactly the same clock rate. In cases where synchronization is an issue, the processor can manually regulate the DMA descriptors corresponding to the output buffer. Before the next descriptor is enabled, the processor can synchronize the stream by adjusting the current output descriptor via a semaphore mechanism that guarantees only one entity at a time accesses the shared resource.
When using internal DMA descriptor chains or DMA-based streams between processors, it can also be useful to add an extra word at the end of the transferred data block that helps identify the packet being sent, including information on how to handle the data and, possibly, a time stamp. The dashed area of Figure 4b shows an example of this scheme.
Most sophisticated applications have a "DMA Manager" function implemented in software. This may be provided as part of an operating system or real-time kernel, but it can also run without either of these. On Blackfin processors, this function is provided as part of the System Services in the VisualDSP++ tool suite. This management function allows you to move data via a standard API, without having to configure every control register manually.
Basically, an application submits DMA descriptor requests to the DMA Queue Manager, whose responsibility it is to handle each request. These requests are handled in the order they are received by the application software. Usually, an address pointer to a "call-back" function is part of the system as well. This function carries out the work you want the processor to perform when a data buffer is ready, without needlessly making the core linger in a high-priority interrupt service routine. In sum, the DMA manager can simplify the programming model because it abstracts data transfers.
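A minimal model of such a queue manager is sketched below. This is not the VisualDSP++ System Services API; the function names, queue depth, and callback signature are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy DMA queue manager: requests are serviced in FIFO order, and each
 * carries a callback invoked when its buffer is ready, outside any
 * high-priority interrupt context. Illustrative API only. */
typedef void (*dma_callback)(void *buf);

struct dma_request {
    void        *buf;
    dma_callback on_done;
};

#define QUEUE_DEPTH 8
static struct dma_request queue[QUEUE_DEPTH];
static size_t head = 0, tail = 0;

/* Application side: submit a buffer and the work to run when it's ready. */
int dma_submit(void *buf, dma_callback cb)
{
    if (tail - head == QUEUE_DEPTH)
        return -1;                               /* queue full */
    queue[tail % QUEUE_DEPTH] = (struct dma_request){ buf, cb };
    tail++;
    return 0;
}

/* Completion side: pop the oldest request and run its callback. */
void dma_complete_one(void)
{
    if (head == tail)
        return;                                  /* nothing pending */
    struct dma_request r = queue[head % QUEUE_DEPTH];
    head++;
    r.on_done(r.buf);
}

/* Example callback: records the value of the completed buffer. */
static int last_value;
static void record_cb(void *buf) { last_value = *(int *)buf; }
```

The point of the abstraction is visible even in this toy: the application never touches a control register, and the callback does the buffer-ready work at task level rather than inside the ISR.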
There are two general methods for managing a descriptor queue using interrupts. The first is based on interrupting upon the completion of every descriptor. Use this method only if you can guarantee that each interrupt event will be serviced separately, with no interrupt overrun. The second involves interrupting only on completion of the transfer specified by the last descriptor of a work block. A work block is a collection of one or more descriptors.
To maintain synchronization of the descriptor queue, the non-interrupt software has to maintain a count of descriptors added to the queue, while the interrupt handler maintains a count of completed descriptors removed from the queue. The counts are then equal only when the DMA channel pauses after having processed all the descriptors.
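The bookkeeping just described amounts to two counters, one owned by task-level code and one by the interrupt handler. A sketch of that invariant, with illustrative names:

```c
#include <assert.h>

/* Descriptor-queue bookkeeping: non-interrupt code counts descriptors
 * added; the interrupt handler counts descriptors completed. The channel
 * has drained exactly when the two counts match. */
static volatile unsigned added     = 0;  /* incremented at task level      */
static volatile unsigned completed = 0;  /* incremented by the DMA handler */

void on_descriptor_added(void)    { added++; }
void on_descriptor_complete(void) { completed++; }

int dma_queue_drained(void)
{
    return added == completed;
}
```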
In this article we discussed the two DMA data-flow structures, register-based and descriptor-based, and when to use each type. Next time, in Part 3, we'll look at some advanced DMA features that assist with moving data effectively in multimedia systems.
To read Part 1 in this four part series, go to "The basics of direct memory access."
This series of four articles is based on material from "Embedded Media Processing," by David Katz and Rick Gentile, published by Newnes/Elsevier.

Rick Gentile and David Katz are senior DSP applications engineers in the Blackfin Applications Group at Analog Devices, Inc.