Using Direct Memory Access effectively in media-based embedded applications: Part 2

Last time, in Part 1, we introduced some basics behind Direct Memory Access (DMA): why it's needed, and how it's structured and controlled. This time, we'll focus on the classifications of DMA transfers and the constructs associated with setting up these transactions.

Setting up a DMA
There are two main classes of DMA transfer configuration: Register Mode and Descriptor Mode. Regardless of the class of DMA, the same type of information depicted in Table 1 below makes its way into the DMA controller. When the DMA runs in Register Mode, the DMA controller simply uses the values contained in the registers. In the case of Descriptor Mode, the DMA controller looks in memory for its configuration values.

Table 1: DMA registers

Register-based DMA
In a register-based DMA, the processor directly programs DMA control registers to initiate a transfer. Register-based DMA provides the best DMA controller performance, because registers don't need to keep reloading from descriptors in memory, and the core does not have to maintain descriptors.
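As a rough illustration of register-based setup, the sketch below programs a single channel for a 1D transfer. The register block, its field names, and the bit positions in the control word are all hypothetical stand-ins (they are not the actual Blackfin register map), and the block is a plain struct so the code can run on a host rather than on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block for one DMA channel. On real hardware this
 * would be a volatile pointer to memory-mapped registers; here it is a
 * plain struct so the sketch can run on a host. */
typedef struct {
    uint32_t start_addr;  /* first address to read or write */
    uint16_t config;      /* control word (bit layout assumed below) */
    uint16_t x_count;     /* inner-loop element count */
    int16_t  x_modify;    /* inner-loop stride in bytes */
    uint16_t y_count;     /* outer-loop count (2D only) */
    int16_t  y_modify;    /* outer-loop stride in bytes (2D only) */
} dma_regs_t;

/* Assumed bit positions, for illustration only. */
enum {
    DMA_CFG_ENABLE     = 1 << 0,
    DMA_CFG_AUTOBUFFER = 1 << 1,  /* set: Autobuffer Mode, clear: Stop Mode */
    DMA_CFG_WORD32     = 1 << 2   /* 32-bit word transfers */
};

/* Program a 1D register-based transfer. The enable bit is written last so
 * the channel never starts with half-written parameters. */
static void dma_setup_1d(volatile dma_regs_t *regs, uint32_t start,
                         uint16_t count, int16_t stride_bytes,
                         int autobuffer)
{
    assert(count > 0);
    uint16_t cfg = DMA_CFG_WORD32;
    if (autobuffer)
        cfg |= DMA_CFG_AUTOBUFFER;

    regs->config     = 0;            /* make sure the channel is disabled */
    regs->start_addr = start;
    regs->x_count    = count;
    regs->x_modify   = stride_bytes;
    regs->y_count    = 0;
    regs->y_modify   = 0;
    regs->config     = (uint16_t)(cfg | DMA_CFG_ENABLE);
}
```

Because the core writes the registers directly, there is no descriptor in memory for the controller to fetch; the cost is that the core must reprogram the registers itself whenever the transfer parameters change.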

Register-based DMA consists of two sub-modes: “Autobuffer Mode” and “Stop Mode.” In Autobuffer DMA, when one transfer block completes, the control registers automatically reload to their original setup values, and the same DMA process restarts, with zero overhead.

As we see in Figure 1 below, if we set up an Autobuffer DMA to transfer some number of words from a peripheral to a buffer in L1 data memory, the DMA controller would reload the initial parameters immediately upon completion of the last word transfer. This creates a “circular buffer,” because after a value is written to the last location in the buffer, the next value will be written to the first location in the buffer.

Figure 1: Implementing a circular buffer with Autobuffer DMA
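The wrap-around behavior can be sketched with a small host-side model of a 1D channel. This is not driver code: the `dma1d_t` type and its functions are invented for illustration, and they only mimic how the working address and count reload from the setup values in Autobuffer Mode, or halt after one pass in Stop Mode.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of a 1D DMA channel; not a real driver. */
typedef struct {
    uint32_t start_addr;   /* setup value, kept for the reload */
    uint32_t count;        /* setup value: elements per block */
    int32_t  modify;       /* stride in bytes */
    int      autobuffer;   /* 1 = Autobuffer Mode, 0 = Stop Mode */
    uint32_t curr_addr;    /* working copy */
    uint32_t curr_count;   /* working copy */
    int      running;
    unsigned blocks_done;  /* block completions, i.e. interrupts raised */
} dma1d_t;

static void dma1d_start(dma1d_t *d)
{
    assert(d->count > 0);
    d->curr_addr   = d->start_addr;
    d->curr_count  = d->count;
    d->running     = 1;
    d->blocks_done = 0;
}

/* Transfer one element: return the address it lands on, then advance. */
static uint32_t dma1d_step(dma1d_t *d)
{
    assert(d->running);
    uint32_t addr = d->curr_addr;
    d->curr_addr += (uint32_t)d->modify;
    if (--d->curr_count == 0) {
        d->blocks_done++;
        if (d->autobuffer) {          /* reload the original setup values */
            d->curr_addr  = d->start_addr;
            d->curr_count = d->count;
        } else {
            d->running = 0;           /* Stop Mode: one pass only */
        }
    }
    return addr;
}
```

With `count = 4` and `modify = 4`, the fifth call to `dma1d_step` in Autobuffer Mode returns the start address again, which is the circular-buffer behavior described above.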

Autobuffer DMA especially suits performance-sensitive applications with continuous data streams. The DMA controller can read in the stream independent of other processor activities and then interrupt the core when each transfer completes. While it's possible to stop Autobuffer mode gracefully, if a DMA process needs to be started and stopped regularly, it doesn't make sense to use this mode.

An Autobuffer example
Consider an application where the processor operates on 512 audio samples at a time, and the codec (the audio A/D converter) sends new data at the audio clock rate. Autobuffer DMA is the perfect choice in this scenario, because the data transfer occurs at such periodic intervals.

Drawing on this same model, let's assume we want to “double-buffer” the incoming audio data. That is, we want the DMA controller to fill one buffer while we operate on the other. The processor must finish working on a particular data buffer before the DMA controller wraps around to the beginning of it, as shown in Figure 2, below. Using Autobuffer mode, configuration is simple.

Figure 2: Double buffering

The total count of the Autobuffer DMA must comprise the size of two data buffers via a 2D DMA. In this example, each data buffer size corresponds to the size of the inner loop on a 2D DMA. The number of buffers corresponds to the outer loop. Therefore, we keep XCOUNT=512. Assuming the audio data element size is 4 bytes, we program the word transfer size to 32 bits and set XMODIFY=4.

Since we want two buffers, we set YCOUNT=2. If we want the two buffers to be back-to-back in memory, we set YMODIFY=4. On Blackfin, YMODIFY is a byte stride that is applied in place of XMODIFY after the last element of each inner loop, so a value equal to XMODIFY keeps the buffers contiguous. However, it's often smarter to separate the buffers. This way, you can avoid conflicts between the processor and the DMA controller in accessing the same sub-banks of memory. To this end, YMODIFY can be increased to provide the proper separation between the buffers.

In a 2D DMA transfer, we have the option of generating an interrupt when XCOUNT expires and/or when YCOUNT expires. Translated to this example, we can set the DMA interrupt to trigger every time XCOUNT decrements to 0 (i.e., at the end of each set of 512 transfers). It is easy to think of this in terms of receiving an interrupt at the end of each inner loop.
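To make the double-buffering arithmetic concrete, here is a host-side model of 2D address generation in Autobuffer Mode. It assumes that both modify values are byte strides and that the Y modify value is applied in place of the X modify value after the last element of each row; the type and function names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of 2D DMA address generation. Assumption: both modify
 * values are byte strides, and y_modify is applied in place of x_modify
 * after the last element of each row. */
typedef struct {
    uint32_t start_addr;
    uint32_t x_count, y_count;
    int32_t  x_modify, y_modify;
    uint32_t curr_addr, curr_x, curr_y;
    unsigned row_irqs;     /* "XCOUNT expired" interrupts raised so far */
} dma2d_t;

static void dma2d_start(dma2d_t *d)
{
    assert(d->x_count > 0 && d->y_count > 0);
    d->curr_addr = d->start_addr;
    d->curr_x    = d->x_count;
    d->curr_y    = d->y_count;
    d->row_irqs  = 0;
}

/* Transfer one element in Autobuffer Mode and return its address. */
static uint32_t dma2d_step(dma2d_t *d)
{
    uint32_t addr = d->curr_addr;
    if (--d->curr_x > 0) {
        d->curr_addr += (uint32_t)d->x_modify;
        return addr;
    }
    /* End of a row: one buffer is full, so notify the core. */
    d->row_irqs++;
    d->curr_x = d->x_count;
    if (--d->curr_y > 0) {
        d->curr_addr += (uint32_t)d->y_modify;
    } else {                    /* whole 2D block done: autobuffer reload */
        d->curr_addr = d->start_addr;
        d->curr_y    = d->y_count;
    }
    return addr;
}

/* Index of the buffer the core may process after a row interrupt:
 * the one that was just filled. */
static unsigned ready_buffer(const dma2d_t *d)
{
    assert(d->row_irqs > 0);
    return (d->row_irqs - 1) % d->y_count;
}
```

For example, with XCOUNT=512 and XMODIFY=4, a first buffer at 0x1000 ends with an element at 0x17FC. Placing the second buffer at 0x2000 under these assumptions means the Y modify value is 0x2000 - 0x17FC = 0x804 bytes. The core must finish with the buffer reported by `ready_buffer` before the next row interrupt for that same buffer arrives.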

Stop Mode works identically to Autobuffer DMA, except registers don't reload after DMA completes, so the entire DMA transfer takes place only once. Stop Mode is most useful for one-time transfers that happen based on some event, for example, moving data blocks from one location to another in a non-periodic fashion. This mode is also useful when you need to synchronize events. For example, if one task has to complete before the next transfer is initiated, Stop Mode can guarantee this sequencing. Moreover, Stop Mode is useful for buffer initialization.

Descriptor Models
DMA transfers that are descriptor-based require a set of parameters stored within memory to initiate a DMA sequence. The descriptor contains all of the same parameters normally programmed into the DMA control register set. However, descriptors also allow the chaining together of multiple DMA sequences. In descriptor-based DMA operations, we can program a DMA channel to automatically set up and start another DMA transfer after the current sequence completes. The descriptor-based model provides the most flexibility in managing a system's DMA transfers.

On Analog Devices Blackfin processors, there are two main descriptor models: a “Descriptor Array” scheme and a “Descriptor List” method. The goal of these two models is to allow a tradeoff between flexibility and performance. Let's take a look at how this is done.

In the Descriptor Array mode, descriptors reside in consecutive memory locations. The DMA controller still fetches descriptors from memory, but because the next descriptor immediately follows the current descriptor, the two words that describe where to look for the next descriptor (and their corresponding descriptor fetches) aren't necessary. Because the descriptor does not contain this Next Descriptor Pointer entry, the DMA controller expects a group of descriptors to follow one another in memory like an array.

A Descriptor List is used when the individual descriptors are not located “back-to-back” in memory. There are actually multiple sub-modes here, again to allow a tradeoff between performance and flexibility. In a “small descriptor” model, descriptors include a single 16-bit field that specifies the lower portion of the Next Descriptor Pointer field; the upper portion is programmed separately via a register and doesn't change. This, of course, confines descriptors to a specific 64 KB (= 2^16 bytes) page in memory. When the descriptors need to be located across this boundary, a “large” model is available that provides 32 bits for the Next Descriptor Pointer entry.
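The small-model addressing rule can be written out in a few lines. The helper names below are hypothetical; the sketch only shows how a full 32-bit pointer would be formed from the fixed upper half and the 16-bit field stored in the descriptor, and why every descriptor in such a list has to share one 64 KB page.

```c
#include <assert.h>
#include <stdint.h>

/* Small model: the descriptor stores only the low 16 bits of the next
 * pointer; the high 16 bits come from a register that stays fixed. */
static uint32_t small_next_addr(uint16_t upper_reg, uint16_t next_lower)
{
    return ((uint32_t)upper_reg << 16) | next_lower;
}

/* True if two descriptor addresses share the same 64 KB page, which is
 * what the small model requires of every descriptor in a list. */
static int same_64k_page(uint32_t a, uint32_t b)
{
    return (a >> 16) == (b >> 16);
}

/* Value to store in a small-model descriptor at `curr` so that it links
 * to the descriptor at `next`. */
static uint16_t small_link_field(uint32_t curr, uint32_t next)
{
    assert(same_64k_page(curr, next));
    return (uint16_t)(next & 0xFFFFu);
}
```

If `same_64k_page` fails for any pair of descriptors in a list, that list needs the large model, at the cost of one extra word fetched per descriptor.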

Regardless of the descriptor mode, using more descriptor values requires more descriptor fetches. This is why Blackfin processors specify a “flex descriptor model” that tailors the descriptor length to include only what's needed for a particular transfer, as shown in Figure 3, below. For example, if 2D DMA is not needed, the YMODIFY and YCOUNT registers do not need to be part of the descriptor block.

Figure 3: DMA descriptor models
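One way to see the tradeoff between the models is to count the words fetched per descriptor. The sketch below assumes 16-bit descriptor words, a 32-bit start address stored as two words, and a fixed ordering in which each flex level includes everything before it. Those layout details and all names are assumptions made for illustration, so check them against the hardware reference before relying on the numbers.

```c
#include <assert.h>

typedef enum { DESC_ARRAY, DESC_LIST_SMALL, DESC_LIST_LARGE } desc_model_t;

/* How much of the parameter set a flex descriptor carries. Each level
 * includes everything above it (assumed ordering). */
typedef enum {
    FLEX_START_ADDR,   /* start address only (two 16-bit words) */
    FLEX_CONFIG,       /* + configuration word */
    FLEX_1D,           /* + XCOUNT, XMODIFY */
    FLEX_2D            /* + YCOUNT, YMODIFY */
} flex_level_t;

/* Number of 16-bit words the controller fetches per descriptor. */
static unsigned desc_fetch_words(desc_model_t model, flex_level_t level)
{
    assert(level <= FLEX_2D);
    unsigned words = 0;
    switch (model) {
    case DESC_ARRAY:      words = 0; break;  /* no Next Descriptor Pointer */
    case DESC_LIST_SMALL: words = 1; break;  /* lower 16 bits only */
    case DESC_LIST_LARGE: words = 2; break;  /* full 32-bit pointer */
    }
    words += 2;                              /* start address */
    if (level >= FLEX_CONFIG) words += 1;
    if (level >= FLEX_1D)     words += 2;
    if (level >= FLEX_2D)     words += 2;
    return words;
}
```

Under these assumptions, a full 2D descriptor costs seven words in array mode and nine in the large list model, while dropping the 2D fields saves two words in every model.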

Descriptor management
So what's the best way to manage a descriptor list? Well, the answer is application-dependent, but it is important to understand what alternatives exist.

The first option we will describe behaves very much like an Autobuffer DMA. It involves setting up multiple descriptors that are chained together as shown in Figure 4a, below. The term “chained” implies that one descriptor points to the next descriptor, which is loaded automatically. To complete the chain, the last descriptor points back to the first descriptor, and the process repeats. One reason to use this technique rather than the Autobuffer mode is that descriptors allow more flexibility in the size and direction of the transfers.
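A chained ring of this kind might be built as follows. The `dma_desc_t` layout is a simplified, hypothetical descriptor that uses a native C pointer for the Next Descriptor Pointer; it is meant to show the linking, not the memory format the controller actually reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified descriptor for illustration; field layout is assumed. */
typedef struct dma_desc {
    struct dma_desc *next;   /* Next Descriptor Pointer */
    uint32_t start_addr;
    uint16_t config;
    uint16_t x_count;
    int16_t  x_modify;
} dma_desc_t;

/* Link n descriptors into a ring: each points to the next and the last
 * points back to the first. */
static void desc_chain_ring(dma_desc_t *descs, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        descs[i].next = &descs[(i + 1) % n];
}

/* Model what the controller does between transfers: follow the pointer. */
static const dma_desc_t *desc_after(const dma_desc_t *d, size_t hops)
{
    while (hops--)
        d = d->next;
    return d;
}
```

Unlike Autobuffer Mode, each descriptor in the ring can carry its own start address, count, and stride, so the transfers in one cycle need not be the same size.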

The second option involves the processor manually managing the descriptor list. Recall that a descriptor is really a structure in memory. Each descriptor contains a configuration word. Each configuration word contains an “Enable” bit which can regulate when a transfer starts. Let's assume we have four buffers that have to move data over some given task interval. If we need to have the processor start each transfer specifically when the processor is ready, we can set up all of the descriptors in advance, but with the “Enable” bits cleared. When the processor determines the time is right to start a descriptor, it simply updates the descriptor in memory and then writes to a DMA register to start the stalled DMA channel. Figure 4b below shows an example of this flow.

Figure 4: DMA descriptor throttled by the processor
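The throttled flow can be modeled on a host as shown below. The bit position of “Enable,” the descriptor layout, and the function names are assumptions for this sketch; `channel_run` stands in for the controller, and `cpu_release_next` stands in for the processor updating the descriptor in memory and then writing the DMA register that restarts the stalled channel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_ENABLE 0x0001u   /* assumed position of the "Enable" bit */

typedef struct throttled_desc {
    struct throttled_desc *next;
    uint16_t config;
    unsigned times_run;      /* model only: counts completed transfers */
} throttled_desc_t;

typedef struct {
    throttled_desc_t *curr;  /* descriptor the channel will load next */
    int stalled;
} channel_t;

/* Model of the controller: run descriptors until one is found disabled
 * (or the list ends), then stall. */
static void channel_run(channel_t *ch)
{
    while (ch->curr != NULL && (ch->curr->config & CFG_ENABLE)) {
        ch->curr->times_run++;
        ch->curr = ch->curr->next;
    }
    ch->stalled = 1;
}

/* Processor side: enable the descriptor the channel is waiting on, then
 * restart the stalled channel. */
static void cpu_release_next(channel_t *ch)
{
    assert(ch->stalled && ch->curr != NULL);
    ch->curr->config |= CFG_ENABLE;
    ch->stalled = 0;
    channel_run(ch);
}
```

With four descriptors prepared in advance and all “Enable” bits cleared, each call to `cpu_release_next` lets exactly one transfer run, because the channel stalls again as soon as it reaches the next disabled descriptor.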

When is this type of transfer useful?
Consider a multimedia application that involves synchronization of an input stream to an output stream. For example, we may receive video samples into memory at a rate that is different from the rate at which we display output video. This will happen in real systems even when you attempt to make the streams run at exactly the same clock rate. In cases where synchronization is an issue, the processor can manually regulate the DMA descriptors corresponding to the output buffer. Before the next descriptor is enabled, the processor can synchronize the stream by adjusting the current output descriptor via a semaphore mechanism that guarantees only one entity at a time accesses the shared resource.

When using internal DMA descriptor chains or DMA-based streams between processors, it can also be useful to add an extra word at the end of the transferred data block that helps identify the packet being sent, including information on how to handle the data and, possibly, a time stamp. The dashed area of Figure 4b shows an example of this scheme.

Most sophisticated applications have a “DMA Manager” function implemented in software. This may be provided as part of an operating system or real-time kernel, but it can also run without either of these. On Blackfin processors, this function is provided as part of the System Services in the VisualDSP++ tool suite. This management function allows you to move data via a standard API, without having to configure every control register manually.

Basically, an application submits DMA descriptor requests to the DMA Queue Manager, whose responsibility it is to handle each request. These requests are handled in the order they are received by the application software. Usually, an address pointer to a “call-back” function is part of the system as well. This function carries out the work you want the processor to perform when a data buffer is ready, without needlessly making the core linger in a high-priority interrupt service routine. In sum, the DMA manager can simplify the programming model because it abstracts data transfers.
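The sketch below shows the general shape of such a queue with call-backs. It is not the VisualDSP++ System Services API; every name here is invented, the queue depth is arbitrary, and the model invokes the call-back directly, whereas a real manager would typically defer it to a lower priority level than the DMA interrupt.

```c
#include <assert.h>
#include <stddef.h>

#define DMA_QUEUE_DEPTH 8   /* arbitrary size for this sketch */

typedef void (*dma_callback_t)(void *buffer, void *user);

typedef struct {
    void          *buffer;    /* data buffer the transfer fills or drains */
    dma_callback_t callback;  /* work to do once the buffer is ready */
    void          *user;      /* opaque argument passed to the call-back */
} dma_request_t;

typedef struct {
    dma_request_t slots[DMA_QUEUE_DEPTH];
    size_t head, tail, count;
} dma_queue_t;

/* Application side: submit a request. Returns 0 on success, -1 if full. */
static int dma_queue_submit(dma_queue_t *q, void *buffer,
                            dma_callback_t cb, void *user)
{
    assert(q != NULL);
    if (q->count == DMA_QUEUE_DEPTH)
        return -1;
    q->slots[q->tail] = (dma_request_t){ buffer, cb, user };
    q->tail = (q->tail + 1) % DMA_QUEUE_DEPTH;
    q->count++;
    return 0;
}

/* Manager side: called when the oldest transfer completes. Requests are
 * retired in submission order. Returns -1 if nothing was pending. */
static int dma_queue_complete(dma_queue_t *q)
{
    assert(q != NULL);
    if (q->count == 0)
        return -1;
    dma_request_t done = q->slots[q->head];
    q->head = (q->head + 1) % DMA_QUEUE_DEPTH;
    q->count--;
    if (done.callback)
        done.callback(done.buffer, done.user);
    return 0;
}

/* Example call-back: record which buffers became ready, in order. */
typedef struct { void *seen[DMA_QUEUE_DEPTH]; size_t n; } ready_log_t;

static void log_ready(void *buffer, void *user)
{
    ready_log_t *ready = user;
    if (ready->n < DMA_QUEUE_DEPTH)
        ready->seen[ready->n++] = buffer;
}
```

The application only sees `dma_queue_submit` and its own call-back; the register or descriptor programming stays hidden inside the manager, which is the abstraction benefit described above.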

There are two general methods for managing a descriptor queue using interrupts. The first is based on interrupting upon the completion of every descriptor. Use this method only if you can guarantee that each interrupt event will be serviced separately, with no interrupt overrun. The second involves interrupting only on completion of the work transfer specified by the last descriptor of a work block. A work block is a collection of one or more descriptors.

To maintain synchronization of the descriptor queue, the non-interrupt software has to maintain a count of descriptors added to the queue, while the interrupt handler maintains a count of completed descriptors removed from the queue. The counts are then equal only when the DMA channel pauses after having processed all the descriptors.
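A minimal sketch of this two-counter bookkeeping follows. The names are hypothetical, and the design assumes each counter has exactly one writer (non-interrupt code for one, the interrupt handler for the other), which is what keeps the scheme free of read-modify-write races between the two contexts.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t submitted;  /* written only by non-interrupt code */
    volatile uint32_t completed;  /* written only by the interrupt handler */
} desc_counts_t;

/* Descriptors still owned by the DMA channel. Unsigned subtraction stays
 * correct even after the counters wrap around. */
static uint32_t counts_outstanding(const desc_counts_t *c)
{
    return c->submitted - c->completed;
}

/* Non-interrupt code: call after appending n descriptors to the queue. */
static void counts_on_submit(desc_counts_t *c, uint32_t n)
{
    c->submitted += n;
}

/* Interrupt handler: n is 1 when interrupting per descriptor, or the
 * size of the work block when interrupting per work block. */
static void counts_on_complete(desc_counts_t *c, uint32_t n)
{
    assert(n <= counts_outstanding(c));
    c->completed += n;
}

/* The counts match only when the channel has processed everything. */
static int channel_idle(const desc_counts_t *c)
{
    return counts_outstanding(c) == 0;
}
```

When `channel_idle` returns true, the channel has paused, so newly added descriptors need the channel to be restarted rather than simply appended to a running chain.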

In this article we discussed DMA data flow structures, register-based and descriptor-based, and when to use each type. Next time in Part 3, we'll look at some advanced DMA features that assist with moving data effectively in multimedia systems.

To read Part 1 in this four-part series, go to “The basics of direct memory access.”

This series of four articles is based on material from “Embedded Media Processing,” by David Katz and Rick Gentile, published by Newnes/Elsevier.

Rick Gentile and David Katz are senior DSP applications engineers in the Blackfin Applications Group at Analog Devices, Inc.
