As embedded processors become more computationally capable, many new (and more advanced) algorithms can be ported, which in turn enable new applications. The most flexible architectures scale from low-end to high-end applications, enabling a common development platform across projects as well as providing more flexibility for development teams.
One way processor vendors provide the desired scalability with a single architecture is to include both single- and dual-core platforms. The goal with a multi-core processor is to allow nearly ideal scaling without overcomplicating the programming model. For example, in a dual-core system, the goal is to achieve as close to a 2x performance increase as possible.
In this paper, we will discuss the most common programming techniques for maximizing performance, as well as some system-related topics that commonly arise when porting to a dual-core processor.
Asymmetric and symmetric dual-core processors
A processor with two cores can be pretty powerful, yet along with the extra performance can come an added measure of complexity. There are a few common and quite useful programming models that suit a dual-core processor, and we'll examine them here. There are two types of dual-core architectures available today.
The first is an “asymmetric” dual-core processor, meaning that the two cores are architecturally different. This means that, in addition to possessing different instruction sets, they also run at different operating frequencies and have different memory and programming models.
The main advantage of having two different architectures in the same physical package is that each core can optimize a specific portion of the processing task. For example, one core might excel at controller functionality, while the second one might target higher bandwidth processing.
There are several disadvantages with asymmetric arrangements. For one, they require two sets of development tools and two sets of programmers in order to build an application. Secondly, “free” (i.e., unused) processing resources on one processor are of little use to a fully loaded second processor, since their competencies are so divergent.
What's more, asymmetric processors make it difficult to scale from light to heavy processing profiles. This is important, for instance, in battery-operated devices, where frequency and voltage may be adjusted to meet the present processing requirements; asymmetric cores don't scale well because the processing load is divided unevenly, so that one core might still need to run at maximum frequency while the other could run at a much lower clock rate.
Finally, as we will see, asymmetric processors don't support many different programming models, a trait that limits design options.
In contrast to the asymmetric processor, a symmetric dual-core processor (extended to “symmetric multiprocessor,” or “SMP”) consists of two identical cores integrated into a single package.
An SMP requires only a single set of development tools and a design team with a single architectural knowledge base. Also, since both cores are equivalent, unused processing resources on one core can often be leveraged by the other core.
Another very important benefit of the SMP architecture is the fact that frequency and voltage can be more easily modified together, improving the overall energy usage in a given application.
Lastly, while the symmetric processor supports an asymmetric programming model, it also supports many other models that are very useful for multimedia applications. The main challenge with the symmetric multiprocessor is splitting an algorithm across two processors without complicating the programming model.
There are several basic programming models that designers employ today across a broad range of applications. We described an asymmetric processor earlier; we will now look at its associated programming model.
Asymmetric programming model
The traditional use of an asymmetric dual-core processor involves discrete and often different tasks running on each of the cores, as shown in Figure 1, below.
|Figure 1: Asymmetric Programming Model|
For example, one of the cores may be assigned all of the control-related tasks. These typically include graphics and overlay functionality, as well as networking stacks and overall flow control. This core is also most often where the operating system or kernel will reside.
Meanwhile, the second core can be dedicated to the high intensity processing functions of the application. For example, compressed data may come over the network into the first core. Received packets can feed the second core, which in turn might perform some audio and video decode function.
In this model, the two processors are independent from each other. Logically, they are more like two stand-alone processors that communicate through the interconnect structures between them. They don't share any code and share very little data.
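As a rough illustration of how little the two cores need to share, the hand-off of received packets could be a small single-producer, single-consumer queue in shared memory. The sketch below is only one possible arrangement: the names `packet_desc`, `packet_queue`, `queue_push` and `queue_pop` are hypothetical, and it assumes C11 atomics are available and that the queue lives in memory both cores see coherently (for example, a non-cached shared region).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SLOTS 4u   /* assumed size; must be a power of two */

/* Hypothetical descriptor for one received packet. */
typedef struct {
    uint32_t packet_id;
    uint32_t length;
} packet_desc;

/* The only data the two cores share in this sketch. */
typedef struct {
    packet_desc slots[QUEUE_SLOTS];
    atomic_uint head;   /* advanced only by the control core (producer)    */
    atomic_uint tail;   /* advanced only by the processing core (consumer) */
} packet_queue;

/* Control core: queue a packet; returns false if the queue is full. */
bool queue_push(packet_queue *q, packet_desc d)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == QUEUE_SLOTS)
        return false;
    q->slots[head % QUEUE_SLOTS] = d;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Processing core: fetch the oldest packet; returns false if empty. */
bool queue_pop(packet_queue *q, packet_desc *out)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->slots[tail % QUEUE_SLOTS];
    /* Release: the slot is free for reuse only after it has been copied. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

Because each index is written by exactly one core, no lock is needed; a zero-initialized `packet_queue` starts out empty.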
This model is preferred by developers who employ separate teams in their software development efforts. The ability to allocate different tasks to different processors allows development to be accomplished in parallel, eliminating potential critical path dependencies in the project.
This programming model also aids the testing and validation phases of the project. For example, if code changes on one core, it does not necessarily invalidate testing efforts already completed on the other core.
Also, by having a dedicated processor available for a given task, code developed on a single-core processor can be more easily ported to “half” of the dual-core processor. Both asymmetric and symmetric multiprocessors support this programming model.
However, having identical cores available allows for the possibility of re-allocating any unused resources across functions and tasks. As we described earlier, the symmetric processor also has the advantage of providing a common, integrated environment.
Another important consideration of this model relates to the fact that the size of the code running the operating system and control tasks is usually measured in Mbytes. As such, the code must reside in external memory, with instruction cache enabled.
While this can work, care must be taken to prevent cache line fills from interfering with the overall timeline of the application. A relatively small subset of code runs the most often, due to the nature of algorithm coding. Therefore, enabling instruction cache is usually adequate in this model.
Homogeneous programming model
Because there are two identical cores in a symmetric multiprocessor, traditional processing-intensive applications can be split equally across each core. We call this a “Homogeneous Model”.
In this scheme, code running on each core is identical. Only the data being processed is different. In a streaming multi-channel audio application, for example, this would mean that one core processes half of the audio channels, and the other core processes the remaining half.
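The channel split described above can be sketched in a few lines of C. This is an illustration under stated assumptions rather than a reference implementation: `run_core`, `process_channel`, the buffer layout and the channel count are all hypothetical, and how a core learns its own ID is hardware-specific, so it is simply passed in as a parameter here.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES    2
#define NUM_CHANNELS 8   /* assumed channel count */
#define SAMPLES      4   /* assumed samples per channel per block */

/* Shared sample buffer: one row per audio channel. */
static int32_t audio[NUM_CHANNELS][SAMPLES];

/* Placeholder per-channel algorithm: apply a gain of 2. */
static void process_channel(int32_t *samples, size_t count)
{
    for (size_t i = 0; i < count; i++)
        samples[i] *= 2;
}

/* Identical code runs on both cores; only core_id (0 or 1) differs.
 * Core 0 handles channels 0..3 and core 1 handles channels 4..7.
 * Returns the number of channels this core processed. */
size_t run_core(unsigned core_id)
{
    const size_t per_core = NUM_CHANNELS / NUM_CORES;
    const size_t first = core_id * per_core;

    for (size_t ch = first; ch < first + per_core; ch++)
        process_channel(audio[ch], SAMPLES);
    return per_core;
}
```

Since the two cores touch disjoint rows of the buffer, they need no synchronization while processing a block, only at block boundaries.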
Extending this concept to video and imaging applications, each core might process alternate frames, as shown in Figure 2, below. This usually translates to a scenario where all code fits into internal memory, in which case instruction cache is probably not used.
|Figure 2: Homogeneous Model|
The communication flow between processors in this model is usually pretty basic. A Mailbox Interrupt (or a supplemental interrupt between cores) can signal the other core to check for a semaphore, to process new data, or to send out processed data.
Usually, an operating system or kernel is not required for this model; instead, a “super loop” is implemented. We use the term “super loop” to indicate a code segment that just runs over and over again.
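A minimal sketch of such a super loop in C might look like the following. The helper names (`wait_for_data`, `process_block`, `send_results`) and the block size are hypothetical stand-ins, and the loop is bounded by an iteration count only so that it can be run on a host; firmware would normally use `while (1)`.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4   /* assumed block length */

/* Hypothetical stand-ins for the application's buffers. */
static int16_t input[BLOCK_SIZE];
static int16_t output[BLOCK_SIZE];
static unsigned blocks_sent;

/* On hardware this would idle until a DMA-complete or mailbox
 * interrupt; here it just fabricates a block of input samples. */
static void wait_for_data(unsigned iteration)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        input[i] = (int16_t)(iteration + i);
}

/* Placeholder "algorithm": double every sample. */
static void process_block(void)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        output[i] = (int16_t)(input[i] * 2);
}

/* On hardware this would start an output transfer or signal the other core. */
static void send_results(void)
{
    blocks_sent++;
}

/* The super loop: wait, process, send, repeat.
 * Returns the total number of blocks sent so far. */
unsigned super_loop(unsigned max_iterations)
{
    for (unsigned n = 0; n < max_iterations; n++) {
        wait_for_data(n);
        process_block();
        send_results();
    }
    return blocks_sent;
}
```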
Master-slave programming model
In the “Master-Slave” usage model, both cores perform intensive computation in order to achieve better utilization of the symmetric processor architecture. In this arrangement, one core (the Master) controls the flow of the processing and actually performs at least half the processing load. Portions of specific algorithms are split and handled by the Slave, assuming these portions can be parallelized.
A variety of techniques, among them interrupts and semaphores, can be used to synchronize the cores in this model. The Slave processor usually takes less processing time than the Master does. Thus, the Slave can poll a semaphore in shared memory when it is ready for more work.
This is not always a good idea, though, because if the Master core is still accessing the bus to the shared memory space, a conflict will arise. A more robust solution is for the Slave to place itself in idle mode and wait for the Master to interrupt it with a request to perform the next block of work. A scheduler or simple kernel is most useful in this model.
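The interrupt-driven hand-off can be sketched as follows, with a summation standing in for the real algorithm. Everything named here (`work_item`, `master_dispatch`, `slave_on_interrupt`, `master_sum`) is hypothetical, and the sketch assumes C11 atomics and coherent shared memory. To keep it runnable on a single host thread, the Master calls the Slave's handler directly at the point where real hardware would raise an inter-core interrupt and the handler would run on the other core.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { WORK_IDLE, WORK_READY, WORK_DONE };

/* Work descriptor placed in shared memory (hypothetical layout). */
typedef struct {
    const int32_t *src;
    size_t count;
    int64_t result;
    atomic_int state;
} work_item;

static int64_t sum_block(const int32_t *src, size_t count)
{
    int64_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += src[i];
    return sum;
}

/* Master: publish the Slave's share of the block. */
void master_dispatch(work_item *w, const int32_t *src, size_t count)
{
    w->src = src;
    w->count = count;
    atomic_store_explicit(&w->state, WORK_READY, memory_order_release);
    /* On hardware: raise the inter-core interrupt to wake the idle Slave. */
}

/* Slave: body of the interrupt handler that wakes it from idle. */
void slave_on_interrupt(work_item *w)
{
    if (atomic_load_explicit(&w->state, memory_order_acquire) != WORK_READY)
        return;
    w->result = sum_block(w->src, w->count);
    atomic_store_explicit(&w->state, WORK_DONE, memory_order_release);
}

/* Master: keep at least half of the block, give the rest to the Slave. */
int64_t master_sum(const int32_t *data, size_t count, work_item *w)
{
    size_t slave_count = count / 2;
    size_t master_count = count - slave_count;

    master_dispatch(w, data + master_count, slave_count);
    int64_t mine = sum_block(data, master_count);

    slave_on_interrupt(w);   /* host-only stand-in for the Slave core */

    /* On hardware the Master would typically idle until a completion
     * interrupt from the Slave rather than spin on shared memory. */
    while (atomic_load_explicit(&w->state, memory_order_acquire) != WORK_DONE) {
    }
    atomic_store_explicit(&w->state, WORK_IDLE, memory_order_relaxed);
    return mine + w->result;
}
```

The Slave never touches shared memory until it is told work is ready, which avoids the bus contention that polling can cause.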
Pipelined programming model
A variation on the Master-Slave Model allocates processing steps to each core. That is, one core is assigned one or more serial steps, and the other core handles the remaining ones.
This is analogous to a manufacturing pipeline where one core's output is the next core's input. Ideally, if the processing task separation is optimized, we will achieve a performance advantage greater than that of the other models. The task separation, however, is heavily dependent on the processor architecture and memory hierarchy. For this reason, the Pipelined Model isn't as portable across processors as the other programming models are.
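One way to picture the pipeline is with a pair of "ping-pong" buffers between the two stages: while one core fills one buffer with frame t, the other core drains the other buffer holding frame t-1. The sketch below simulates that schedule on a single host thread. The stage functions, frame length and `run_pipeline` name are hypothetical, and on real hardware the two calls inside each time step would execute concurrently on separate cores with a synchronization point between steps.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 4   /* assumed samples per frame */

/* Stage 1 (first core): placeholder for an early step such as decoding. */
static void stage1(const int16_t *in, int16_t *mid)
{
    for (size_t i = 0; i < FRAME_LEN; i++)
        mid[i] = (int16_t)(in[i] + 1);
}

/* Stage 2 (second core): placeholder for a later step such as filtering. */
static void stage2(const int16_t *mid, int16_t *out)
{
    for (size_t i = 0; i < FRAME_LEN; i++)
        out[i] = (int16_t)(mid[i] * 2);
}

/* in and out each hold `frames` consecutive frames of FRAME_LEN samples.
 * At time step t, stage 2 consumes frame t-1 from one buffer while
 * stage 1 produces frame t into the other, so the stages never share
 * a buffer within a step. */
void run_pipeline(const int16_t *in, int16_t *out, size_t frames)
{
    static int16_t pingpong[2][FRAME_LEN];

    for (size_t t = 0; t <= frames; t++) {
        if (t > 0)        /* second core */
            stage2(pingpong[(t - 1) % 2], out + (t - 1) * FRAME_LEN);
        if (t < frames)   /* first core */
            stage1(in + t * FRAME_LEN, pingpong[t % 2]);
    }
}
```

The pipeline runs for one more step than there are frames because the last frame still has to pass through the second stage. How evenly the two stages are balanced determines the achievable speedup, which is why the split depends so heavily on the processor and its memory hierarchy.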
The symmetric processor supports many more programming models than the asymmetric processor does, so you should carefully consider all of your options before starting a project.
Although we've reviewed only a few of the most popular media processing frameworks, they serve to illustrate the types of data and processing flows that are becoming commonplace in multimedia environments. The models presented here can be leveraged into systems with more complex flows, thus enabling reduced development time and quicker time to market.
Rick Gentile and David Katz are senior DSP applications engineers in the Blackfin Applications Group at Analog Devices, Inc.
This article is excerpted from a paper of the same name presented at the Embedded Systems Conference Silicon Valley 2006. Used with permission of the Embedded Systems Conference. For more information, please visit www.embedded.com/esc/sv.