Embedded DSP Software Design Using a Multicore System-on-a-Chip (SoC) Architecture: Part 2

Software development for SoCs involves partitioning the application among the various processing elements based on the most efficient computational model. This can require a lot of trial and error to establish the proper partitioning. At a high level, the SoC partitioning algorithm is as follows:

Place the state machine software (those algorithms that provide application control, sequencing, user interface control, event-driven software, and so on) on a RISC processor such as an ARM.

Place the signal processing software on the DSP, taking advantage of the application-specific architecture that a DSP offers for signal processing functions.

Place high-rate, computationally intensive algorithms in hardware accelerators, if they exist and if they are customized to the specific algorithm under consideration.
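The three placement rules above can be sketched as a small decision function. This is only an illustration: the enum names and `place_task` are hypothetical, and falling back to the DSP when no matching accelerator exists is an assumption rather than something the rules state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task classes and targets; names are illustrative only. */
typedef enum { TASK_CONTROL, TASK_SIGNAL_PROCESSING, TASK_HIGH_RATE_KERNEL } TaskClass;
typedef enum { PE_RISC, PE_DSP, PE_HW_ACCEL } ProcessingElement;

/* Apply the three placement rules to one task. */
ProcessingElement place_task(TaskClass cls, bool matching_accel_exists)
{
    /* Catch values outside the three known classes in debug builds. */
    assert(cls == TASK_CONTROL || cls == TASK_SIGNAL_PROCESSING ||
           cls == TASK_HIGH_RATE_KERNEL);

    switch (cls) {
    case TASK_CONTROL:
        return PE_RISC;                 /* state machine and control software */
    case TASK_HIGH_RATE_KERNEL:
        /* Assumption: without a matching accelerator, use the DSP. */
        return matching_accel_exists ? PE_HW_ACCEL : PE_DSP;
    case TASK_SIGNAL_PROCESSING:
    default:
        return PE_DSP;                  /* signal processing software */
    }
}
```

In practice the classification itself is the hard part, which is why the partitioning usually takes several iterations of profiling and re-mapping.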

As an example, consider the software partitioning shown in Figure 11.7 below. This SoC model contains a general-purpose processor (GPP), a DSP, and hardware acceleration. The GPP contains a chip support library, which is a set of low-level peripheral APIs that provide efficient access to the device peripherals, a general-purpose operating system, an algorithmic abstraction layer, and a set of APIs for an application and user interface layer.

The DSP contains a similar chip support library, a DSP-centric kernel, a set of DSP-specific algorithms, and interfaces to higher level application software. The hardware accelerator contains a set of APIs for the programmer to access and some very specific algorithms mapped to the acceleration.

The application programmer is responsible for the overall partitioning of the system and the mapping of the algorithms to the respective processing elements. Some vendors may provide a “black box” solution to one or more of these processing elements, including the DSP and the hardware accelerators.

This provides another level of abstraction to the application developer, who does not need to know the details of some of the underlying algorithms. Other system developers may want access to these low-level algorithms, so there is normally flexibility in the programming model for these systems, depending on the amount of customization and tailoring required.

Figure 11.7 Software Architecture for SoC (courtesy of Texas Instruments)

Communication in an SoC is primarily established by means of software. The communication interface between the DSP and the ARM in Figure 11.7, for example, is realized by defining memory locations in the DSP data space as registers.

The ARM gains read/write access to these registers through a host interface. Both processors can asynchronously issue commands to each other; neither is master of the other. The command sequence is purely sequential: the ARM cannot issue a new command until the DSP has sent a “command complete” acknowledgement.

Two register pairs establish the two-way asynchronous communication between the ARM and the DSP: one pair is for sending commands to the ARM, and the other is for sending commands to the DSP. Each register pair has:

a command register, which is used to pass commands to the ARM or the DSP;
a command complete register, which is used to return the status of execution of the command;
each command can pass up to 30 words of command parameters;
also, each command execution can return up to 30 words of command return parameters.
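One way to picture a register pair and its parameter spaces is as a C structure overlaid on the shared DSP data memory. The sketch below is an assumption-laden illustration, not the actual device layout: the field order, the names `Mailbox` and `ArmDspCommArea`, the helper `mailbox_write_params`, and the definition of `Uint16` as `uint16_t` are all hypothetical. The `nParams` field corresponds to the "number register" mentioned in the article's command sequences.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t Uint16;            /* assumed width of the 16-bit word type */

#define MAILBOX_MAX_PARAMS 30       /* "up to 30 words" in each direction */

/* One direction of the link. Field order is an assumption. */
typedef struct {
    volatile Uint16 cmd;                          /* command register */
    volatile Uint16 cmdComplete;                  /* command complete (status) register */
    volatile Uint16 nParams;                      /* number-of-parameters register */
    volatile Uint16 params[MAILBOX_MAX_PARAMS];   /* command parameters */
    volatile Uint16 results[MAILBOX_MAX_PARAMS];  /* command return parameters */
} Mailbox;

/* Two register pairs, one per direction. */
typedef struct {
    Mailbox armToDsp;
    Mailbox dspToArm;
} ArmDspCommArea;

/* Copy n parameter words into a mailbox.
 * Returns 0 on success, -1 if n exceeds the 30-word limit. */
int mailbox_write_params(Mailbox *m, const Uint16 *src, Uint16 n)
{
    Uint16 i;

    assert(m != 0 && (n == 0 || src != 0));
    if (n > MAILBOX_MAX_PARAMS)
        return -1;
    for (i = 0; i < n; i++)
        m->params[i] = src[i];
    m->nParams = n;
    return 0;
}
```

The `volatile` qualifiers reflect that the other processor can change these words at any time, so the compiler must not cache them in registers.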

An ARM to DSP command sequence is as follows:
ARM writes a command to the command register
ARM writes number of parameters to number register
ARM writes command parameters into the command parameter space
ARM issues a nonmaskable interrupt to the DSP
DSP reads the command
DSP reads the command parameters
DSP executes the command
DSP clears the command register
DSP writes result parameters into the result parameter space
DSP writes “command complete” register
DSP issues HINT interrupt to ARM
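The ARM-to-DSP sequence above can be walked through in a host-side simulation, with the registers held in ordinary memory and the interrupts replaced by function calls and a flag. Everything here is a sketch under stated assumptions: `box`, `arm_send_cmd`, `dsp_nmi_handler`, and the `CMD_SUM` command are hypothetical, and packing the completion flag, result code, and command ID into one status word is an assumption based on the non-overlapping bit values of the article's API constants.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t Uint16;   /* assumed width of the article's Uint16 */

#define MAX_PARAMS            30
#define DSP_CMD_MASK          ((Uint16)0x0FFF)
#define DSP_CMD_COMPLETE      ((Uint16)0x4000)
#define DSP_CMD_OK            ((Uint16)0x0000)
#define DSP_CMD_INVALID_CMD   ((Uint16)0x1000)
#define CMD_SUM               ((Uint16)0x0001)  /* hypothetical: sum the parameters */

/* Simulated ARM-to-DSP register pair (ordinary memory here). */
struct {
    Uint16 cmd, nParams, cmdComplete;
    Uint16 params[MAX_PARAMS], results[MAX_PARAMS];
} box;

int hint_raised;   /* stands in for the HINT interrupt line to the ARM */

/* DSP side: the steps from "DSP reads the command" onward,
 * run as a plain function instead of a real interrupt handler. */
void dsp_nmi_handler(void)
{
    Uint16 cmd = box.cmd;                          /* read the command */
    Uint16 code = DSP_CMD_OK;
    Uint16 sum = 0;
    Uint16 i;

    if (cmd == CMD_SUM) {
        for (i = 0; i < box.nParams; i++)          /* read the parameters */
            sum = (Uint16)(sum + box.params[i]);   /* execute the command */
    } else {
        code = DSP_CMD_INVALID_CMD;
    }
    box.cmd = 0;                                   /* clear the command register */
    box.results[0] = sum;                          /* write result parameters */
    box.cmdComplete = (Uint16)(DSP_CMD_COMPLETE | code | (cmd & DSP_CMD_MASK));
    hint_raised = 1;                               /* HINT interrupt to the ARM */
}

/* ARM side: the first four steps.
 * Returns 0 on success, -1 if a command is pending or n is too large. */
int arm_send_cmd(Uint16 cmd, const Uint16 *params, Uint16 n)
{
    Uint16 i;

    assert(n == 0 || params != 0);
    if (box.cmd != 0 || n > MAX_PARAMS)
        return -1;
    box.cmdComplete = 0;
    box.cmd = cmd;                                 /* write the command */
    box.nParams = n;                               /* write the parameter count */
    for (i = 0; i < n; i++)
        box.params[i] = params[i];                 /* write the parameters */
    dsp_nmi_handler();                             /* "issue" the nonmaskable interrupt */
    return 0;
}
```

On real hardware the two halves run on different processors, so the ARM would return after raising the interrupt and pick up the result later, when HINT arrives.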

The DSP to ARM command sequence is as follows:
DSP writes command to command register
DSP writes number of parameters to number register
DSP writes command parameters into the command parameter space
DSP issues a HINT interrupt to the ARM
ARM reads the command
ARM reads the command parameters
ARM executes DSP command
ARM clears the command register
ARM writes result parameters into the result parameter space
ARM writes “command complete” register
ARM sends an INT0 interrupt to the DSP
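Because the DSP raises HINT both to acknowledge an ARM command and to deliver a command of its own, the ARM's HINT handler has to work out which event occurred. The sketch below shows one plausible way to do that by inspecting the two register pairs. It is an assumption, not the documented driver behavior: `HintView`, `classify_hint`, and the convention that a cleared command register reads as zero are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t Uint16;   /* assumed width of the article's Uint16 */

#define DSP_CMD_COMPLETE ((Uint16)0x4000)

/* Hypothetical snapshot of the two registers the ARM inspects on HINT. */
typedef struct {
    Uint16 armToDspComplete;  /* "command complete" register of the ARM-to-DSP pair */
    Uint16 dspToArmCmd;       /* command register of the DSP-to-ARM pair */
} HintView;

enum { HINT_REPLY = 1, HINT_NEW_CMD = 2 };   /* bit flags, may be combined */

/* Classify why HINT fired. Returns a bitwise OR of the flags above,
 * or 0 if neither register shows pending work. */
int classify_hint(const HintView *v)
{
    int reason = 0;

    assert(v != 0);
    if (v->armToDspComplete & DSP_CMD_COMPLETE)
        reason |= HINT_REPLY;       /* DSP finished a command the ARM sent */
    if (v->dspToArmCmd != 0)
        reason |= HINT_NEW_CMD;     /* DSP posted a command for the ARM */
    return reason;
}
```

A handler built on this would read the reply for `HINT_REPLY`, and for `HINT_NEW_CMD` would run the remaining ARM steps of the DSP-to-ARM sequence before sending INT0.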

Communication between the ARM and the DSP is usually accomplished using a set of communication APIs. Below is an example of a set of communication APIs between a general-purpose processor (in this case an ARM) and a DSP:

Start DSP address for ARM-DSP.
End DSP address for ARM-DSP.
ARM to DSP, parameters and command from ARM.
ARM to DSP, return values and completion code from DSP.
DSP to ARM, parameters and command from DSP.
DSP to ARM, return values and completion code from ARM.
#define DSP_CMD_MASK (Uint16)0x0FFF
Command mask for DSP.
#define DSP_CMD_COMPLETE (Uint16)0x4000
ARM-DSP command complete, from DSP.
#define DSP_CMD_OK (Uint16)0x0000
ARM-DSP valid command.
#define DSP_CMD_INVALID_CMD (Uint16)0x1000
ARM-DSP invalid command.
#define DSP_CMD_INVALID_PARAM (Uint16)0x2000
ARM-DSP invalid parameters.

STATUS ARMDSP_sendDspCmd (Uint16 cmd, Uint16 *cmdParams, Uint16 nParams)
Send command, parameters from ARM to DSP.
STATUS ARMDSP_getDspReply (Uint16 *status, Uint16 *retParams, Uint16 nParams)
Get command execution status, return parameters sent by DSP to ARM.
STATUS ARMDSP_getArmCmd (Uint16 *cmd, Uint16 *cmdParams, Uint16 nParams)
Get command, parameters sent by DSP to ARM.
STATUS ARMDSP_sendArmReply (Uint16 status, Uint16 *retParams, Uint16 nParams)
Send command execution status, return parameters from ARM to DSP.
Clear ARM-DSP communication area.
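The constants above occupy non-overlapping bits, which suggests (but does not prove) that the status word returned through `ARMDSP_getDspReply` packs a command ID, a result code, and a completion flag together. The sketch below decodes a status word under that assumption. `DspReply`, `make_status`, and `decode_status` are hypothetical helpers, and `Uint16` is assumed to be a 16-bit unsigned type.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t Uint16;   /* assumed width of the article's Uint16 */

/* Constants as given in the API listing. */
#define DSP_CMD_MASK          (Uint16)0x0FFF
#define DSP_CMD_COMPLETE      (Uint16)0x4000
#define DSP_CMD_OK            (Uint16)0x0000
#define DSP_CMD_INVALID_CMD   (Uint16)0x1000
#define DSP_CMD_INVALID_PARAM (Uint16)0x2000

/* Hypothetical decoded view of a status word. */
typedef struct {
    int    complete;   /* 1 if the completion bit is set */
    Uint16 cmd;        /* command ID from the low 12 bits */
    Uint16 error;      /* DSP_CMD_OK, or one or both error bits */
} DspReply;

/* Build a status word the way the DSP side might (assumed layout). */
Uint16 make_status(Uint16 cmd, Uint16 code)
{
    assert((cmd & ~DSP_CMD_MASK) == 0);   /* command ID must fit in the mask */
    return (Uint16)(DSP_CMD_COMPLETE | code | cmd);
}

/* Split a status word into its assumed fields. */
DspReply decode_status(Uint16 status)
{
    DspReply r;

    r.complete = (status & DSP_CMD_COMPLETE) != 0;
    r.cmd      = (Uint16)(status & DSP_CMD_MASK);
    r.error    = (Uint16)(status & (DSP_CMD_INVALID_CMD | DSP_CMD_INVALID_PARAM));
    return r;
}
```

A caller would typically wait until `complete` is set, then check `error` against `DSP_CMD_OK` before reading any return parameters.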

SoC System Boot Sequence
Normally, the boot image for the DSP is part of the ARM boot image. There could be many different boot images for the DSP for the different tasks the DSP needs to execute. The sequence starts with the ARM downloading the image related to the specific task to be executed by the DSP. The ARM then resets the DSP (via a control register) and brings the DSP out of reset.

At this stage the DSP begins execution at a pre-defined location, usually in ROM. The ROM code at this address initializes the DSP internal registers and places the DSP into an idle mode. At this point the ARM downloads the DSP code by using a host port interface.

After it completes downloading the DSP image, the ARM can send an interrupt to the DSP, which wakes it up from the idle mode, vectors to a start location, and begins running the application code loaded by the ARM. The DSP boot sequence is given below:

ARM resets the DSP and then brings it out of reset.
DSP comes out of reset and loads its program counter (PC) register with a start address.
The ROM code at this location branches the DSP to an initialization routine address.
A DSP status register is initialized to move the vector table to a dedicated location, all the interrupts are disabled except for a dedicated unmaskable interrupt, and the DSP is set to idle mode.
While the DSP is in idle mode, the ARM loads the DSP program/data memory with the DSP code/data.
When the ARM finishes downloading the DSP code, it wakes up the DSP from idle mode by asserting an interrupt signal.
The DSP then branches to a start address where the new interrupt vector table is located. The ARM should have loaded this location with at least a branch to the start code.
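The boot steps above can be modeled as a small state machine, which makes the ordering constraints explicit. This is a simplified sketch: the state and event names and the `boot_step` function are hypothetical. In this model a wake-up interrupt that arrives before the download is finished is ignored; on real hardware the outcome of that case depends on what the ARM has placed at the start address.

```c
#include <assert.h>

/* Hypothetical states of the DSP during boot. */
typedef enum {
    DSP_IN_RESET,      /* ARM holds the DSP in reset */
    DSP_ROM_INIT,      /* out of reset, running the ROM initialization routine */
    DSP_IDLE,          /* idle mode, waiting for the ARM to load code */
    DSP_CODE_LOADED,   /* ARM finished the download, DSP still idle */
    DSP_RUNNING        /* woken by interrupt, running the application */
} DspBootState;

/* Hypothetical events driven by the ARM or the ROM code. */
typedef enum {
    EV_RELEASE_RESET,
    EV_ROM_INIT_DONE,
    EV_DOWNLOAD_DONE,
    EV_WAKE_INTERRUPT,
    EV_RESET
} BootEvent;

/* Advance the model by one event. Events that do not apply to the
 * current state leave it unchanged. */
DspBootState boot_step(DspBootState s, BootEvent ev)
{
    assert(s <= DSP_RUNNING);

    if (ev == EV_RESET)
        return DSP_IN_RESET;            /* the ARM can reset the DSP at any time */

    switch (s) {
    case DSP_IN_RESET:    return ev == EV_RELEASE_RESET  ? DSP_ROM_INIT    : s;
    case DSP_ROM_INIT:    return ev == EV_ROM_INIT_DONE  ? DSP_IDLE        : s;
    case DSP_IDLE:        return ev == EV_DOWNLOAD_DONE  ? DSP_CODE_LOADED : s;
    case DSP_CODE_LOADED: return ev == EV_WAKE_INTERRUPT ? DSP_RUNNING     : s;
    case DSP_RUNNING:
    default:              return s;
    }
}
```

Switching the DSP to a different task image then amounts to issuing `EV_RESET` and walking the same path again with a new download.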

Tools Support for SoC
SoCs, and heterogeneous processors in general, require more sophisticated tools support. An SoC may contain several programmable, debuggable processing elements that require tools support for code generation, debug access and visibility, and real-time data analysis. A general model for this is shown in Figure 11.8 below. An SoC processor will have several processing elements such as an ARM and a DSP.

Each of these processing elements will require a development environment that includes mechanisms to extract, process, and display debug and instrumentation streams of data, mechanisms to peek and poke at memory and control execution of the programmable element, and tools to generate, link, and build executable images for the programmable elements.

Figure 11.8 An SoC tools environment (courtesy of Texas Instruments)

SoC tools environments also contain support for monitoring the detailed status of each of the processing elements. As shown in Figure 11.9 below, detailed status reporting and control of the processing elements in an SoC allows the developer to gain visibility into the execution profile of the system.

Also, since power-sensitive SoC devices may power down some or all of the device as the application executes, it is useful to also understand the power profile of the application. This can also be obtained using the proper analysis tools.

Figure 11.9 Tools support provides visibility into the status of each of the SoC processing elements (courtesy of Texas Instruments)

A Video Processing Example of SoC
Video processing is a good example of a commercial application requiring a system-on-a-chip solution. Video processing applications are computationally intensive and demand a lot of MIPS to maintain the data throughput required for these applications. Some of the very compute-intensive algorithms in these applications include:

Image pipe processing and video stabilization
Compression and decompression
Color conversion
Watermarking and various forms of encryption

Figure 11.10 An SoC designed for video and image processing using a RISC device (ARM926) and a DSP (courtesy of Texas Instruments)

Performing a 30 frame-per-second MPEG-4 algorithm can take as much as 2500 MIPS, depending on the resolution of the video.
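To get a feel for what such a budget means, it helps to convert MIPS into instructions available per pixel. The helper below is a back-of-the-envelope sketch: `instr_per_pixel` is a hypothetical function, and the 640x480 and 352x288 frame sizes used in the note after it are illustrative choices, not resolutions tied to the 2500 MIPS figure.

```c
#include <assert.h>
#include <stdint.h>

/* Instructions available per pixel for a given MIPS budget, frame rate,
 * and frame size. Integer division truncates toward zero. */
uint64_t instr_per_pixel(uint64_t mips, uint32_t fps,
                         uint32_t width, uint32_t height)
{
    uint64_t instr_per_frame;

    assert(fps > 0 && width > 0 && height > 0);
    instr_per_frame = mips * 1000000ULL / fps;        /* instructions per frame */
    return instr_per_frame / ((uint64_t)width * height);
}
```

With 2500 MIPS at 30 frames per second, `instr_per_pixel(2500, 30, 640, 480)` gives 271 instructions per pixel and `instr_per_pixel(2500, 30, 352, 288)` gives 822, which shows how quickly the per-pixel budget shrinks as resolution grows.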

The audio channel processing is not as demanding but still requires enough overall MIPS to perform audio compression and decompression, equalization, and sample rate conversion.

As these applications become even more complex and demanding (for example, new compression technologies are still being invented), these SoCs will need to support not just one but several different compression standards. SoCs for video applications include dedicated instruction set accelerators to improve performance. The SoC programming model and peripheral mix allow for the flexibility to support several formats of these standards efficiently.

For example, the DM320 SoC processor in Figure 11.10 above has an on-chip SIMD engine (called iMX) dedicated to video processing. This hardware accelerator can perform the common video processing algorithms (Discrete Cosine Transform (DCT), IDCT, motion estimation, and motion correlation, to name a few).

The VLCD (variable length coding/decoding) processor is built to support variable length encoding and decoding as well as quantization for video compression standards such as JPEG, H.263, and MPEG-1/2/4.

As you can see from Figure 11.10, an SoC solution contains appropriate acceleration mechanisms, specialized instruction sets, hardware co-processors, and so on, that provide efficient execution of the important algorithms in DSP applications. We discussed an example of video processing, but you will find the same mechanisms supporting other applications such as wireless base stations and cellular handsets.

To read Part 1, go to “The hardware architecture of an embedded media SoC.”

Robert Oshana is an engineering manager in the Software Development Organization of Texas Instruments' DSP Systems business. He is responsible for the development of hardware and software debug technology for many of TI's programmable devices. He has 25 years of real-time embedded development experience.

Used with the permission of the publisher, Newnes/Elsevier, this series of two articles is based on material from DSP Software Development Techniques for Embedded and Real-Time Systems, by Robert Oshana.

