CMP EMBEDDED.COM


Implementing dual OS signal processing using Linux and the DSP/BIOS RTOS




Protected Access to Resources
A fundamental property of Linux and most general-purpose operating systems is the separation of user-space programs from the underlying system resources they use. Direct access to memory and device peripherals is permitted only when the processor is operating in supervisor (i.e., kernel) mode.

When a user program needs access to system resources, it must request them from the kernel through kernel modules called drivers. The application runs in user memory space and accesses the driver through virtual files (device nodes). Requests made on these files are passed across the user/kernel boundary to the driver, which executes in kernel memory space.

Linux provides an extremely feature-rich driver model that encompasses standard streaming peripherals, block storage devices and file systems, and even networking and network-based file systems.

The separation of these drivers from the user-space application provides robustness. Furthermore, the abstraction to a common driver interface makes it easy to stream data to a serial port, to a flash file system or to a network shared folder, all with little change to the underlying application code.
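To make the common-interface point concrete, here is a minimal Python sketch (not from the original article) of a block-copy loop built on the POSIX-style os.open/os.read/os.write calls. Each read or write is a system call handled by whichever kernel driver backs the path, so the same function can, in principle, target a serial port, a file on a flash file system, or a file on a network mount. The function name stream_blocks and the 4096-byte default block size are illustrative assumptions.

```python
import os

BLOCK_SIZE = 4096  # bytes per read()/write() call; an illustrative choice


def stream_blocks(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> int:
    """Copy src_path to dst_path in fixed-size blocks and return the byte count.

    Every os.read()/os.write() below is a system call serviced by the kernel
    driver behind the path; the loop itself does not care which driver it is.
    """
    total = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                block = os.read(src, block_size)  # one kernel entry per block
                if not block:                     # empty result means end of file
                    break
                # os.write() may accept fewer bytes than offered; loop until done
                view = memoryview(block)
                while view:
                    written = os.write(dst, view)
                    view = view[written:]
                total += len(block)
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return total
```

For example, stream_blocks("capture.raw", "/dev/ttyS0") and stream_blocks("capture.raw", "/mnt/share/capture.raw") differ only in the destination path (both paths are hypothetical). A real serial port would usually also need its line settings configured through termios before use.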

This flexibility, however, comes at a price. The strict separation between applications and physical resources adds some degree of overhead. When a user space program accesses a device peripheral, a context switch must be made into kernel mode in order to process the request.

Typically this is not a significant limitation because the data is accessed in blocks as opposed to sample-by-sample, so the context switch into kernel mode needs to be made only once per block access.
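The amortization argument can be put in numbers with a simple cost model. The sketch below assumes a fixed, hypothetical cost per kernel entry (switch_cost_us) and ignores per-byte copy costs; it only counts how often the application must cross into kernel mode.

```python
def kernel_entries(num_samples: int, samples_per_call: int) -> int:
    """Number of read() calls (user-to-kernel transitions) needed to move num_samples."""
    # Ceiling division in integer arithmetic: a partial final block still costs a call.
    return -(-num_samples // samples_per_call)


def switch_overhead_us(num_samples: int, samples_per_call: int,
                       switch_cost_us: float) -> float:
    """Total time spent entering and leaving the kernel, in microseconds."""
    return kernel_entries(num_samples, samples_per_call) * switch_cost_us


if __name__ == "__main__":
    for per_call in (1, 1024):
        print(per_call, kernel_entries(48_000, per_call),
              switch_overhead_us(48_000, per_call, 2.0))
```

With a hypothetical 2 µs per kernel entry and 48,000 samples, reading one sample per call costs 48,000 entries (96 ms of pure switching), while 1024-sample blocks need only 47 entries (94 µs).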

There are cases, however, when application code requires tight coupling with the physical hardware. This situation occurs frequently with high-performance processors such as DSPs, where sustaining high data throughput is essential to keeping the processing core from stalling. In these cases, separating physical resources in kernel space from the application in user space can significantly hurt system performance.

Coupling of Application and Hardware
Let us examine a typical situation encountered when performing block-based video processing on the TMS320DM643x processor architecture, which incorporates a 600 MHz / 4800 MIPS DSP core and a wide range of multimedia peripherals, including a feature-rich video port subsystem. A typical application of this hardware would be compressing an incoming video stream using H.264.

To take full advantage of the processing capability of the DSP core, the data being processed should be accessed from single-cycle internal memory rather than slower external memory. Although it would be technically possible to equip the processor with enough fast on-chip memory to store one or more full video frames, this approach would be cost-prohibitive for most target markets. Instead, the processor provides 80 Kbytes of single-cycle on-chip data memory.

While small relative to a full frame, 80 Kbytes has been determined by TI through simulation to give the optimal area/performance tradeoff for H.264 and other video processing algorithms.
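To see how small 80 Kbytes is relative to a frame, the arithmetic below assumes an 8-bit YUV 4:2:0 D1 frame (720x480), a common input format for H.264 encoders, and H.264's 16x16 macroblocks. The resolution and pixel format are assumptions chosen for illustration; the article does not specify them.

```python
MB_SIZE = 16                # H.264 macroblock: 16x16 luma samples
INTERNAL_BYTES = 80 * 1024  # on-chip data memory quoted in the text


def frame_bytes_420(width: int, height: int) -> int:
    """Bytes in one 8-bit YUV 4:2:0 image: a full-size luma plane plus
    two chroma planes at half width and half height."""
    return width * height * 3 // 2


def macroblocks_per_frame(width: int, height: int) -> int:
    """Macroblock count, assuming width and height are multiples of 16."""
    return (width // MB_SIZE) * (height // MB_SIZE)


# One macroblock's pixel data: 256 luma bytes + 2 * 64 chroma bytes
MB_BYTES = frame_bytes_420(MB_SIZE, MB_SIZE)

if __name__ == "__main__":
    w, h = 720, 480  # assumed D1 input
    print("frame bytes:", frame_bytes_420(w, h))
    print("frame / internal memory:", frame_bytes_420(w, h) / INTERNAL_BYTES)
    print("macroblocks per frame:", macroblocks_per_frame(w, h))
    print("macroblocks fitting in internal memory:", INTERNAL_BYTES // MB_BYTES)
```

Under these assumptions a single frame (518,400 bytes) is more than six times the size of internal memory, while a macroblock (384 bytes) is tiny: about 213 macroblocks' worth of pixel data would fit if the whole 80 Kbytes were available (in practice, stacks, tables and other working data share it).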

To keep this memory fed with data, the DSP uses a Direct Memory Access (DMA) controller, which transfers sub-blocks of data between external and internal memory without consuming cycles from the processing core (Figure 1 below).

Figure 1. The DSP uses DMA hardware to transfer small sub-blocks of a video frame from external memory into internal memory, where they are processed by the DSP core.
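A common way to realize the scheme in Figure 1 is ping-pong (double) buffering: while the core processes one internal buffer, the DMA fills the other. The Python model below only simulates this. dma_copy_block stands in for a 2D DMA transfer, the transfers run sequentially rather than concurrently, and all names are hypothetical. It assumes the frame dimensions are multiples of the block size.

```python
def dma_copy_block(frame, frame_width, x, y, block_w, block_h, dst):
    """Model of a 2D DMA transfer: gather the block_w x block_h sub-block whose
    top-left corner is (x, y) from a row-major frame ("external memory")
    into the contiguous buffer dst ("internal memory")."""
    for row in range(block_h):
        src = (y + row) * frame_width + x
        dst[row * block_w:(row + 1) * block_w] = frame[src:src + block_w]


def process_frame_ping_pong(frame, width, height, block, process):
    """Walk the frame block by block using two internal buffers.

    On real hardware the transfer into the "next" buffer would overlap with
    process() on the "current" one; here they run one after the other, but
    the buffer hand-off order is the same.
    """
    bufs = [bytearray(block * block), bytearray(block * block)]
    coords = [(x, y) for y in range(0, height, block)
                     for x in range(0, width, block)]
    results = []
    if coords:
        # Prime the pipeline with the first block
        dma_copy_block(frame, width, *coords[0], block, block, bufs[0])
    for i in range(len(coords)):
        current = bufs[i % 2]
        if i + 1 < len(coords):
            # Fill the other buffer with the next block
            dma_copy_block(frame, width, *coords[i + 1], block, block,
                           bufs[(i + 1) % 2])
        results.append(process(current))
    return results
```

For instance, process_frame_ping_pong(bytes(range(16)), 4, 4, 2, sum) walks a 4x4 test frame in 2x2 blocks and returns the sum of each block in raster order.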

From a whole-system perspective, this method can provide nearly the same performance as a chip that holds an entire video frame in on-chip memory, but at a fraction of the cost. Achieving this performance, however, requires very tight coupling between the application, the operating system, and the underlying memory and DMA hardware.

First, the application must have a means of distinguishing between fast internal memory and bulk external memory. Second, the application must be able to execute many small, precisely timed DMA operations. Because any latency incurred when accessing the DMA controller is multiplied by the hundreds or even thousands of DMA accesses per video frame, performing these operations efficiently through the Linux driver model is difficult, if not impossible.
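A rough cost model shows why per-transfer latency matters. The sketch assumes 2 DMA transfers per macroblock (one in, one out), 1,350 macroblocks per 720x480 frame, 30 frames per second, and two hypothetical setup costs: 5 µs per transfer when going through a driver call, and 0.2 µs when programming the DMA registers directly. These figures are illustrative, not measurements.

```python
NS_PER_S = 1_000_000_000


def dma_setup_fraction(transfers_per_frame: int, setup_cost_ns: int, fps: int) -> float:
    """Fraction of each frame period spent only on initiating DMA transfers."""
    return transfers_per_frame * setup_cost_ns * fps / NS_PER_S


# Illustrative assumptions, not measurements:
MACROBLOCKS_PER_FRAME = 1350  # 720x480 frame in 16x16 macroblocks
TRANSFERS_PER_MB = 2          # one transfer in, one transfer out
FPS = 30
DRIVER_SETUP_NS = 5_000       # hypothetical cost via a kernel driver call
DIRECT_SETUP_NS = 200         # hypothetical cost writing DMA registers directly

transfers = MACROBLOCKS_PER_FRAME * TRANSFERS_PER_MB

if __name__ == "__main__":
    print(f"driver: {dma_setup_fraction(transfers, DRIVER_SETUP_NS, FPS):.1%}")
    print(f"direct: {dma_setup_fraction(transfers, DIRECT_SETUP_NS, FPS):.1%}")
```

Under these assumptions, driver-mediated setup would consume roughly 40% of every 33.3 ms frame period before any pixel is processed, versus under 2% for direct register access.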

Practical implementations of this method have been demonstrated using DSP/BIOS, which provides native APIs that let applications request internal versus external memory. DSP/BIOS also allows applications to access DMA registers directly, with no context-switching penalty.
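As an illustration of what requesting internal versus external memory means to an application, the toy Python allocator below keeps one heap per memory segment. It is a hypothetical model, not the DSP/BIOS API; in DSP/BIOS 5.x the comparable call is MEM_alloc(), which takes a memory-segment ID along with the size and alignment. Segment names, base addresses and the external size are placeholders; only the 80-Kbyte internal size comes from the text.

```python
class SegmentHeap:
    """Toy bump allocator for one memory segment (no free())."""

    def __init__(self, name: str, base: int, size: int):
        self.name = name
        self.end = base + size
        self.next = base  # next free address

    def alloc(self, nbytes: int, align: int = 8):
        """Return an aligned address, or None if the segment is exhausted.

        align must be a power of two.
        """
        addr = (self.next + align - 1) & ~(align - 1)  # round up to alignment
        if addr + nbytes > self.end:
            return None
        self.next = addr + nbytes
        return addr


# Hypothetical segment layout: placeholder addresses, 80-Kbyte internal RAM.
SEGMENTS = {
    "INTERNAL": SegmentHeap("INTERNAL", 0x0001_0000, 80 * 1024),
    "EXTERNAL": SegmentHeap("EXTERNAL", 0x8000_0000, 64 * 1024 * 1024),
}


def mem_alloc(segment: str, nbytes: int, align: int = 8):
    """Allocate from a named segment (hypothetical helper)."""
    return SEGMENTS[segment].alloc(nbytes, align)
```

A video application would place the full frame in external memory and its small ping-pong working buffers in internal memory, for example mem_alloc("EXTERNAL", 518_400, 128) for the frame and two mem_alloc("INTERNAL", 384, 128) calls for the macroblock buffers.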
