This makes a GPOS ideal for addressing general purpose system
components such as networking, user interface and display management.
However, this abstraction sacrifices the fine-grained control of system resources required to meet the performance goals of computationally intensive algorithms such as signal processing code. For this level of control, developers typically turn to a real-time operating system (RTOS).
From an embedded signal processing standpoint, there are essentially two types of OSes to consider: Linux, a general-purpose operating system, and DSP/BIOS, a real-time operating system. Linux offers a higher level of abstraction, while DSP/BIOS provides finer control.
To leverage the strengths of both alternatives, developers can use a
system virtual machine, which allows programmers to run Linux and
DSP/BIOS concurrently on the same DSP processor.
(Editor's note: Unlike process virtual machine environments specific to particular programming languages, such as the Java VM, system virtual machines correspond to actual hardware and can execute complete operating systems in isolation from other similar instantiations in the same computing environment.)
An important question to ask, however, is why not simply use a CPU+DSP combination running Linux and DSP/BIOS separately? CPUs are, after all, more efficient at running control code for user interfaces and similar functions, and separate cores avoid the overhead associated with virtualization. However, putting all functionality onto one chip is attractive for several reasons.
For one, today's high performance DSPs are much more powerful than
previous generation DSPs. This frees up more cycles for control
processing. In addition, most high-performance DSPs are more
general-purpose than they used to be, allowing for more efficient
control code processing.
If all functionality can fit on a DSP, the benefits are compelling. One fewer chip translates to lower cost and board area, as well as lower energy consumption, because power-hungry interprocessor data transfers are eliminated.
Scheduling
One of the most beneficial and commonly used aspects of any operating
system is the ability to concurrently execute multiple tasks or
threads. The operating system employs a scheduler to manage the
processing core in order to serially order tasks for execution.
A historical concern of embedded programmers when using Linux was the lack of real-time performance. However, recent improvements to the Linux kernel have greatly improved its responsiveness to system events, making it suitable for a broad class of enterprise, consumer and embedded products.
Linux provides both time slicing and priority-based scheduling of threads. The time slicing methodology shares processing cycles between all threads so that none are locked out. This is often useful for user interface functions to guarantee that if the system becomes overloaded, responsiveness may slow, but no user functions are completely lost.
Priority-based thread scheduling, on the other hand, guarantees that the highest priority ready thread in the system executes until it relinquishes control, at which time the next highest priority ready thread begins executing.
The Linux kernel re-evaluates the priorities of ready threads upon each transition from kernel to user mode. This means that any new kernel-evaluated event, such as data becoming ready on a driver, can trigger an immediate transition into a new thread (within the latency response of the scheduler). Due to the determinism of priority-based threads, they are often useful for signal processing applications where real-time requirements must be met.
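As a minimal sketch of how a Linux application might request priority-based scheduling for a signal processing thread, the C fragment below uses the POSIX sched_setscheduler() call with the SCHED_FIFO policy. The helper names clamp_fifo_priority and make_self_realtime are hypothetical, chosen for illustration only, and the request normally succeeds only for a process with sufficient privileges.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: clamp a requested priority into the range the
 * kernel reports for the SCHED_FIFO policy. */
int clamp_fifo_priority(int requested)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    assert(lo >= 0 && hi >= lo);
    if (requested < lo) return lo;
    if (requested > hi) return hi;
    return requested;
}

/* Hypothetical helper: move the calling process from the default
 * time-sliced policy to priority-based SCHED_FIFO scheduling.
 * Returns 0 on success, or the errno value on failure (typically
 * EPERM when the process lacks the required privileges). */
int make_self_realtime(int requested_priority)
{
    struct sched_param param;
    memset(&param, 0, sizeof param);
    param.sched_priority = clamp_fifo_priority(requested_priority);

    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
        int err = errno;
        fprintf(stderr, "SCHED_FIFO not granted: %s\n", strerror(err));
        return err;
    }
    return 0;
}
```

A signal processing task would call make_self_realtime() once at startup and fall back to the default time-sliced policy if the call returns a nonzero value.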
Prior to version 2.6 of the Linux kernel, the main obstacle to
real-time performance was the fact that the Linux kernel would disable
interrupts, in some cases for hundreds of milliseconds.
This allowed for a more efficient implementation of the kernel, because sections of code do not need to be made reentrant while interrupts are disabled, but it added latency to interrupt response.
Now with version 2.6, a build option is available that inserts much more frequent re-enabling of interrupts throughout the kernel code. This feature is often referred to in the Linux community as the preempt kernel, and while it does degrade performance of the kernel slightly, it greatly improves real-time performance. For many system tasks, when the preemptive Linux 2.6 kernel is used with real-time threads, it will provide sufficient performance to meet real-time needs.
By contrast, the Texas Instruments DSP/BIOS supports only priority-based scheduling, in the form of Software Interrupts and Tasks. As with the Linux scheduler, these Software Interrupts and Tasks are preemptive. However, DSP/BIOS also provides application programmers with direct access to hardware interrupts, a resource that is available only in kernel mode in Linux.
Direct access to hardware interrupts allows application programmers to achieve the theoretical minimum latency response supported by the underlying hardware. For applications such as control loops, where the absolute minimum latency is required, this fine-grained control over hardware interrupts is frequently a valuable feature.
Protected Access to Resources
A fundamental property of Linux and most general-purpose operating
systems is the separation of user-space programs from the underlying
system resources that they use. Direct access to memory and device
peripherals is permitted only when operating in supervisor (i.e.,
kernel) mode.
When a user program needs access to system resources, it must request them from the kernel through kernel modules called drivers. The application exists in a user memory space and accesses the driver through virtual files. The virtual files then translate the application's requests into the kernel memory space in which the driver executes.
Linux provides an extremely feature-rich driver model that encompasses standard streaming peripherals, block storage devices and file systems, and even networking and network-based file systems.
The separation of these drivers from the user-space application provides robustness. Furthermore, the abstraction to a common driver interface makes it easy to stream data to a serial port, to a flash file system or to a network shared folder, all with little change to the underlying application code.
This flexibility, however, comes at a price. The strict separation between applications and physical resources adds some degree of overhead. When a user space program accesses a device peripheral, a context switch must be made into kernel mode in order to process the request.
Typically this is not a significant limitation because the data is accessed in blocks as opposed to sample-by-sample, so the context switch into kernel mode needs to be made only once per block access.
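To make the cost of block versus sample-by-sample access concrete, the following C sketch counts how many read() system calls are needed to fetch the same amount of data with different request sizes. Each read() is one transition into kernel mode. The function name count_read_calls is hypothetical, and the 4096-byte buffer limit is an arbitrary choice for the illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Read `total` bytes from `path` using requests of at most `chunk`
 * bytes and return the number of read() system calls issued, or -1
 * on error. Every read() call costs one switch into kernel mode. */
long count_read_calls(const char *path, size_t total, size_t chunk)
{
    char buf[4096];
    assert(chunk > 0 && chunk <= sizeof buf);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    long calls = 0;
    size_t done = 0;
    while (done < total) {
        size_t want = total - done < chunk ? total - done : chunk;
        ssize_t got = read(fd, buf, want);
        if (got <= 0) {            /* error or unexpected end of file */
            close(fd);
            return -1;
        }
        calls++;
        done += (size_t)got;
    }
    close(fd);
    return calls;
}
```

Reading 4096 bytes from a device that always satisfies the full request takes 4096 kernel transitions one byte at a time, but only one transition when the whole block is requested at once.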
There are cases, however, when application code requires a tight coupling with physical hardware. This situation occurs frequently when using high-performance processors such as DSPs where data throughput is a key element to processing without stalls. In these cases, the separation of physical resources in kernel space from the application in user space may be a significant detriment to the system.
Coupling of Application and Hardware
Let us examine a typical situation encountered when performing block
video processing using the TMS320DM643x processor architecture, which
incorporates a 600 MHz / 4800 MIPS DSP processing core and a wide range
of multimedia peripherals, including a feature-rich video port
subsystem. A typical application of this hardware would be the
compression of an incoming video stream using H.264.
To take full advantage of the processing capability of the DSP core, processed data should be accessed from single-cycle internal memory as opposed to slower external memory. Although it would be technically possible to equip the processor with enough fast on-chip memory to store one or more full video frames, this approach would be cost-prohibitive for most target markets. Instead, the processor provides 80 Kbytes of single-cycle on-chip data memory.
While small relative to a full frame, 80 Kbytes has been determined
by TI through simulation to give the optimal area/performance tradeoff
for H.264 and other video processing algorithms.
To keep this memory fed with data, the DSP uses a Direct Memory Access (DMA) controller, which can efficiently transfer sub-blocks of data between external and internal memory without using cycles from the processing core (Figure 1 below).
Figure 1. DSP processor uses DMA hardware to transfer small sub-blocks of a video frame in external memory into internal memory to be processed by the DSP core.
From a whole-system perspective, this method can provide nearly the same performance as a chip with an entire on-chip video buffer, but at a fraction of the cost. Achieving this performance, however, requires a very tight coupling between the application, the operating system and the underlying memory and DMA hardware.
First, the application must have a means of distinguishing between fast internal memory and bulk external memory. Second, the application must be able to execute many small, precisely-timed DMA operations. Since all latency incurred when accessing the DMA is magnified by hundreds or possibly thousands of DMA accesses per video frame, efficient performance of these DMA operations within the Linux driver model is difficult, if not impossible, to achieve.
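The sub-block pattern described above can be sketched in portable C. This is a simulation under stated assumptions, not DSP/BIOS or DM643x code: memcpy() stands in for a DMA transfer, a static array stands in for the 80 Kbytes of on-chip memory, the processing step is a placeholder, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INTERNAL_BYTES (80u * 1024u)  /* on-chip data memory size from the text */

/* Stand-in for one DMA transfer. A real system would program the DMA
 * controller and let the copy proceed without using core cycles. */
static void dma_copy(uint8_t *dst, const uint8_t *src, size_t n,
                     unsigned long *transfers)
{
    memcpy(dst, src, n);
    (*transfers)++;
}

/* Placeholder for the real signal processing kernel: invert each sample. */
static void process_block(uint8_t *block, size_t n)
{
    for (size_t i = 0; i < n; i++)
        block[i] = (uint8_t)~block[i];
}

/* Process a frame held in "external" memory one sub-block at a time,
 * staging each sub-block through the "internal" buffer. Returns the
 * number of simulated DMA transfers that were issued. */
unsigned long process_frame(const uint8_t *ext_in, uint8_t *ext_out,
                            size_t frame_bytes, size_t block_bytes)
{
    static uint8_t internal[INTERNAL_BYTES];  /* models fast on-chip memory */
    unsigned long transfers = 0;
    assert(block_bytes > 0 && block_bytes <= INTERNAL_BYTES);

    for (size_t off = 0; off < frame_bytes; off += block_bytes) {
        size_t left = frame_bytes - off;
        size_t n = left < block_bytes ? left : block_bytes;
        dma_copy(internal, ext_in + off, n, &transfers);   /* external -> internal */
        process_block(internal, n);
        dma_copy(ext_out + off, internal, n, &transfers);  /* internal -> external */
    }
    return transfers;
}
```

For a hypothetical 720x480 frame at 2 bytes per pixel (691,200 bytes) processed in 1-Kbyte blocks, this loop issues 675 inbound and 675 outbound transfers per frame, which illustrates why any fixed latency added to each DMA request is multiplied many times over.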
Practical implementations of this method have been demonstrated
utilizing DSP/BIOS, providing native APIs to allow applications to
request internal versus external memory. This also allows applications
to access DMA registers directly with no context switching penalty.
The Best of Both Worlds
Although many multimedia applications spend the majority of their
processor cycles on signal processing, there are many higher-level
functions that a consumer-ready product must implement. User interface
and display functions, networking and file manipulation are just a few.
Because these features are not time critical, the fine-level control of DSP/BIOS is not required. Here, the resource abstraction provided by the Linux driver model is preferred for the benefits of greater flexibility and reduced development time, not to mention the wealth of open-source application code available in the Linux community.
One solution is to run the Linux and DSP/BIOS operating systems concurrently on the same device, using a virtualizer to give the system developer or integrator the advantages of both systems (Figure 2 below).
Figure 2. Linux and DSP/BIOS running concurrently on a DM643x DSP device.
The virtualizer acts as a fast and predictable switch to share DSP
resources between Linux and DSP/BIOS operating systems. It guarantees
the best possible performance for DSP/BIOS threads by making a
speculative switch to the context of the DSP/BIOS operating system
whenever an interrupt is received.
If the newly arrived interrupt corresponds to an event recognized within the DSP/BIOS context, it will be handled within the DSP/BIOS context, which is already loaded and ready to run.
While the DSP/BIOS context is active in the virtualizer, the application is given direct access to needed system resources without affecting the user and kernel spaces maintained within the (suspended) Linux environment.
Once the application has completed its high performance signal processing calculations within the DSP/BIOS environment, the virtualizer forces a transition back to the Linux environment, which provides access to the higher-level features available there.
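The switching policy described above can be modeled with a small state machine in C. This is an illustrative sketch only, not the virtualizer's actual implementation: the type and function names are hypothetical, the rule that IRQ numbers below 16 belong to DSP/BIOS is invented for the example, and forwarding unrecognized interrupts to Linux after switching back is an assumption that the text does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CTX_LINUX, CTX_RTOS } context_t;

typedef struct {
    context_t active;         /* OS context currently loaded */
    unsigned  switches;       /* context switches performed so far */
    unsigned  rtos_handled;   /* interrupts serviced in the RTOS context */
    unsigned  linux_handled;  /* interrupts forwarded to Linux */
} virtualizer_t;

static void switch_to(virtualizer_t *v, context_t target)
{
    if (v->active != target) {
        v->active = target;
        v->switches++;
    }
}

/* Hypothetical ownership rule: IRQs 0..15 belong to the RTOS. */
static bool rtos_owns_irq(int irq)
{
    return irq >= 0 && irq < 16;
}

/* Model of the policy in the text: switch speculatively to the RTOS
 * context on every interrupt, handle the event there if the RTOS
 * recognizes it, then return to the Linux context. */
void on_interrupt(virtualizer_t *v, int irq)
{
    switch_to(v, CTX_RTOS);          /* speculative switch */

    bool handled = rtos_owns_irq(irq);
    if (handled)
        v->rtos_handled++;           /* RTOS context is already loaded */

    switch_to(v, CTX_LINUX);         /* transition back to Linux */
    if (!handled)
        v->linux_handled++;          /* assumption: Linux services the rest */
}
```

The design choice this models is that real-time events never wait for a context load: the cost of the speculative switch is paid on every interrupt so that interrupts owned by the RTOS see the shortest possible path.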
The virtualizer-mediated sub-10 microsecond switch time between operating systems allows programmers to meet real-time performance requirements with little penalty compared to a native DSP/BIOS-only system. This solution incurs a penalty of only about 1.5 percent processing overhead for a typical multimedia device.
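A back-of-the-envelope model shows how switch time and switch rate combine into processing overhead. The function below is a hypothetical helper, and it assumes that overhead comes only from OS switches, which is a simplification not stated in the text.

```c
#include <assert.h>

/* Fraction of processor time spent switching between operating
 * systems, assuming each switch costs `switch_time_us` microseconds
 * and `switches_per_second` switches occur every second. */
double switch_overhead_fraction(double switches_per_second,
                                double switch_time_us)
{
    assert(switches_per_second >= 0.0 && switch_time_us >= 0.0);
    return switches_per_second * switch_time_us / 1e6;
}
```

Under this simplified model, a 10 microsecond switch time and an illustrative rate of 1,500 switches per second (a figure chosen for the example, not taken from the measurements above) would give an overhead of 0.015, or 1.5 percent.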
Additional Advantages of the Dual-OS System
Perhaps the simplest advantage of extending a Linux-based product to
include the DSP/BIOS operating system is the ability to use algorithms
from the hundreds of associated third parties with no porting effort.
Compliance with the xDAIS standard guarantees seamless integration of
these third-party algorithms into a DSP/BIOS environment.
Another advantage of extending a Linux-based system to include DSP/BIOS is that applications executing in the DSP/BIOS environment are free from the constraints of the GNU General Public License (GPL) of the Linux kernel.
When implementing a Linux-based solution, it is not always obvious exactly what licensing requirements apply to unique, developer-produced software intellectual property (IP). By executing that IP within the context of the DSP/BIOS OS instead of the Linux OS, it is possible to avoid this legal concern.
Conclusion
Using the technique described in this article, Linux and DSP/BIOS may
be run concurrently on a single DSP core. This provides all the
functionality of a Linux solution while providing the precision and
hardware control available under DSP/BIOS.
Programmers may take advantage of application code written for Linux and signal processing code written for DSP/BIOS without the effort of having to port one into the other environment.
For a designer who requires the features of Linux in a real-time, embedded application, upgrading to include the DSP/BIOS toolset through the use of a virtualizer adds significantly improved signal-processing performance at a small cost in terms of system resources.
Dave Beal is director of product management for VirtualLogix, Inc., Steve Preissig is an instructor in Texas Instruments' Technical Training Organization, and Aurelien Jacquiot is Project Manager at VirtualLogix, France.