The classical trade-off between system performance and ease of programming is one of the primary differentiators between general-purpose operating systems (GPOSes) and real-time operating systems.
GPOSes tend to provide a higher degree of resource abstraction. This improves application portability and ease of development, and increases system robustness through software modularity and isolation of resources.
This makes a GPOS ideal for addressing general-purpose system components such as networking, user interface and display management.
However, this abstraction sacrifices the fine-grained control of system resources required to meet the performance goals of computationally intensive algorithms such as signal processing code. For this level of control, developers typically turn to a real-time operating system (RTOS).
From an embedded signal processing standpoint, there are essentially two types of OSes to consider: a GPOS such as Linux, and an RTOS such as DSP/BIOS.
In order to leverage the strengths of both alternatives, developers can use a system virtual machine, which allows programmers to run Linux and DSP/BIOS concurrently on the same DSP processor.
(Editor's note: Unlike process virtual machine environments specific to particular programming languages, such as the Java VM, system virtual machines correspond to actual hardware and can execute complete operating systems in isolation from other similar instantiations in the same computing environment.)
An important question to ask, however, is why not simply use a CPU+DSP combo running Linux and DSP/BIOS separately? CPUs are, after all, more efficient at running control code for user interfaces, etc. And separate cores avoid the overhead associated with virtualization. However, putting all functionality onto one chip is attractive for several reasons.
For one, today's high-performance DSPs are much more powerful than previous-generation DSPs. This frees up more cycles for control processing. In addition, most high-performance DSPs are more general-purpose than they used to be, allowing for more efficient control code processing.
If all functionality can fit on a DSP, the benefits are compelling. One less chip translates to lower cost and area, as well as lower energy consumption because power-hungry interprocessor data transfers are eliminated.
One of the most beneficial and commonly used aspects of any operating system is the ability to concurrently execute multiple tasks or threads. The operating system employs a scheduler that manages the processing core by placing tasks in serial order for execution.
A historical concern of embedded programmers when using Linux was the lack of real-time performance. However, recent improvements to the Linux kernel have greatly improved its responsiveness to system events, making it suitable for a broad class of enterprise, consumer and embedded products.
Linux provides both time slicing and priority-based scheduling of threads. The time slicing methodology shares processing cycles between all threads so that none are locked out. This is often useful for user interface functions to guarantee that if the system becomes overloaded, responsiveness may slow, but no user functions are completely lost.
Priority-based thread scheduling, on the other hand, guarantees that the highest-priority ready thread in the system executes until it relinquishes control, at which time the next-highest-priority ready thread begins executing.
The Linux kernel re-evaluates the priorities of ready threads upon each transition from kernel to user mode. This means that any new kernel-evaluated event, such as data becoming ready on a driver, can trigger an immediate transition into a new thread (within the latency response of the scheduler). Due to the determinism of priority-based threads, they are often useful for signal processing applications where real-time requirements must be met.
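As a rough illustration of the two policies described above, the sketch below prepares POSIX thread attributes for a priority-based (SCHED_FIFO) thread, the kind typically chosen for signal processing work, instead of the default time-sliced policy. The helper name make_rt_attr is hypothetical; the POSIX calls it uses are standard. Note that actually creating a thread with these attributes normally requires real-time scheduling privileges on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: configure `attr` so that a thread created with it
 * uses priority-based SCHED_FIFO scheduling at `priority` instead of the
 * default time-sliced policy. Returns 0 on success or an errno value. */
int make_rt_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param = {0};
    int rc;

    assert(attr != NULL);

    /* Reject priorities outside the range the system allows for SCHED_FIFO. */
    if (priority < sched_get_priority_min(SCHED_FIFO) ||
        priority > sched_get_priority_max(SCHED_FIFO))
        return EINVAL;

    /* Use the settings below rather than inheriting the creator's policy. */
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (rc != 0)
        return rc;

    param.sched_priority = priority;
    return pthread_attr_setschedparam(attr, &param);
}
```

A thread subsequently created with pthread_create() and these attributes would run under SCHED_FIFO, provided the process is permitted to use real-time policies; threads created with default attributes remain time-sliced.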
Prior to version 2.6 of the Linux kernel, the main obstacle to real-time performance was the fact that the Linux kernel would disable interrupts, in some cases for hundreds of milliseconds.
This allows for a more efficient implementation of the kernel, because sections of code do not need to be made reentrant when interrupts are disabled, but it adds latency to interrupt response.
Now with version 2.6, a build option is available that inserts much more frequent re-enabling of interrupts throughout the kernel code. This feature is often referred to in the Linux community as the preempt kernel, and while it does degrade performance of the kernel slightly, it greatly improves real-time performance. For many system tasks, when the preemptive Linux 2.6 kernel is used with real-time threads, it will provide sufficient performance to meet real-time needs.
By comparison, the Texas Instruments DSP/BIOS RTOS supports only priority-based scheduling, in the form of Software Interrupts and Tasks. As with the Linux scheduler, these Software Interrupts and Tasks are preemptive. However, DSP/BIOS also provides application programmers with direct access to hardware interrupts, a resource that is only available in kernel mode in Linux.
Direct access to hardware interrupts allows application programmers to achieve the theoretical minimum latency response supported by the underlying hardware. For applications such as control loops where the absolute minimum latency is required, this fine-grained control over hardware interrupts is frequently a valuable feature.
Protected Access to Resources
A fundamental property of Linux and most general-purpose operating systems is the separation of user-space programs from the underlying system resources that they use. Direct access to memory and device peripherals is permitted only when operating in supervisor (i.e., kernel) mode.
When a user program desires access to system resources, it must request them from the kernel through kernel modules called drivers. The application exists in a user memory space and accesses the driver through virtual files. The virtual files then translate the application's requests into the kernel memory space in which the driver executes.
Linux provides an extremely feature-rich driver model that encompasses standard streaming peripherals, block storage devices and file systems, and even networking and network-based file systems.
The separation of these drivers from the user-space application provides robustness. Furthermore, the abstraction to a common driver interface makes it easy to stream data to a serial port, to a flash file system or to a network shared folder, all with little change to the underlying application code.
This flexibility, however, comes at a price. The strict separation between applications and physical resources adds some degree of overhead. When a user-space program accesses a device peripheral, a context switch must be made into kernel mode in order to process the request.
Typically this is not a significant limitation because the data is accessed in blocks as opposed to sample-by-sample, so the context switch into kernel mode needs to be made only once per block access.
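To put rough numbers on the block-access argument, the following sketch counts how many transitions into kernel mode are needed to move a given number of samples through a driver, assuming one transition per request. The function name and the sample counts used afterwards are illustrative assumptions, not measurements.

```c
#include <assert.h>

/* Number of user-to-kernel transitions needed to move `total_samples`
 * through a driver when each request carries up to `samples_per_request`
 * samples. One transition per request; the last request may be partial. */
unsigned long kernel_transitions(unsigned long total_samples,
                                 unsigned long samples_per_request)
{
    assert(samples_per_request > 0);

    /* Ceiling division written so that it cannot overflow. */
    return total_samples / samples_per_request +
           (total_samples % samples_per_request != 0);
}
```

For one second of audio at an assumed 48,000 samples per second, sample-by-sample access would cost 48,000 transitions, whereas 512-sample blocks would cost 94.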
There are cases, however, when application code requires a tight coupling with physical hardware. This situation occurs frequently when using high-performance processors such as DSPs where data throughput is a key element to processing without stalls. In these cases, the separation of physical resources in kernel space from the application in user space may be a significant detriment to the system.
Coupling of Application and Hardware
Let us examine a typical situation encountered when performing block video processing using the TMS320DM643x processor architecture, which incorporates a 600 MHz / 4800 MIPS DSP processing core and a wide range of multimedia peripherals, including a feature-rich video port subsystem. A typical application of this hardware would be the compression of an incoming video stream using H.264.
In order to take full advantage of the processing capability of the DSP core, processed data should be accessed from single-cycle internal memory as opposed to slower external memory. Although it would be technically possible to equip the processor with enough fast on-chip memory to store one or more full video frames, this approach would be cost prohibitive for most target markets. Instead, the processor provides 80 Kbytes of single-cycle on-chip data memory.
While small relative to a full frame, 80 Kbytes has been determined by TI through simulation to give the optimal area/performance tradeoff for H.264 and other video processing algorithms.
To keep this memory fed with data, the DSP uses a Direct Memory Access (DMA) controller, which can efficiently transfer sub-blocks of data between external and internal memory without using cycles from the processing core (Figure 1 below).
|Figure 1. The DSP processor uses DMA hardware to transfer small sub-blocks of a video frame in external memory into internal memory to be processed by the DSP core.|
From a whole-system perspective, this method can provide nearly the same performance as a chip with an entire video buffer but at a fraction of the cost. Achieving this performance, however, requires a very tight coupling between the application, the operating system and the underlying memory and DMA hardware.
First, the application must have a means of distinguishing between fast internal memory and bulk external memory. Second, the application must be able to execute many small, precisely timed DMA operations. Since any latency incurred when accessing the DMA is multiplied by the hundreds or possibly thousands of DMA accesses per video frame, efficient performance of these DMA operations within the Linux driver model is difficult, if not impossible, to achieve.
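The sub-block scheme can be sketched in portable C as a pair of alternating buffers. This is only a model under stated assumptions: memcpy stands in for the DMA controller, a static array stands in for on-chip memory, and the tile size, function names and per-tile computation are all hypothetical. Real DSP/BIOS code would use the platform's own memory allocation and DMA interfaces, which are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE_BYTES 4096u /* hypothetical sub-block size; two tiles fit in 80 Kbytes */

/* Stand-in for a DMA transfer. On real hardware the transfer would run in
 * the background while the core keeps computing; here it completes at once. */
static void dma_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hypothetical per-tile computation: sum the byte values in the tile. */
static uint32_t process_tile(const uint8_t *tile, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += tile[i];
    return sum;
}

/* Walk a frame held in "external" memory one tile at a time, fetching the
 * next tile into one internal buffer while the other is being processed. */
uint32_t process_frame(const uint8_t *frame, size_t frame_bytes)
{
    static uint8_t internal[2][TILE_BYTES]; /* stand-in for on-chip memory */
    uint32_t acc = 0;
    size_t offset = 0;
    int cur = 0;

    assert(frame != NULL || frame_bytes == 0);
    if (frame_bytes == 0)
        return 0;

    /* Prime the first buffer before the loop starts. */
    size_t cur_len = frame_bytes < TILE_BYTES ? frame_bytes : TILE_BYTES;
    dma_copy(internal[cur], frame, cur_len);

    while (cur_len > 0) {
        size_t next_off = offset + cur_len;
        size_t remaining = frame_bytes - next_off;
        size_t next_len = remaining < TILE_BYTES ? remaining : TILE_BYTES;

        /* Start fetching the next tile into the idle buffer. */
        if (next_len > 0)
            dma_copy(internal[cur ^ 1], frame + next_off, next_len);

        /* Process the tile that is already in internal memory. */
        acc += process_tile(internal[cur], cur_len);

        offset = next_off;
        cur_len = next_len;
        cur ^= 1; /* swap buffer roles */
    }
    return acc;
}
```

Every pass through the loop issues one transfer, so any fixed cost per transfer, such as a switch into kernel mode, is paid once per tile rather than once per frame.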
Practical implementations of this method have been demonstrated using DSP/BIOS, which provides native APIs that allow applications to request internal versus external memory. DSP/BIOS also allows applications to access DMA registers directly with no context-switching penalty.
The Best of Both Worlds
Although many multimedia applications spend the majority of their processor cycles on signal processing, there are many higher-level functions that a consumer-ready product must implement. User interface and display functions, networking and file manipulation are just a few.
Because these features are not time critical, the fine-level control of DSP/BIOS is not required. Here, the resource abstraction provided by the Linux driver model is preferred for the benefits of greater flexibility and reduced development time, not to mention the wealth of open-source application code available in the Linux community.
A solution in which the Linux and DSP/BIOS operating systems run concurrently on the same device involves the use of a virtualizer to provide the system developer or integrator with the advantages of both systems (Figure 2 below).
|Figure 2. Linux and DSP/BIOS running concurrently on a DM643x DSP device.|
The virtualizer acts as a fast and predictable switch to share DSP resources between the Linux and DSP/BIOS operating systems. It guarantees the best possible performance for DSP/BIOS threads by making a speculative switch to the context of the DSP/BIOS operating system whenever an interrupt is received.
If the newly arrived interrupt corresponds to an event recognized within the DSP/BIOS context, it will be handled within the DSP/BIOS context, which is already loaded and ready to run.
While the virtualizer has DSP/BIOS enabled, the application is given direct access to needed system resources without affecting the user and kernel spaces maintained within the (suspended) Linux environment.
Once the application has completed its high-performance signal processing calculations within the DSP/BIOS environment, the virtualizer forces a transition back to the Linux environment, which provides access to the higher-level features available there.
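The switching sequence just described can be modeled as a small state machine. The sketch below is a toy model, not the virtualizer's actual implementation: all type and function names are hypothetical, the rule that decides which interrupts DSP/BIOS recognizes is invented for illustration, and the handling of unrecognized interrupts (simply leaving them to Linux) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CTX_LINUX, CTX_DSPBIOS } os_context;

typedef struct {
    os_context active;           /* which OS context is currently loaded */
    unsigned switches;           /* number of context switches performed */
    unsigned handled_by_dspbios; /* interrupts recognized by DSP/BIOS */
    unsigned left_to_linux;      /* interrupts DSP/BIOS did not recognize */
} virtualizer_model;

/* Assumption for this sketch only: IRQ numbers below this limit are the
 * events that DSP/BIOS recognizes. */
#define DSPBIOS_IRQ_LIMIT 8

static void switch_context(virtualizer_model *v, os_context target)
{
    if (v->active != target) {
        v->active = target;
        v->switches++;
    }
}

void virtualizer_model_init(virtualizer_model *v)
{
    assert(v != NULL);
    v->active = CTX_LINUX;
    v->switches = 0;
    v->handled_by_dspbios = 0;
    v->left_to_linux = 0;
}

void virtualizer_model_interrupt(virtualizer_model *v, int irq)
{
    assert(v != NULL && irq >= 0);

    /* Speculative switch: load DSP/BIOS before knowing who owns the event. */
    switch_context(v, CTX_DSPBIOS);

    if (irq < DSPBIOS_IRQ_LIMIT) {
        /* Recognized: handled in the already-loaded DSP/BIOS context. */
        v->handled_by_dspbios++;
    } else {
        /* Not recognized (assumed behavior): left for Linux to service. */
        v->left_to_linux++;
    }

    /* Real-time work finished, or there was none: return to Linux. */
    switch_context(v, CTX_LINUX);
}
```

In this model every interrupt costs two context switches regardless of which OS ends up servicing it, which is why the switch time itself has to be short and predictable.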
The virtualizer-mediated sub-10 microsecond switch time between operating systems allows programmers to meet real-time performance requirements with little penalty compared to a native DSP/BIOS-only system. This solution incurs a penalty of only about 1.5 percent processing overhead for a typical multimedia device.
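As a back-of-the-envelope check on how switch time relates to overhead, the sketch below computes the fraction of processor time consumed by OS switches for a given switch rate. The function name is hypothetical, and the switch rate used afterwards is an illustrative assumption rather than a figure from the measurements quoted above, which may also include costs other than the switches themselves.

```c
#include <assert.h>

/* Fraction of processor time spent switching between operating systems,
 * given a switch rate (switches per second) and a per-switch cost in
 * microseconds. */
double switch_overhead_fraction(double switches_per_second,
                                double switch_time_us)
{
    assert(switches_per_second >= 0.0 && switch_time_us >= 0.0);
    return switches_per_second * switch_time_us / 1.0e6;
}
```

At the 10 microsecond upper bound, an assumed 1,500 switches per second would consume 0.015 of the processor's time, that is, 1.5 percent.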
Additional Advantages to the Dual-OS System
Perhaps the simplest advantage to extending a Linux-based product to include the DSP/BIOS operating system is the ability to use algorithms from the hundreds of associated third parties with no porting effort. Compliance to the xDAIS standard guarantees seamless integration of these third-party algorithms into a DSP/BIOS environment.
Another advantage of extending a Linux-based system to include DSP/BIOS is that applications executing in the DSP/BIOS environment are free from the constraints of the GNU General Public License (GPL) of the Linux kernel.
When implementing a Linux-based solution, it is not always obvious exactly what the licensing requirements of unique, developer-produced software intellectual property are. By executing IP within the context of the DSP/BIOS OS instead of the Linux OS, it is possible to avoid this legal concern.
Using the technique described in this article, Linux and DSP/BIOS may be run concurrently on a single DSP core. This provides all the functionality of a Linux solution while providing the precision and hardware control available under DSP/BIOS.
Programmers may take advantage of application code written for Linux and signal processing code written for DSP/BIOS without the effort of having to port one into the other environment.
For a designer who requires the features of Linux in a real-time, embedded application, upgrading to include the DSP/BIOS toolset through the use of a virtualizer adds significantly improved signal-processing performance at a small cost in terms of system resources.
Dave Beal is director of product management for VirtualLogix, Inc., Steve Preissig is an instructor in Texas Instruments' Technical Training Organization, and Aurelien Jacquiot is project manager at VirtualLogix, France.