Back to the basics: Picking the right RTOS for a hybrid RISC/DSP core


It seems that everywhere you look today in the embedded space, there’s a new digital signal processing application. Modern video, audio, game, imaging, and communications devices use underlying algorithms that require a substantial share of processing cycles from traditional RISC processors.

Dedicated digital signal processors (DSPs) have existed since the 1970s and excel at the kinds of computation needed to filter, compress, convolve, and otherwise process these applications’ underlying data streams.

At the same time, however, these products require a large amount of control logic to handle peripheral communication, adhere to complex protocols, and present sophisticated user interfaces, tasks that are best performed by conventional RISC processor designs. Devices that needed both signal processing and control components were traditionally designed with multiple individual chips connected by dedicated communication buses.

By combining individual RISC and DSP processor architectures (cores) into a single IC, processor developers were able to offer a solution to this problem that reduced power consumption due to increased proximity of processing units, and reduced overall part cost.

Recently, processor designers have been reducing power consumption and part cost in a new way: by combining RISC and DSP features into a single core, known as a ‘convergent’ processor core. One example of an architecture that was designed from the beginning to be convergent is the Analog Devices Blackfin processor. Convergent processors based on RISC modified to efficiently handle DSP functions include the StarCore processors (SC1000, SC2000, and SC v5).

Some other examples of convergent architectures are based on well established RISC architectures and have been modified to efficiently fulfill DSP functions. These include the MIPS 24KE, Renesas SH3-DSP, PowerPC with AltiVec, and ARM966E-S. Figure 1, below, shows an example of how a RISC processor and a DSP processor architecture can be combined into a convergent processor architecture.

Figure 1. Comparison of RISC, DSP, and hybrid architectures

Compared to standalone RISC systems, convergent processors can be much more efficient at performing DSP tasks. One processor developer claims that its convergent architecture performs signal processing tasks as much as 200% better than a traditional RISC architecture at a similar unit cost.

This increase in processing efficiency allows system designers to run their systems at lower clock speeds, which in turn reduces the system power consumption. Yet compared to traditional DSP processor designs, convergent processors generally have more complex pipelines and can run at high enough speeds to allow for fast control-intensive computing.

Compared to multi-core processors that combine RISC/DSP cores, convergent processor systems can help system designers avoid complications associated with programming two separate processor architectures.

When two different processor variants are used, one processor often becomes essentially a slave to the other, with no direct operating system interactions. For instance, in a dual-core RISC/DSP architecture, the RISC core often becomes the master processor, running an operating system, while the DSP runs data-centric operations with a simpler scheduling policy.

In this case, developers often need to learn a second API and programming model in addition to a second instruction set architecture. Developers are used to learning new operating systems and architectures, but having two of each in one product increases the complexity of the underlying system, which can lead to additional debugging and a longer time to market.

Multi-core programming challenges still may be present in systems built around convergent cores. Even though convergent architectures combine the functionality of two cores into a single core, many system designers still choose to use a multi-core architecture to attain a higher level of performance.

Multiple convergent cores might be combined into a single processor package, as is done with the Blackfin ADSP-BF561 processor and the StarCore MSC8122. Convergent architectures might even be combined with a traditional RISC core in the same IC, or on the same board.

Selecting the right RTOS
Once you decide that a particular convergent DSP processor is appropriate for your design, you will probably be faced with a number of real-time operating system (RTOS) choices.

Independent of your application domain, there is a good chance you will face some, if not all, of the following considerations. You will have DSP functionality that needs to run on strict real-time schedules or with real-time response.

Usually the control logic of the system has looser real-time constraints, but the signal processing functionality will need to respond to external stimuli in less than a microsecond. You will probably want a high degree of hardware configurability in order to meet these time constraints.

You will likely have control logic with complex interdependencies on DSP code. You will be building your application while your hardware is still in development.

Your development effort may be under great time pressure, requiring integration with existing middleware (networking protocols, stacks, codecs, or filesystems). And finally, you anticipate using future improved revisions of your current architecture or perhaps even new architectures in future versions of your product or in other products that contain similar functionality.

Meeting DSP real time constraints
You have to meet hard real-time constraints with DSP-oriented tasks. Your development environment should be able to visually analyze the system while it is running to help you understand where it is spending its precious CPU time.

For instance, an operating system event log could be displayed visually to let you see what sequence of activities occurred immediately before a real-time deadline was missed. If your chip has trace capabilities, a sophisticated debugging tool that integrates with your operating system could recreate the state of the processor and operating system at any point during the trace capture.
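As a rough illustration of the event log idea above, the following C sketch keeps a fixed-size ring buffer of timestamped events and retrieves the most recent ones, which is what a viewer would show when a deadline is missed. It is not the logging API of any particular RTOS: every name here (`evlog_t`, `evlog_record`, `evlog_recent`, the event kinds) is hypothetical, and the timestamp is assumed to come from some free-running counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVLOG_CAPACITY 8u  /* power of two, so wrap-around is a cheap mask */

typedef enum { EV_TASK_SWITCH, EV_IRQ_ENTER, EV_IRQ_EXIT, EV_DEADLINE_MISS } ev_kind_t;

typedef struct {
    uint32_t  timestamp;  /* assumed: value of a free-running cycle counter */
    ev_kind_t kind;
    uint16_t  id;         /* task or interrupt number */
} ev_entry_t;

typedef struct {
    ev_entry_t entries[EVLOG_CAPACITY];
    uint32_t   count;     /* total events recorded; wrap at 2^32 is ignored here */
} evlog_t;

/* Record one event, overwriting the oldest entry once the buffer is full. */
static void evlog_record(evlog_t *log, uint32_t timestamp, ev_kind_t kind, uint16_t id)
{
    assert(log != NULL);
    ev_entry_t *e = &log->entries[log->count & (EVLOG_CAPACITY - 1u)];
    e->timestamp = timestamp;
    e->kind = kind;
    e->id = id;
    log->count++;
}

/* Copy up to `max` of the most recent events into `out`, oldest first.
 * Returns the number of events copied. */
static size_t evlog_recent(const evlog_t *log, ev_entry_t *out, size_t max)
{
    assert(log != NULL && out != NULL);
    uint32_t held = log->count < EVLOG_CAPACITY ? log->count : EVLOG_CAPACITY;
    size_t n = held < max ? held : max;
    for (size_t i = 0; i < n; i++) {
        uint32_t seq = log->count - (uint32_t)n + (uint32_t)i;
        out[i] = log->entries[seq & (EVLOG_CAPACITY - 1u)];
    }
    return n;
}
```

A deadline monitor could call `evlog_recent` at the moment it detects a miss and hand the result to the host tool. A fixed-size buffer keeps the logging cost constant, which matters when the logging itself runs inside time-critical code.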

The RTOS should also give you a wide spectrum of choices for scheduling your code, so that you can choose specific trade-offs between memory protection, the ability to use OS services, preemptability, and speed. With an RTOS matched to the right processor, the system designer has a number of options for organizing the application. Control code that needs to be as secure as possible can be executed within its own partitioned, protected address space.

Configuring the memory subsystem for this partitioning naturally requires extra time during context switches, so this choice provides the most reliability at some cost in execution speed. Code that runs in response to an external stimulus and needs kernel resources to respond can be separated into a minimal interrupt handler component and a kernel handler thread. A kernel thread incurs a minimal context switch penalty.
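The split between a minimal interrupt handler and a handler thread can be sketched in plain C as follows. This is only an illustration under stated assumptions: the queue type and the functions `rx_isr` and `rx_handler_drain` are hypothetical names, the device read is modeled as a function argument, and the step where a real interrupt handler would signal the thread through an RTOS primitive is left as a comment because that call is RTOS-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_QUEUE_LEN 16u  /* power of two, so indices wrap with a mask */

typedef struct {
    volatile uint32_t head;                   /* advanced only by the interrupt handler */
    volatile uint32_t tail;                   /* advanced only by the handler thread */
    volatile uint16_t samples[RX_QUEUE_LEN];
    uint32_t dropped;                         /* samples lost because the queue was full */
} rx_queue_t;

/* Minimal interrupt handler part: store the sample and return quickly.
 * Returns false if the queue was full and the sample was dropped. */
static bool rx_isr(rx_queue_t *q, uint16_t sample_from_device)
{
    assert(q != NULL);
    if (q->head - q->tail == RX_QUEUE_LEN) {
        q->dropped++;
        return false;
    }
    q->samples[q->head & (RX_QUEUE_LEN - 1u)] = sample_from_device;
    q->head++;
    /* A real handler would now wake the handler thread (RTOS-specific call). */
    return true;
}

/* Handler thread part: drain the queue and do the longer-running work,
 * here just summing the samples. Returns the number of samples processed. */
static uint32_t rx_handler_drain(rx_queue_t *q, uint32_t *accumulator)
{
    assert(q != NULL && accumulator != NULL);
    uint32_t processed = 0;
    while (q->tail != q->head) {
        *accumulator += q->samples[q->tail & (RX_QUEUE_LEN - 1u)];
        q->tail++;
        processed++;
    }
    return processed;
}
```

Because each index has a single writer, the handler never has to wait on the thread. On real hardware the sketch would also need whatever memory barriers the architecture requires between writing a sample and publishing the new `head`.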

To improve response even further, the system designer can use a ‘software interrupt’ mechanism. Software interrupts provide a way to run a C function (or even assembly code) as a response to timers or external stimuli. They do not require their own stack and can be configured to run until completion, except when interrupted by hardware interrupts.

This gives the system designer a way to run code that can still use OS services, but with very quick response and completion times. Of course, no RTOS can claim to have low interrupt latency if it disables interrupts within the kernel itself. Some popular operating systems disable interrupts during system calls, for instance, which means that interrupt latency is at best hard to compute, and at worst unbounded.
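To make the software interrupt mechanism described above more concrete, here is a minimal sketch of a dispatcher in C. It assumes a design in which posted software interrupts are tracked in a bitmask and run to completion, in priority order, on the caller's stack. All names (`swi_table_t`, `swi_post`, `swi_dispatch`, the demo handler) are hypothetical and do not correspond to any specific RTOS API; a real kernel would also update the pending mask atomically or with interrupts briefly masked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SWI_COUNT 8u

typedef void (*swi_fn_t)(void *arg);

typedef struct {
    swi_fn_t fn[SWI_COUNT];      /* index 0 is the highest priority */
    void    *arg[SWI_COUNT];
    volatile uint32_t pending;   /* bit i set means software interrupt i is posted */
} swi_table_t;

/* Attach a plain C function to a software interrupt slot. */
static void swi_register(swi_table_t *t, unsigned idx, swi_fn_t fn, void *arg)
{
    assert(t != NULL && idx < SWI_COUNT && fn != NULL);
    t->fn[idx] = fn;
    t->arg[idx] = arg;
}

/* Mark a software interrupt as pending, e.g. from a timer or hardware ISR. */
static void swi_post(swi_table_t *t, unsigned idx)
{
    assert(t != NULL && idx < SWI_COUNT);
    t->pending |= (1u << idx);
}

/* Run every pending software interrupt to completion, highest priority
 * first. The handlers borrow the caller's stack, so none of them needs a
 * stack of its own. Returns the number of handlers that ran. */
static unsigned swi_dispatch(swi_table_t *t)
{
    assert(t != NULL);
    unsigned ran = 0;
    while (t->pending != 0u) {
        unsigned idx = 0;
        while ((t->pending & (1u << idx)) == 0u) {
            idx++;
        }
        t->pending &= ~(1u << idx);
        assert(t->fn[idx] != NULL);
        t->fn[idx](t->arg[idx]);
        ran++;
    }
    return ran;
}

/* Demo handler: appends a tag to a trace so the run order is visible. */
typedef struct { char order[SWI_COUNT + 1u]; size_t len; } swi_trace_t;
typedef struct { swi_trace_t *trace; char tag; } swi_demo_arg_t;

static void swi_demo_handler(void *arg)
{
    swi_demo_arg_t *a = (swi_demo_arg_t *)arg;
    if (a->trace->len < SWI_COUNT) {
        a->trace->order[a->trace->len++] = a->tag;
    }
}
```

In this sketch, posting slot 5 and then slot 1 still runs slot 1 first, because dispatch order follows priority rather than posting order. A kernel would typically call the dispatcher on the way out of a hardware interrupt, before returning to the interrupted thread.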

Configuring your RTOS correctly
As mentioned before, configuring your processor correctly can make the difference between meeting and missing a real-time deadline. If you have the source to your RTOS kernel, you can change whatever you like; but even with kernel source, it is better for maintainability if you can reconfigure the OS without needing to change source code.

Your development system should provide interfaces that make it easy to indicate which text and data to place in internal memory. The kernel could have well-defined hooks into configurable board-specific code that allow you to configure caches (for example, locking certain cache lines) and control memory traffic (if the processor’s bus interface allows prioritizing different types of bus accesses).

With configurable caches, you will want to be able to place your application and its data at explicit physical addresses, and this should be possible using the development tools, even down to the point where you choose that a given array in your C code should be placed at a given address.
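One common way to express this kind of placement from C is a compiler-specific section attribute combined with a linker script. The sketch below assumes a GCC- or Clang-compatible toolchain targeting ELF; other toolchains use different pragmas or attributes. The section name `.fast_data`, the memory region named in the comment, and the FIR filter names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang on an ELF target. Elsewhere the macro expands to
 * nothing and the array lands in ordinary data memory. */
#if defined(__GNUC__) && defined(__ELF__)
#define FAST_DATA __attribute__((section(".fast_data")))
#else
#define FAST_DATA
#endif

#define FIR_TAPS 16u

/* Coefficients we would like in fast on-chip memory, aligned to an assumed
 * 32-byte cache line. With GNU ld, a linker script entry such as
 *     .fast_data : { *(.fast_data) } > L1_DATA
 * would map the section to a region (L1_DATA is a hypothetical name)
 * declared at the desired physical address. */
static _Alignas(32) int16_t fir_coeffs[FIR_TAPS] FAST_DATA;

/* Check that a pointer is aligned to `align`, which must be a power of two. */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1u)) == 0u;
}

/* Multiply-accumulate over one block of FIR_TAPS samples. */
static int32_t fir_apply(const int16_t *samples)
{
    assert(samples != NULL);
    int32_t acc = 0;
    for (unsigned i = 0; i < FIR_TAPS; i++) {
        acc += (int32_t)fir_coeffs[i] * samples[i];
    }
    return acc;
}
```

The C source only names the section; the actual address lives in the linker script, so the same code can be moved to a different memory map without changes. Whether a development tool offers this through attributes, pragmas, or a graphical memory map editor varies by vendor.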

Because you have a complicated system with multiple tasks running on your convergent core, you will want a powerful way to debug the task interactions. This is certainly very important on a processor that combines both control and DSP tasks into one system.

If you are diagnosing a problem involving an interrupt handler, you may want an RTOS development environment that can stop the processor at the point of failure, which may be in the interrupt handler or in kernel tasks, and inspect the entire system state (including the address spaces, tasks, and synchronization objects).

This can be achieved by using a JTAG debugger, which can halt the processor and inspect its physical memory, system state, and even peripheral state. For this to be possible, the debugger needs to be intimately familiar with the RTOS in order to understand the kernel’s internal bookkeeping of address spaces, task contexts, and semaphores.

Alternately, it may not be practical to halt the processor from a device standpoint (lack of interrupt response may cause a serious device malfunction) or from a capabilities standpoint (there is no JTAG solution for the processor).

Or you may want to focus on just one part of an otherwise live, running system. The term ‘run-mode debugging’ refers to the ability to control, inspect, and modify the state of individual tasks in a running system. This can be useful for debugging the complicated logic within control tasks in a convergent system while allowing the DSP handling tasks to run at full speed.

Ken Mixter is an engineering manager at Green Hills Software, Inc.
