As embedded processing solutions gain in complexity and popularity, software engineers find the need to port multimedia algorithms from proof-of-concept PC-based systems with ample memory to embedded systems where resource management is essential to meet performance requirements.
Ideally, they want the best possible performance, power consumption and memory usage without adding complexity to their “comfortable” programming model.
What’s more, as applications blur the line between signal processing and control and move into the realm of “convergent processing,” the software programming models from the two worlds often collide. The challenges for the programmer dovetail with the challenges that silicon providers face: how can customers take advantage of features that enhance performance without overcomplicating their programming model?
Processor vendors take a multi-tiered approach to resolving this dilemma: adding hardware “hooks” on the silicon itself, providing a low-level software infrastructure that facilitates task scheduling and resource management, and offering a variety of operating systems that hide this complexity, to a great extent, from developers.
Adding hardware hooks
Today’s convergent processors supply plenty of architectural features that enable efficient memory and data flows, including direct memory access.
Direct Memory Access (DMA) is a crucial component of high-performance embedded systems, allowing data transfer to occur without involving the processor core. The total number of DMA channels available must support the wide range of peripherals and data movement options. Sometimes multiple DMA controllers will exist, to allow efficient accesses between peripherals and other system resources. Additional important DMA features include a facility to prioritize DMA channels to meet current task requirements, and the ability to configure the corresponding DMA interrupts to match these priority levels. Moreover, a “traffic control” mechanism can be very useful, allowing efficient utilization of DMA and memory buses to match an application’s processing timeline.
An Event Controller logs and services different types of system events. These include interrupts, which occur asynchronously to program flow, as well as exceptions, which occur synchronously with program flow. Exceptions can be generated, for instance, when access is attempted to protected memory. Interrupt service routines are often the most critical code blocks, and they must be treated as such, garnering a place in the fastest memory available. As we’ll discuss, programmable event prioritization is a key enabler to successful management of real-time systems.
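To make programmable event prioritization concrete, here is a minimal C sketch of how a pending-event register and a priority table might interact. The names (`event_ctrl_t`, `event_raise`, `event_next`) and the 16-event limit are assumptions made for illustration; a real event controller does this selection in hardware.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_EVENTS 16u  /* hypothetical number of event sources */

typedef struct {
    uint8_t  priority_of[NUM_EVENTS]; /* programmable; lower value = more urgent */
    uint16_t pending;                 /* bit e set => event e is latched */
} event_ctrl_t;

/* Latch an event so it can be serviced later. */
void event_raise(event_ctrl_t *ec, unsigned event)
{
    assert(event < NUM_EVENTS);
    ec->pending |= (uint16_t)(1u << event);
}

/* Return the most urgent pending event and clear it, or -1 if none is
 * pending. Ties are resolved in favor of the lower event number. */
int event_next(event_ctrl_t *ec)
{
    int best = -1;
    for (unsigned e = 0; e < NUM_EVENTS; ++e) {
        if ((ec->pending >> e) & 1u) {
            if (best < 0 || ec->priority_of[e] < ec->priority_of[best])
                best = (int)e;
        }
    }
    if (best >= 0)
        ec->pending &= (uint16_t)~(1u << best);
    return best;
}
```

Because the priority table is data rather than wiring, an application can reorder event urgency at run time without touching the service routines themselves.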
Specialized instructions can accelerate specific processing routines, such as Viterbi decoding or video pixel processing. Although these are assembly-level instructions, ideally they can be accessed through intrinsics included with the processor’s C/C++ compiler.
On-chip instruction and data caches can relieve the programmer of much of the burden of memory management, at the expense of reduced determinism. Still, hardware features like lockable cache lines and cache/SRAM partitioning provide some extra degree of control while preserving, to a large extent, a simplified programming model. For the highest degree of code and data memory control, DMA overlays from external memory can be employed, at the expense of higher complexity.
A Memory Management Unit (MMU) can take many forms, and it can be an invaluable tool for preserving and enforcing application boundaries. For instance, separate Supervisor and User modes can exclude access to processor resources like internal register space and interrupt service routines, in order to prevent malicious or accidental corruption when running an application. As another example, the properties of memory regions (cacheability control and memory access protection) can be explicitly defined.
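As an illustration of how explicitly defined region properties and User/Supervisor modes fit together, the following C sketch models the lookup in software. The `mem_region_t` layout and the `access_allowed` function are hypothetical; an actual MMU performs this check in hardware and raises an exception on a violation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MODE_USER, MODE_SUPERVISOR } cpu_mode_t;

/* Hypothetical descriptor for one memory region. */
typedef struct {
    uint32_t base;
    uint32_t size;       /* in bytes */
    bool     cacheable;  /* cacheability control */
    bool     user_read;  /* access protection for User mode */
    bool     user_write;
} mem_region_t;

/* Returns true if the access is permitted. Supervisor mode may touch any
 * mapped region; an address outside every region is always rejected. */
bool access_allowed(const mem_region_t *regions, size_t n,
                    uint32_t addr, cpu_mode_t mode, bool is_write)
{
    assert(regions != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        const mem_region_t *r = &regions[i];
        if (addr >= r->base && addr - r->base < r->size) {
            if (mode == MODE_SUPERVISOR)
                return true;
            return is_write ? r->user_write : r->user_read;
        }
    }
    return false;
}
```

In this model, placing register space in a region with both User flags cleared is what keeps application code from corrupting it.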
Dynamic Power Management is a multi-tiered approach to processor power control that regulates transitions between different PLL modes (Full On, Sleep, Hibernate, etc.), core and system clock frequency changes, and core voltage adjustments. It provides this functionality to help the developer achieve a critical goal – to run the processor at the minimum power level attainable for a given point in an application.
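To see why voltage adjustment matters alongside frequency changes, consider the usual first-order model in which dynamic power scales with the square of core voltage times clock frequency. The sketch below assumes that model and ignores static leakage; the function name and the operating points in the note that follows are illustrative, not figures for any particular processor.

```c
#include <assert.h>

/* Dynamic power relative to a reference operating point, using the
 * first-order model P ~ V^2 * f (leakage is not modeled). */
double relative_dynamic_power(double volts, double mhz,
                              double ref_volts, double ref_mhz)
{
    assert(ref_volts > 0.0 && ref_mhz > 0.0);
    double v = volts / ref_volts;
    return v * v * (mhz / ref_mhz);
}
```

Under this model, halving the clock from a hypothetical 600 MHz to 300 MHz at a fixed 1.2 V halves dynamic power, whereas also lowering the core voltage to 0.8 V brings it down to roughly 22 percent of the reference.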
Having looked at some of the architectural features available in convergent processors, we now turn to the software-based enablers of these features in the user’s end application — system services, operating systems, and kernels.
An application, regardless of its complexity, normally requires some level of “system services.” These allow an application to take advantage of some kernel-like features without actually using an OS or a kernel.
Figure 1 above shows a representative structure for system services. Of those shown, the memory, interrupt and dynamic power management services are typically initialization services that configure the device or change operating parameters. On the other hand, the DMA and callback services provide ways to help manage system flow.
The External Memory Management service comprises a set of routines to configure the external processor memory interfaces, including external DRAM, SRAM and Flash. It includes adjusting SDRAM refresh and access values for optimal operation. It also provides a mechanism to change timing and control parameters.
The Power Management service facilitates the control of core and system clock frequencies, as well as core voltage, in order to reduce processor power consumption to a level commensurate with the performance needs of the application. It also coordinates all mode transitions between a processor’s various power states.
The Interrupt Management service allows the processor to configure and service interrupts quickly. This includes the ability to assign interrupt levels based on the relative priority of events, as well as to specify where interrupt service routines are located.
When the processor returns from a high-priority interrupt, it can execute a “callback” function to perform the actual processing. The Callback Management service lets the programmer select how to respond to an event. Once the initial event is serviced, the Callback Manager sets up the processor to run in a lower-priority task. This is important, because otherwise there exists the risk of lingering in a higher-level interrupt, which can then delay response time for other events. By their nature, callbacks might be both lengthy and non-deterministic, since they exist chiefly to allow efficient handling of high-priority interrupts.
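The deferral pattern described here can be sketched as a small queue: the interrupt handler posts a function pointer and returns, and a lower-priority context runs the queued work later. The names below (`cb_queue_t`, `cb_post`, `cb_dispatch`) are hypothetical, and the sketch omits the interrupt masking a real system would need around the queue updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CB_QUEUE_LEN 8u  /* hypothetical queue depth */

typedef void (*callback_fn)(void *arg);

typedef struct { callback_fn fn; void *arg; } cb_entry_t;

typedef struct {
    cb_entry_t slots[CB_QUEUE_LEN];
    size_t head;   /* index of the next entry to run */
    size_t count;  /* number of queued entries */
} cb_queue_t;

/* Called from the high-priority handler: record the work and return
 * quickly. Returns false if the queue is full. */
bool cb_post(cb_queue_t *q, callback_fn fn, void *arg)
{
    assert(fn != NULL);
    if (q->count == CB_QUEUE_LEN)
        return false;
    size_t tail = (q->head + q->count) % CB_QUEUE_LEN;
    q->slots[tail].fn = fn;
    q->slots[tail].arg = arg;
    q->count++;
    return true;
}

/* Called at lower priority: run everything queued, in posting order.
 * Returns the number of callbacks executed. */
size_t cb_dispatch(cb_queue_t *q)
{
    size_t ran = 0;
    while (q->count > 0) {
        cb_entry_t e = q->slots[q->head];
        q->head = (q->head + 1) % CB_QUEUE_LEN;
        q->count--;
        e.fn(e.arg);
        ran++;
    }
    return ran;
}

/* Example callback: increments the counter passed as its argument. */
void count_event(void *arg) { ++*(int *)arg; }
```

Nothing runs at post time, so the time spent in the high-priority handler stays short and predictable no matter how long the callback itself takes.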
The DMA Management service simplifies a programming model by abstracting data transfers. DMA services allow data movement via a standard API, without having to configure every control register manually. The DMA Manager logs submitted work requests, and it handles them in a manner consistent with their priority and the order in which they are received from the application software. Overall, this provides two key advantages. The first is that the API provides an intuitive way to program the DMA controller. The second is that it simplifies the integration of DMA flows into a system that may have been prototyped on a PC with no DMA capability.
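A minimal sketch of the request handling described above, in C: work requests are logged on submission and handed out by priority, oldest first within a priority level. The `dma_manager_t` type and its functions are invented for illustration and are not a vendor API; a real manager would program the DMA controller's registers rather than return the request to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_MAX_REQUESTS 8u  /* hypothetical limit */

typedef struct {
    const void *src;
    void       *dst;
    size_t      bytes;
    uint8_t     priority; /* lower value = more urgent */
    uint32_t    seq;      /* submission order, assigned by the manager */
} dma_request_t;

typedef struct {
    dma_request_t pending[DMA_MAX_REQUESTS];
    size_t   count;
    uint32_t next_seq;    /* wraparound is ignored in this sketch */
} dma_manager_t;

/* Log a work request. Returns 0 on success, -1 if the log is full. */
int dma_submit(dma_manager_t *m, const void *src, void *dst,
               size_t bytes, uint8_t priority)
{
    assert(m != NULL);
    if (m->count == DMA_MAX_REQUESTS)
        return -1;
    dma_request_t *r = &m->pending[m->count++];
    r->src = src;
    r->dst = dst;
    r->bytes = bytes;
    r->priority = priority;
    r->seq = m->next_seq++;
    return 0;
}

/* Remove the next request to service: most urgent priority first, and
 * oldest first within a priority. Returns 0, or -1 if none are pending. */
int dma_next(dma_manager_t *m, dma_request_t *out)
{
    assert(m != NULL && out != NULL);
    if (m->count == 0)
        return -1;
    size_t best = 0;
    for (size_t i = 1; i < m->count; ++i) {
        const dma_request_t *a = &m->pending[i];
        const dma_request_t *b = &m->pending[best];
        if (a->priority < b->priority ||
            (a->priority == b->priority && a->seq < b->seq))
            best = i;
    }
    *out = m->pending[best];
    m->pending[best] = m->pending[--m->count];
    return 0;
}
```

From the application's side, a transfer is a single `dma_submit` call, which is close to how a `memcpy`-based PC prototype would already be structured.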
Each system service has a C-callable API that provides a software interface to the architectural features of the processor, as well as a common method of communicating with peripherals. Using APIs allows migration across a family of processors using the same application interface.
Not only can application software use the API to make calls into a system service, but some system services can also utilize the APIs of other system services. For example, the DMA Manager can invoke the Callback Manager once a data transfer is complete, in order to queue the remaining interrupt processing as a callback. As another example, the Power Management service might automatically call the External Memory Manager in order to adjust the SDRAM refresh rate, as part of its process of reducing core and system clock frequencies to conserve power.
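The power-to-memory interaction can be sketched as follows. When the system clock changes, the number of clock cycles between SDRAM row refreshes has to change with it so that the refresh period in real time stays the same. The function names, the 64 ms refresh period and the 8192-row count are assumptions chosen as an example; actual controllers typically subtract device-specific timing margins as well.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t sclk_hz;        /* current system clock */
    uint32_t refresh_cycles; /* system clock cycles between row refreshes */
} clock_state_t;

/* External memory side: cycles between row refreshes for a device that
 * must refresh all `rows` rows every `tref_ms` milliseconds. Rounding
 * down refreshes slightly more often than required, never less. */
uint32_t sdram_refresh_cycles(uint32_t sclk_hz, uint32_t tref_ms,
                              uint32_t rows)
{
    assert(rows > 0);
    return (uint32_t)(((uint64_t)sclk_hz / 1000u) * tref_ms / rows);
}

/* Power management side: a clock change calls into the memory service
 * so the refresh setting tracks the new frequency. */
void power_set_sclk(clock_state_t *st, uint32_t new_sclk_hz)
{
    st->sclk_hz = new_sclk_hz;
    st->refresh_cycles = sdram_refresh_cycles(new_sclk_hz, 64u, 8192u);
}
```

With these example values, a 133 MHz system clock yields 1039 cycles between refreshes and a 66 MHz clock yields 515, so the application only has to request the new frequency.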
As another form of system service, the Device Manager provides a straightforward software interface to various peripherals, such as video and audio ports, as well as associated devices like A/D and D/A converters. It does this by way of a collection of device drivers, which configure and control external devices, as well as receive and transmit data through these devices via a variety of data flow models.
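One common way to build such a layer, shown here as a hedged sketch rather than any vendor's actual Device Manager, is a table of function pointers per driver behind a uniform set of calls. All names below, including the stand-in `null_adc_driver`, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct device device_t;

/* Operations every driver supplies. */
typedef struct {
    int (*open)(device_t *dev);
    int (*read)(device_t *dev, void *buf, size_t len); /* bytes read, or -1 */
    int (*close)(device_t *dev);
} driver_ops_t;

struct device {
    const driver_ops_t *ops;
    int is_open;
};

/* Uniform interface: the application never calls a driver directly. */
int dev_open(device_t *d)
{
    assert(d != NULL && d->ops != NULL);
    return d->ops->open(d);
}

int dev_read(device_t *d, void *buf, size_t len)
{
    return d->is_open ? d->ops->read(d, buf, len) : -1;
}

int dev_close(device_t *d)
{
    return d->ops->close(d);
}

/* Stand-in driver for an A/D converter that always returns zeros. */
static int null_open(device_t *d)  { d->is_open = 1; return 0; }
static int null_close(device_t *d) { d->is_open = 0; return 0; }
static int null_read(device_t *d, void *buf, size_t len)
{
    (void)d;
    memset(buf, 0, len);
    return (int)len;
}

const driver_ops_t null_adc_driver = { null_open, null_read, null_close };
```

Swapping in a different converter then means supplying a different `driver_ops_t` table, while the application code that calls `dev_read` stays unchanged.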
Operating systems and kernels
In order to effectively abstract an application’s memory, data and execution flows even further, a wide array of kernels and operating systems are usually available for a given processor.
In applications where there is a set schedule of tasks, a “super loop” is often implemented that repeatedly cycles through execution of independent code blocks. In this model, the sequence of events does not change between processing intervals. The “super loop” is common in high-performance systems, as it allows the programmer to retain most of the control over the order of processing. As a result, the block diagram of the data flow is usually simple, even though the scale of the processing (for instance, based on image size or frame rate) is often large.
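In C, the super loop pattern reduces to a fixed table of stages executed in the same order every interval. The stage functions below are placeholders that do trivial arithmetic on an integer standing in for a frame, and the loop is bounded so the sketch terminates; a deployed loop would normally run forever, paced by a timer or frame interrupt.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*stage_fn)(int *frame);

/* Hypothetical processing stages operating on a stand-in "frame". */
static void capture(int *frame) { *frame += 1; }
static void filter(int *frame)  { *frame *= 2; }
static void output(int *frame)  { *frame -= 3; }

/* The order of this table is the order of execution in every interval. */
static const stage_fn stages[] = { capture, filter, output };

/* Run the fixed sequence for a given number of processing intervals and
 * return the final frame value. */
int super_loop(int frame, unsigned intervals)
{
    for (unsigned n = 0; n < intervals; ++n) {
        for (size_t s = 0; s < sizeof stages / sizeof stages[0]; ++s) {
            assert(stages[s] != NULL);
            stages[s](&frame);
        }
    }
    return frame;
}
```

The appeal is visible in the structure: there is no scheduler, so the order and timing of processing are entirely in the programmer's hands.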
Going a step beyond the “super loop” approach, a kernel provides basic thread creation and management. An operating system (OS) provides many more capabilities on top of these fundamental scheduling functions.
Typically, operating systems span a range of strengths and focus areas — for example, security, performance and code footprint. There is no “silver bullet” when it comes to these parameters. That is, if an OS has more security features, it may sacrifice on, say, performance or kernel size.
Where security and protection are the key OS selling points, vendors will rely heavily on a processor’s MMU features. Many of the application targets for this type of OS, where safety might be the prime attribute, require fault tolerance. Here, both User and Supervisor modes are used to protect processor resources. Cache is heavily used in this model, and any on-chip SRAM is reserved for interrupt service routines and the most frequently executed code.
On the other hand, OS vendors who focus on performance and low memory footprint are much less likely to use the MMU, and they are likely to run in Supervisor mode only. With a smaller memory footprint, more of the application can be run in on-chip memory. While there will be a C API to the OS, most of the underlying code is implemented in assembly language. Interrupt service routines will have low context switching times, because only a subset of registers is saved to the stack.
A traditional multitasking real-time operating system (RTOS) can potentially bog down the signal processing portion of a convergent processing application. On the other hand, a simple loop kernel, such as might be used to support a DSP application, is poorly suited to the control-based portion of the application.
This has led some OS vendors to use the resources of a processor’s architecture to balance both control and signal processing applications on the same convergent device. By managing the DMA and interrupt priorities, performance-based tasks can garner guaranteed processing time in a way similar to that of an isochronous transfer in the USB protocol.
David Katz and Rick Gentile are senior DSP applications engineers at Analog Devices, Inc.