uClinux on the Blackfin DSP Architecture: Part 1
In recent years, Linux has become an increasingly popular operating system choice not only in the PC and server markets but also in the development of embedded devices, particularly consumer products, telecommunications routers and switches, Internet appliances, and industrial and automotive applications.
The advantage of Embedded Linux is that it is a royalty-free, open source, compact solution that provides a strong foundation for an ever-growing base of applications to run on. Linux is a fully functional operating system (OS), with support for a variety of network and file-handling protocols - a very important requirement in embedded systems because of the need to "connect and compute anywhere at anytime."
Modular in nature, Linux is easy to slim down by removing utility
programs, tools, and other system services that are not needed in the
targeted embedded environment. The advantages for companies using Linux
in embedded markets are faster time to market, flexibility and
For those developers, the combination of converged architectures such as the Blackfin Processor and uClinux may be of particular interest. Blackfin processors  combine the DSP computing power and the functionality of microcontrollers, fulfilling the requirements of digital audio and communication applications. The combination of a DSP core with traditional microcontroller architecture on a single chip avoids the restrictions, complexity, and higher costs of traditional heterogeneous multiprocessor systems.
All Blackfin Processors combine a state-of-the-art signal processing engine with the advantages of a clean, orthogonal RISC-like microprocessor instruction set and Single-Instruction Multiple-Data (SIMD) multimedia capabilities into a single instruction set architecture. The Micro Signal Architecture (MSA) core is a dual-MAC (Multiply Accumulator Unit) modified Harvard Architecture that has been designed to have unparalleled performance on typical signal processing algorithms, as well as on the standard program flow and arbitrary bit manipulation operations mainly used by an OS. Both MACs can be used in the same instruction and in a single cycle to double the MAC throughput, as in the dual-MAC Blackfin assembly instruction below:
R3 = (A1 += R7.H * R6.H), R2 = (A0 += R7.L * R6.L);
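For readers more at home in C, the data flow of this instruction can be modeled roughly as follows. This is a simplified sketch, not the processor's exact arithmetic: it treats the 16-bit register halves as plain signed integers and uses 64-bit accumulators, and it ignores the number formats, saturation, and rounding that the hardware applies. The names dual_mac_state and dual_mac are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model state standing in for the accumulators A0 and A1. */
typedef struct {
    int64_t a0;
    int64_t a1;
} dual_mac_state;

/* Sign-extend the low 16 bits of v to a 32-bit signed integer. */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* One step of: R3 = (A1 += R7.H * R6.H), R2 = (A0 += R7.L * R6.L);
 * Simplification: results are assumed to fit in 32 bits. */
void dual_mac(dual_mac_state *s, uint32_t r7, uint32_t r6,
              int32_t *r3, int32_t *r2)
{
    assert(s && r3 && r2);
    s->a1 += (int64_t)sext16(r7 >> 16) * sext16(r6 >> 16); /* high halves */
    s->a0 += (int64_t)sext16(r7) * sext16(r6);             /* low halves  */
    *r3 = (int32_t)s->a1;
    *r2 = (int32_t)s->a0;
}
```

Both products and both accumulations are written as separate C statements here; the point of the instruction is that the hardware performs all of them in one cycle.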
As shown in Figure 1 below, the single core Blackfin Processors have two large blocks of on-chip memory providing high-bandwidth access to the core. These memory blocks are accessed at full processor core speed (up to 756MHz). The two memory blocks sitting next to the core, referred to as L1 memory, can be configured either as data or instruction SRAM or cache.
When configured as cache, the speed of executing external code from SDRAM is nearly on par with running the code from internal memory. This feature is especially well suited for running the uClinux kernel, which doesn't fit into internal memory. Also, when programming in C, the memory access optimization can be left up to the core by using cache.
|Figure 1: Single core Blackfin processor|
There are countless commercial and non-commercial Linux kernel trees and distributions. One special tree is the uClinux kernel tree, at www.uclinux.org. This is a port of the Linux kernel designed for hardware without a Memory Management Unit (MMU).
While the uClinux kernel patch has been included in the official Linux 2.6.x kernel, the most up-to-date development activity and projects can be found at the uClinux Project Page and the Blackfin/uClinux Project Page (www.blackfin.uclinux.org). Patches such as these are used by commercial Linux vendors in conjunction with their additional enhancements, development tools and documentation to provide their customers with an easy-to-use development environment for rapidly creating powerful applications on uClinux.
Additionally, www.uclinux.org provides developers with a uClinux distribution that includes three different kernels (2.0.x, 2.4.x, 2.6.x) along with required libraries; basic Linux shells and tools; and a wide range of additional programs such as web server, audio player, programming languages, and a graphical configuration tool. There are also programs specially designed with size and efficiency as their primary considerations.
One example is busybox, a multicall binary: a program that includes the functionality of many smaller programs and acts like any one of them when it is called by the appropriate name. If busybox is linked to ls and contains the ls code, it acts like the ls command.
The benefit is that busybox avoids the overhead of separate binaries, and its small modules can share common code. In general, the uClinux distribution is more than adequate to compile a Linux image for a communication device, like a router, without writing a single line of code.
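The multicall idea can be sketched in a few lines of C: the program looks at the name it was invoked under and dispatches to the matching built-in routine. This is a minimal illustration of the technique, not busybox source code; the applet names and the dispatch function are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int (*applet_fn)(int argc, char **argv);

/* Three trivial, hypothetical applets sharing one binary. */
static int applet_true(int argc, char **argv)  { (void)argc; (void)argv; return 0; }
static int applet_false(int argc, char **argv) { (void)argc; (void)argv; return 1; }
static int applet_hello(int argc, char **argv) { (void)argc; (void)argv; puts("hello"); return 0; }

static const struct { const char *name; applet_fn fn; } applets[] = {
    { "true",  applet_true  },
    { "false", applet_false },
    { "hello", applet_hello },
};

/* Strip any leading directory components from the invocation name. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Run the applet named by argv[0]; return 127 if the name is unknown. */
int dispatch(int argc, char **argv)
{
    assert(argc > 0 && argv && argv[0]);
    const char *name = base_name(argv[0]);
    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(name, applets[i].name) == 0)
            return applets[i].fn(argc, argv);
    fprintf(stderr, "%s: applet not found\n", name);
    return 127;
}
```

In a real multicall binary, main would simply return the result of dispatch, and each applet name would be a link to the single executable on the target file system.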
Despite the fact that Linux was not originally designed for use in embedded systems, it has found its way into a lot of embedded devices. Since the release of kernel version 2.0.x and the appearance of commercial support for Linux on embedded processors, there has been a real explosion of new embedded devices that feature the OS.
Almost every day there seems to be a new device or gadget that uses Linux as its operating system, in most cases going completely unnoticed by the end users. Today a large number of the available broadband routers, firewalls, access points, and even some DVD players use Linux; for more examples, see Linux devices. uClinux, like Linux, offers a huge number of drivers for all sorts of hardware and protocols. Combine that with the fact that Linux does not have run-time royalties, and it quickly becomes clear why so many developers use Linux for their devices.
Linux on a DSP-like processor
In the past, DSPs have been used in a lot of applications, including sound cards, modems, telecommunication devices, medical devices, and all sorts of military and other appliances that perform pure signal processing. Those DSP systems were generally designed specifically for those applications and had only basic capabilities in order to meet their tight cost and size constraints.
As DSPs have become more powerful and flexible, thereby servicing
the more advanced requirements of military, medical, and communication
users, they have still lacked the capabilities needed to run advanced
operating systems. Those traditional DSPs are very powerful and
flexible, but can be rather expensive.
They are often found clustered on special signal processing hardware where there is no need to have an operating system like Linux running on the DSP itself. This is generally due to the fact that in those systems the DSP gets its data from some type of additional central processing unit. Therefore only "basic" system software had to be written for such DSPs.
With the quickly advancing multimedia convergence and the proliferation of multimedia and communication-enabled gadgets, there is now a big market for a new type of DSP. In the past, the most widely used design for servicing these markets has been the combination of a general-purpose processor and a traditional DSP serving as a coprocessor. In this scenario, the operating system runs on the host processor, and the signal processing is done on the DSP. This type of dual-processor design is suboptimal because of the inefficiencies it incurs in maintainability, cost, power, and size. A different approach is to redesign the traditional DSP to fit the demands of an advanced operating system while preserving the advanced DSP architecture.
This approach has been taken by the Blackfin Processor designers, who built a processor with advanced DSP features around the well-proven Harvard Architecture, with a RISC-like, orthogonal, enhanced instruction set as well as advanced addressing, stack control, and privileged operation modes. Such a device is no longer a simple DSP, but rather a powerful processor that will meet the intensive demands of a wide range of industrial, communication and multimedia applications.
Combined with the capabilities and the power of an operating system like Linux, the possibilities are endless. Nevertheless, vendors on the general-purpose processor side are not sleeping; they are in turn designing their new processors to compete in the same market. So for processors it comes down to the "5 P's" rule: price, performance, power consumption, peripherals, and penguins.
Differences between Linux and uClinux
Since Linux and uClinux are similar to UNIX in that they are multiuser, multitasking operating systems, the kernel has to take special precautions to assure the proper and safe operation of up to thousands of processes from different users on the same system at once. The UNIX security model, after which Linux is designed, protects every process in its own environment with its own private address space. Every process is also protected from processes being invoked by different users.
Additionally, a Virtual Memory (VM) system has additional requirements that the Memory Management Unit (MMU) must handle, like dynamic allocation of memory and mapping of arbitrary memory regions into the private process memory.
Some processors, like Blackfin, do not provide a full-fledged MMU. These processors are more power efficient and significantly cheaper than the alternatives, while sometimes having higher performance. Even on processors featuring Virtual Memory, some system developers target their application to run on uClinux, because uClinux can be significantly faster than Linux on the same processor. MMU operation can represent a significant time overhead.
Even when an MMU is available, it is sometimes not used in systems with tight real-time constraints. Context switching and Inter Process Communication (IPC) can also be several times faster on uClinux. A benchmark on an ARM9 processor, done by H.S. Choi and H.C. Yun, has shown this.
To support Linux on these MMU-less devices, a few trade-offs have to be made:
1. No real memory protection (a faulty process can bring the complete system down)
2. No fork system call
3. Only simple memory allocation
4. Some other minor differences
Memory protection is not a real problem for most embedded devices. Linux is a very stable platform, particularly in embedded devices, where software crashes are rarely observed. Even on an MMU-based system running Linux, software bugs in the kernel space can crash the whole system. Since Blackfin has memory protection, but not Virtual Memory, Blackfin/uClinux has better protection than other no-MMU systems, and will not crash as "often" as uClinux running on different processors.
The two most common causes of uClinux crashes are stack overflow and null pointer reference.
When Linux is running on an architecture where a full MMU exists, the MMU provides Linux programs basically unlimited stack and heap space. This is done by the virtualization of physical memory. However, most embedded Linux systems will have a fixed amount of SDRAM and no swap, so it is not really "unlimited".
A program with a memory leak can still crash the entire system on embedded Linux with MMU. Because uClinux can't support VM, it allocates stack space during compile time at the end of the data for the executable. If the stack grows too large on uClinux, it will overwrite the static data and code areas. This means that the developer, who previously was oblivious to stack usage within the application, must now be aware of the stack requirements.
On Blackfin/uClinux, there is a compiler option to enable stack checking. If the option -fstack-limit-symbol=_stack_start is set, the compiler adds extra code that checks that the stack limit is not exceeded. This ensures that random crashes due to stack corruption or overflow do not happen on Blackfin/uClinux. An application compiled with this option that exceeds its stack limit dies gracefully. The developer can then increase the stack size at compile time, or with the flthdr utility program at run time. On production systems, stack checking can either be removed (to increase performance and reduce code size) or left in for the added robustness.
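Conceptually, the inserted check compares the stack space a function is about to claim against a fixed limit before the frame is used. The following sketch models that idea on a simulated stack pointer; it is not the code the compiler actually emits, and the names sim_stack, enter_frame, and leave_frame are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated stack that grows downward from sp toward limit. */
typedef struct {
    uintptr_t sp;     /* current stack pointer */
    uintptr_t limit;  /* lowest address the stack may use */
} sim_stack;

/* Model of a checked function prologue: reserve frame_bytes, or report
 * an overflow (-1) without moving the stack pointer below the limit. */
int enter_frame(sim_stack *s, size_t frame_bytes)
{
    assert(s && s->sp >= s->limit);
    if (s->sp - s->limit < frame_bytes)
        return -1;  /* on a real system the application would be stopped here */
    s->sp -= frame_bytes;
    return 0;
}

/* Matching epilogue: release the frame again. */
void leave_frame(sim_stack *s, size_t frame_bytes)
{
    assert(s);
    s->sp += frame_bytes;
}
```

Without such a check, the subtraction would simply proceed and the next write to the frame would land in whatever lies below the stack, which on uClinux is the program's static data and code.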
Null pointer reference
The Blackfin MMU does provide partial memory protection, and can segment user space from kernel (supervisor) space. On Blackfin/uClinux, the first 4k of memory starting at NULL is reserved as a buffer for bad pointer dereferences. If an application uses an uninitialized pointer that reads or writes into the first 4k of memory, the application will halt. This makes random crashes due to uninitialized pointers less likely. Other implementations of uClinux will start writing over the kernel.
The second point can be a little more problematic. In software written for UNIX or Linux, developers sometimes use the fork system call when they want to do things in parallel. The fork() call makes an exact copy of the original process and executes it simultaneously. To do that efficiently, it uses the MMU to map the memory of the parent process into the child and copies only those parts of memory that the child writes to.
Therefore, uClinux cannot provide the fork() system call. It does, however, provide vfork(), a special version of fork() in which the parent is halted while the child executes. Software that uses the fork() system call therefore has to be modified to use either vfork() or POSIX threads, both of which uClinux supports, because they share the same memory space, including the stack.
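A common fork-then-exec pattern ports to vfork() with little change, as the following sketch suggests. It assumes a POSIX environment; run_program is a hypothetical helper name, and the feature-test macro at the top is only there to make the vfork() declaration visible with glibc under strict compiler modes.

```c
#define _DEFAULT_SOURCE   /* expose the vfork() declaration on glibc */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a program in a child created with vfork() and return its exit
 * status, or -1 on failure. */
int run_program(const char *path, char *const argv[])
{
    assert(path && argv);

    pid_t pid = vfork();
    if (pid < 0)
        return -1;              /* could not create a child */

    if (pid == 0) {
        /* Child: it borrows the parent's memory, so it must do nothing
         * except replace itself with exec or leave with _exit. */
        execv(path, argv);
        _exit(127);             /* exec failed */
    }

    /* Parent: resumes only after the child has called exec or _exit. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Code that relies on the child continuing to run its own copy of the program after fork(), rather than calling exec right away, needs more restructuring, typically into threads.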
As for point number three, there usually is no problem with the malloc support uClinux provides, but sometimes minor modifications may have to be made. Memory allocation on uClinux can be very fast, but on the other hand a process can allocate all available memory. Since memory can only be allocated in contiguous chunks, memory fragmentation can sometimes be an issue.
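The fragmentation problem can be illustrated with a toy model: when every allocation must be one contiguous run, a request can fail even though enough memory is free in total. The pool below is a deliberately simplified first-fit allocator over eight blocks, not the allocator uClinux actually uses; all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCKS 8

/* Hypothetical toy pool: used[i] != 0 means block i is allocated. */
typedef struct {
    unsigned char used[POOL_BLOCKS];
} pool;

/* First fit: claim n contiguous free blocks.
 * Returns the index of the first block, or -1 if no run is long enough. */
int pool_alloc(pool *p, size_t n)
{
    assert(p && n > 0);
    size_t run = 0;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        run = p->used[i] ? 0 : run + 1;
        if (run == n) {
            size_t start = i + 1 - n;
            for (size_t j = start; j <= i; j++)
                p->used[j] = 1;
            return (int)start;
        }
    }
    return -1;
}

/* Release n blocks starting at index start. */
void pool_free(pool *p, size_t start, size_t n)
{
    assert(p && start + n <= POOL_BLOCKS);
    for (size_t j = 0; j < n; j++)
        p->used[start + j] = 0;
}

/* Total number of free blocks, contiguous or not. */
size_t pool_free_total(const pool *p)
{
    size_t total = 0;
    for (size_t i = 0; i < POOL_BLOCKS; i++)
        total += p->used[i] ? 0 : 1;
    return total;
}
```

If four two-block allocations fill the pool and the first and third are then freed, four blocks are free in total, yet a four-block request fails because the free blocks form two separate runs of two.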
Most of the software available for Linux or UNIX (a collection of software can be found on http://freshmeat.net) can be directly compiled on uClinux. For the rest there is usually only some minor porting or tweaking to do. There are only very few applications that do not work on uClinux, with most of those being irrelevant for embedded applications.
In Part 2 of this three-part series, the author surveys the development tools, environments and libraries available for DSP-oriented applications including VoIP, audio compression, and image capture and processing, the ways to use them most effectively, and how to avoid problems.
Since obtaining his MSc (Computer Based Engineering) and Dipl-Ing. (FH) (Electronics and Information Technologies) degrees from Reutlingen University, Michael Hennerich has worked as a design engineer on a variety of DSP-based applications. Michael now works as a DSP Applications and Systems Engineer at Analog Devices Inc. in Munich, Germany.
This article is excerpted from a
paper of the same name presented at the Embedded Systems Conference
Silicon Valley 2006. Used with permission of the Embedded Systems
Conference. For more information, please visit www.embedded.com/esc/sv.