uClinux on the Blackfin DSP Architecture: Part 1

In recent years, Linux has become an increasingly popular operating system choice not only in the PC and server market but also in the development of embedded devices, particularly consumer products, telecommunications routers and switches, Internet appliances, and industrial and automotive applications.

The advantage of Embedded Linux is that it is a royalty-free, open source, compact solution that provides a strong foundation for an ever-growing base of applications. Linux is a fully functional operating system (OS) with support for a variety of network and file-handling protocols, a very important requirement in embedded systems because of the need to “connect and compute anywhere at any time.”

Modular in nature, Linux is easy to slim down by removing utility programs, tools, and other system services that are not needed in the targeted embedded environment. The advantages for companies using Linux in embedded markets are faster time to market, flexibility, and reliability.

For these developers, the combination of converged architectures such as the Blackfin Processor and uClinux may be of particular interest. Blackfin processors [1] combine the computing power of a DSP with the functionality of a microcontroller, fulfilling the requirements of digital audio and communication applications. Combining a DSP core with a traditional microcontroller architecture on a single chip avoids the restrictions, complexity, and higher costs of traditional heterogeneous multiprocessor systems.

All Blackfin Processors combine a state-of-the-art signal processing engine with the advantages of a clean, orthogonal, RISC-like microprocessor instruction set and Single-Instruction Multiple-Data (SIMD) multimedia capabilities in a single instruction set architecture. The Micro Signal Architecture (MSA) core is a dual-MAC (multiply-accumulate unit) modified Harvard Architecture designed to deliver unparalleled performance on typical signal processing algorithms, as well as on the standard program flow and arbitrary bit manipulation operations mainly used by an OS. Both MACs can be used in the same instruction and in a single cycle to double the MAC throughput, as in the following dual-MAC Blackfin assembly instruction:

R3 = (A1 += R7.H * R6.H), R2 = (A0 += R7.L * R6.L);
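To illustrate what the dual-MAC instruction buys, the following C sketch computes a dot product with two independent accumulators, so that each loop iteration performs two multiply-accumulates, much as the instruction above does in a single cycle. It is a plain-C model under stated assumptions: the function name dot_product_dual is hypothetical, int64_t stands in for the 40-bit hardware accumulators, and the fractional arithmetic and saturation options of the real instruction are not modeled. Whether a Blackfin C compiler actually maps such a loop onto dual-MAC instructions depends on the compiler and its optimization settings.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product with two independent accumulators, mirroring the dual-MAC
 * instruction: a0 sums the even-indexed products (like A0 with the low
 * halves) and a1 the odd-indexed ones (like A1 with the high halves).
 * int64_t stands in for the 40-bit hardware accumulators; fractional
 * (1.15) mode and saturation are deliberately not modeled. */
int64_t dot_product_dual(const int16_t *x, const int16_t *y, size_t n)
{
    int64_t a0 = 0;
    int64_t a1 = 0;
    size_t i = 0;

    /* Two multiply-accumulates per iteration, one per "MAC unit". */
    for (; i + 1 < n; i += 2) {
        a0 += (int32_t)x[i] * y[i];
        a1 += (int32_t)x[i + 1] * y[i + 1];
    }

    /* Odd length: one pair is left over. */
    if (i < n)
        a0 += (int32_t)x[i] * y[i];

    return a0 + a1;
}
```

Splitting the sum across two accumulators is what lets the two products of each step proceed independently; the partial sums are only combined once at the end.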

As shown in Figure 1 below, the single-core Blackfin Processors have two large blocks of on-chip memory providing high-bandwidth access to the core. These memory blocks are accessed at full processor core speed (up to 756 MHz). The two memory blocks sitting next to the core, referred to as L1 memory, can be configured either as data or instruction SRAM or as cache.

When L1 memory is configured as cache, executing external code from SDRAM can be nearly as fast as running the code from internal memory. This feature is especially well suited for running the uClinux kernel, which doesn't fit into internal memory. Also, when programming in C, memory access optimization can be left to the cache hardware.

Figure 1: Single-core Blackfin processor

There are countless commercial and non-commercial Linux kernel trees and distributions. One special tree is the uClinux kernel tree at www.uclinux.org [2], a port of the Linux kernel designed for hardware without a Memory Management Unit (MMU).

While the uClinux kernel patch has been included in the official Linux 2.6.x kernel [3], the most up-to-date development activity and projects can be found at the uClinux Project Page [2] and the Blackfin/uClinux Project Page [4] (www.blackfin.uclinux.org). Commercial Linux vendors use patches such as these together with their own enhancements, development tools, and documentation to give their customers an easy-to-use development environment for rapidly creating powerful applications on uClinux.

Additionally, www.uclinux.org provides developers with a uClinux distribution that includes three different kernels (2.0.x, 2.4.x, 2.6.x) along with the required libraries, basic Linux shells and tools, and a wide range of additional programs such as a web server, an audio player, programming languages, and a graphical configuration tool. There are also programs designed specifically with size and efficiency as their primary considerations.

One example is busybox [5], a multicall binary: a single program that includes the functionality of many smaller programs and acts like any one of them when it is called by the appropriate name. If a link named ls points to a busybox binary that contains the ls code, invoking ls runs busybox, which then acts like the ls command.

The benefit is that busybox avoids the overhead of many separate binaries, and the small modules can share common code. In general, the uClinux distribution is more than adequate to build a Linux image for a communication device, such as a router, without writing a single line of code.
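The multicall idea can be sketched in a few lines of C: one binary holds a table of "applets" and picks one by looking at the name it was invoked under, argv[0]. This is an illustrative sketch, not the actual busybox source; the applet table, the applet functions, and the name multicall_main are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical applet signature: each applet is an ordinary main(). */
typedef int (*applet_fn)(int argc, char **argv);

static int applet_echo(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        printf("%s%s", argv[i], (i + 1 < argc) ? " " : "");
    putchar('\n');
    return 0;
}

static int applet_true(int argc, char **argv)
{
    (void)argc; (void)argv;
    return 0;
}

static int applet_false(int argc, char **argv)
{
    (void)argc; (void)argv;
    return 1;
}

/* Table mapping command names to the code that implements them. */
static const struct {
    const char *name;
    applet_fn fn;
} applets[] = {
    { "echo",  applet_echo  },
    { "true",  applet_true  },
    { "false", applet_false },
};

/* "/bin/echo" -> "echo": only the link name matters. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Dispatch on the name the binary was invoked under (argv[0]). */
int multicall_main(int argc, char **argv)
{
    const char *name = base_name(argv[0]);

    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(name, applets[i].name) == 0)
            return applets[i].fn(argc, argv);

    fprintf(stderr, "%s: applet not found\n", name);
    return 127;
}
```

A real program would define main() as `return multicall_main(argc, argv);`, install the binary once, and create echo, true, and false as links to it. All applets then share one copy of the C library startup code and any common helpers, which is where the size savings come from.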

Despite the fact that Linux was not originally designed for use in embedded systems, it has found its way into many embedded devices. Since the release of kernel version 2.0.x and the appearance of commercial support for Linux on embedded processors, there has been a real explosion of new embedded devices running the OS.

Almost every day there seems to be a new device or gadget that uses Linux as its operating system, in most cases going completely unnoticed by end users. Today a large number of the available broadband routers, firewalls, access points, and even some DVD players use Linux; for more examples, see Linux devices [6]. Like Linux, uClinux offers a huge number of drivers for all sorts of hardware and protocols. Combine that with the fact that Linux has no run-time royalties, and it quickly becomes clear why so many developers use Linux for their devices.

Linux on a DSP-like processor
In the past, DSPs have been used in many applications, including sound cards, modems, telecommunication devices, medical devices, and all sorts of military and other appliances that perform pure signal processing. Those DSP systems were generally designed specifically for those applications and had only basic capabilities in order to meet their tight cost and size constraints.

As DSPs have become more powerful and flexible, serving the more advanced requirements of military, medical, and communication users, they have still lacked the capabilities needed to run advanced operating systems. These traditional DSPs are very powerful and flexible, but can be rather expensive.

They are often found clustered on special signal processing hardware where there is no need to run an operating system like Linux on the DSP itself, generally because in those systems the DSP gets its data from a separate central processing unit. Therefore only “basic” system software had to be written for such DSPs.

With rapidly advancing multimedia convergence and the proliferation of multimedia and communication-enabled gadgets, there is now a big market for a new type of DSP. In the past, the most widely used design for these markets has been the combination of a general-purpose processor with a traditional DSP serving as a coprocessor. In this scenario, the operating system runs on the host processor, and the signal processing is done on the DSP. This type of dual-processor design is suboptimal because of the inefficiencies it incurs in maintainability, cost, power, and size. A different approach is to redesign the traditional DSP to meet the demands of an advanced operating system while preserving the advanced DSP architecture.

The Blackfin Processor designers took this approach, designing a processor with advanced DSP features around the well-proven Harvard Architecture, with an enhanced, orthogonal, RISC-like instruction set, advanced addressing, stack control, and privileged operation modes. Such a device is no longer a simple DSP, but rather a powerful processor that can meet the intensive demands of a wide range of industrial, communication, and multimedia applications.

Combined with the capabilities and power of an operating system like Linux, this opens up endless possibilities. Nevertheless, vendors on the general-purpose processor side are not sleeping either; they are designing new processors to compete in the same market. So, for processors, it comes down to the 5 P's rule: price, performance, power consumption, peripherals, and penguins.

Differences between Linux and uClinux
Since Linux and uClinux, like UNIX, are multiuser, multitasking operating systems, the kernel has to take special precautions to ensure the proper and safe operation of up to thousands of processes from different users on the same system at once. The UNIX security model, on which Linux is based, protects every process in its own environment with its own private address space. Every process is also protected from processes invoked by different users.

Additionally, a Virtual Memory (VM) system has further requirements that the Memory Management Unit (MMU) must handle, such as dynamic allocation of memory and mapping of arbitrary memory regions into the private process memory.

Some processors, like Blackfin, do not provide a full-fledged MMU. These processors are more power efficient and significantly cheaper than the alternatives, while sometimes having higher performance. Even on processors featuring virtual memory, some system developers target their applications to run on uClinux, because uClinux can be significantly faster than Linux on the same processor: MMU operation can represent a significant time overhead.

Even when an MMU is available, it is sometimes not used in systems with tight real-time constraints. Context switching and Inter-Process Communication (IPC) can also be several times faster on uClinux, as a benchmark on an ARM9 processor by H.S. Choi and H.C. Yun has shown [7].

To support Linux on these MMU-less devices, a few trade-offs have to be made:
1. No real memory protection (a faulty process can bring the complete system down)
2. No fork system call
3. Only simple memory allocation
4. Some other minor differences

Memory protection is not a real problem for most embedded devices. Linux is a very stable platform, particularly in embedded devices, where software crashes are rarely observed. Even on an MMU-based system running Linux, software bugs in kernel space can crash the whole system. Since Blackfin has memory protection, but not virtual memory, Blackfin/uClinux has better protection than other no-MMU systems and will not crash as “often” as uClinux running on other processors.

The two most common reasons for uClinux to crash are stack overflow and null pointer dereference.

Stack overflow
When Linux is running on an architecture with a full MMU, the MMU provides Linux programs with basically unlimited stack and heap space. This is done by virtualizing physical memory. However, most embedded Linux systems have a fixed amount of SDRAM and no swap, so it is not really “unlimited”.

A program with a memory leak can still crash the entire system on embedded Linux with an MMU. Because uClinux can't support VM, the stack space for an executable is allocated at compile time, at the end of the executable's data. If the stack grows too large on uClinux, it will overwrite the static data and code areas. This means that the developer, who previously was oblivious to stack usage within the application, must now be aware of the stack requirements.

On Blackfin/uClinux, there is a compiler option to enable stack checking. If the option -fstack-limit-symbol=_stack_start is set, the compiler adds extra code that checks that the stack limit is not exceeded. This ensures that random crashes due to stack corruption or overflow will not happen on Blackfin/uClinux: an application compiled with this option that exceeds its stack limit dies gracefully. The developer can then increase the stack size at compile time, or later, without recompiling, with the flthdr utility program. On production systems, stack checking can either be removed (to increase performance and reduce code size) or left in for increased robustness.
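Beyond stack checking, the simplest way to live within a small, fixed stack is to keep large objects off it. The sketch below, a hypothetical helper named most_frequent_byte, takes its 256-entry counter table from the heap instead of declaring it as a local array. A static array would also keep it off the stack, at the cost of making the function non-reentrant.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns the most frequent byte value in data (ties go to the smaller
 * value), or -1 if len is 0 or the counter table cannot be allocated.
 * A local "unsigned long counts[256];" would put 1 KiB (with 32-bit
 * longs) on the fixed-size uClinux stack; calloc() moves it to the heap
 * and zero-initializes it. */
int most_frequent_byte(const unsigned char *data, size_t len)
{
    unsigned long *counts;
    int best;

    if (len == 0)
        return -1;

    counts = calloc(256, sizeof *counts);
    if (counts == NULL)
        return -1;              /* out of memory: fail instead of crashing */

    for (size_t i = 0; i < len; i++)
        counts[data[i]]++;

    best = 0;
    for (int v = 1; v < 256; v++)
        if (counts[v] > counts[best])
            best = v;

    free(counts);
    return best;
}
```

The function's own stack frame now holds only a few pointers and counters, and an allocation failure is reported to the caller instead of silently overwriting static data.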

Null pointer dereference
The Blackfin MMU does provide partial memory protection and can segment user space from kernel (supervisor) space. On Blackfin/uClinux, the first 4k of memory starting at NULL is reserved as a buffer for bad pointer dereferences. If an application uses an uninitialized pointer that reads or writes into the first 4k of memory, the application will halt. This makes random crashes due to uninitialized pointers less likely. Other implementations of uClinux will start writing over the kernel.
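A NULL struct pointer does not always fault at address 0: accessing p->member through p == NULL touches address 0 + offsetof(type, member). The sketch below, assuming the 4k guard is exactly 4096 bytes, checks which members of a hypothetical struct packet would land inside the guard region. Strictly speaking, dereferencing NULL is undefined behavior in C; the sketch only models the address arithmetic a typical compiler produces.

```c
#include <stddef.h>

/* Assumed size of the guard region starting at address 0 ("4k"). */
#define NULL_GUARD_SIZE 4096u

/* Hypothetical structure with a large embedded array. */
struct packet {
    unsigned int  length;
    unsigned char header[64];
    unsigned char payload[8192];
    unsigned int  crc;
};

/* Returns nonzero if an access at this byte offset from a NULL base
 * pointer falls inside the guard region, i.e. the bad access would be
 * caught and halt the application. Larger offsets fall outside the
 * guard and are not caught by it. */
int offset_in_null_guard(size_t offset)
{
    return offset < NULL_GUARD_SIZE;
}
```

For example, `offset_in_null_guard(offsetof(struct packet, length))` is true, while the same check for crc, which sits more than 8 KiB into the structure, is false. Members placed after large arrays therefore escape the guard, so explicitly checking pointers against NULL remains worthwhile.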

The second point can be a little more problematic. In software written for UNIX or Linux, developers sometimes use the fork system call when they want to do things in parallel. The fork() call makes an exact copy of the original process and executes it simultaneously. To do that efficiently, it uses the MMU to map the memory of the parent process into the child and copies only those memory pages that are written to (copy-on-write).

Because of this reliance on the MMU, uClinux cannot provide the fork() system call. It does, however, provide vfork(), a special version of fork() in which the parent is suspended until the child calls one of the exec functions or exits. Software that uses the fork() system call therefore has to be modified to use either vfork() or POSIX threads, both of which uClinux supports. In both cases the code no longer gets a private copy of its memory: threads share the same address space, and a vfork() child even runs on the parent's stack.
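The most common use of fork() is to start another program with fork() followed immediately by exec(). That pattern ports directly to vfork(), as in the sketch below. run_with_vfork is a hypothetical helper name, and the sketch assumes a POSIX-style C library providing vfork(), execv(), and waitpid().

```c
#define _DEFAULT_SOURCE           /* ask glibc-style headers to declare vfork() */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program at path with argument vector argv in a child created
 * by vfork() and returns the child's exit status, or -1 on error. */
int run_with_vfork(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;                /* vfork() itself failed */

    if (pid == 0) {
        /* Child: it borrows the parent's memory and stack, so it may only
         * call an exec function or _exit(); returning from this function
         * or calling exit() would corrupt the suspended parent. */
        execv(path, argv);
        _exit(127);               /* reached only if execv() failed */
    }

    /* Parent: resumes once the child has called execv() or _exit(). */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Code that uses fork() to keep running the same program in two processes cannot be converted this way; that is the case where POSIX threads are the usual replacement.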

As for point number three, there is usually no problem with the malloc support uClinux provides, but sometimes minor modifications have to be made. Memory allocation on uClinux can be very fast, but on the other hand a process can allocate all available memory. Since memory can only be allocated in contiguous chunks, memory fragmentation can sometimes be an issue.
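One common way to sidestep fragmentation, not something the article itself prescribes, is to take frequently recycled objects from a fixed-size block pool reserved once at startup. Because every block has the same size, freeing and reallocating never leaves unusable gaps, and both operations are O(1). The names pool_init, pool_alloc, and pool_free, as well as the block size and count, are hypothetical, and the sketch is not thread-safe.

```c
#include <stddef.h>

#define POOL_BLOCK_SIZE  256   /* hypothetical block size in bytes */
#define POOL_BLOCK_COUNT 32    /* hypothetical number of blocks */

/* A free block stores a pointer to the next free block; max_align_t
 * (C11) makes every block suitably aligned for any object type. */
union pool_block {
    union pool_block *next;
    max_align_t align;
    unsigned char data[POOL_BLOCK_SIZE];
};

static union pool_block pool_storage[POOL_BLOCK_COUNT];
static union pool_block *pool_free_list;

/* Threads all blocks onto the free list; call once at startup. */
void pool_init(void)
{
    for (size_t i = 0; i + 1 < POOL_BLOCK_COUNT; i++)
        pool_storage[i].next = &pool_storage[i + 1];
    pool_storage[POOL_BLOCK_COUNT - 1].next = NULL;
    pool_free_list = &pool_storage[0];
}

/* Hands out one fixed-size block, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    union pool_block *b = pool_free_list;

    if (b != NULL)
        pool_free_list = b->next;
    return b;
}

/* Returns a block obtained from pool_alloc() to the front of the list. */
void pool_free(void *p)
{
    union pool_block *b = p;

    if (b == NULL)
        return;
    b->next = pool_free_list;
    pool_free_list = b;
}
```

The pool lives in the program's static data, so its memory is claimed once when the program loads and cannot be fragmented by later malloc() activity elsewhere in the system.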

Most of the software available for Linux or UNIX (a collection of software can be found at http://freshmeat.net) can be compiled directly for uClinux. For the rest, there is usually only some minor porting or tweaking to do. Only very few applications do not work on uClinux, and most of those are irrelevant for embedded applications.

In Part 2 of this three-part series, the author surveys the development tools, environments, and libraries available for DSP-oriented applications, including VoIP, audio compression, and image capture and processing, how to use them most effectively, and how to avoid problems.

Since obtaining his MSc (Computer Based Engineering) and Dipl-Ing. (FH) (Electronics and Information Technologies) degrees from Reutlingen University, Michael Hennerich has worked as a design engineer on a variety of DSP-based applications. Michael now works as a DSP Applications and Systems Engineer at Analog Devices Inc. in Munich, Germany.

This article is excerpted from a paper of the same name presented at the Embedded Systems Conference Silicon Valley 2006. Used with permission of the Embedded Systems Conference. For more information, please visit www.embedded.com/esc/sv.
