uClinux on the Blackfin DSP Architecture: Part 3 - Embedded.com

DSP processors are, in general, very I/O-balanced processors. This means they offer a variety of high-speed serial and parallel peripheral interfaces. These interfaces are ideally designed so that they can be operated with very low or no overhead on the processor core, leaving enough CPU time for running the OS and processing the incoming or outgoing data.

A Blackfin processor, for example, has multiple, flexible, and independent Direct Memory Access (DMA) controllers. DMA transfers can occur between the processor’s internal memories and any of its DMA-capable peripherals. Additionally, DMA transfers can be performed between any of the DMA-capable peripherals and external devices connected to the external memory interfaces, including the SDRAM controller and the asynchronous memory controller.

Among other interfaces, the Blackfin processor provides a Parallel Peripheral Interface (PPI) that can connect directly to parallel D/A and A/D converters, ITU-R-601/656 video encoders and decoders, and other general-purpose peripherals, such as CMOS camera sensors. The PPI consists of a dedicated input clock pin, up to 3 frame synchronization pins, and up to 16 data pins.

Figure 1 below is an example of how easily a CMOS imaging sensor can be wired to a Blackfin processor without the need for additional active hardware components.

Figure 1: Micron CMOS Camera Sensor wiring diagram

The example considered here is a simple program that reads from a CMOS camera sensor, assuming a PPI driver is compiled into the kernel or loaded as a kernel module. Two different PPI drivers are available: a generic, full-featured driver supporting various PPI operation modes (ppi.c), and a simple PPI frame capture driver (adsp-ppifcd.c). The latter is used here.

The application opens the PPI device driver and performs some I/O controls (ioctls) to set the number of pixels per line and the number of lines to be captured. After the application invokes the read system call, the driver arms the DMA transfer. The PPI peripheral detects the start of a new frame by monitoring the Line-Valid and Frame-Valid strobes.

A specific relationship between the two signals indicates the start of a frame and kicks off the DMA transfer, which captures (pixels per line) times (lines) samples. The DMA engine stores the incoming samples at the address allocated by the application. After the transfer is finished, execution returns to the application.
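The open, ioctl, and read sequence described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the article's original listing: the two ioctl request codes are hypothetical placeholders (the real ones are defined by the frame capture driver's header), the device node name is left to the caller, and it is assumed that read() takes a length in bytes.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL request codes: the real values are defined by the PPI frame
 * capture driver's header and must be taken from there. */
#define HYP_PPI_SET_PIXELS_PER_LINE 1
#define HYP_PPI_SET_LINES_PER_FRAME 2

/* Size in bytes of one frame: pixels per line times lines times sample size. */
size_t frame_bytes(unsigned pixels, unsigned lines, unsigned bytes_per_sample)
{
    return (size_t)pixels * lines * bytes_per_sample;
}

/* Capture one frame from the PPI device node `dev`.
 * Returns a malloc'ed buffer (caller frees) or NULL on any failure. */
unsigned char *capture_frame(const char *dev, unsigned pixels, unsigned lines,
                             unsigned bytes_per_sample)
{
    size_t len = frame_bytes(pixels, lines, bytes_per_sample);
    unsigned char *buf;
    int fd;

    if (len == 0)
        return NULL;
    fd = open(dev, O_RDONLY);
    if (fd < 0)
        return NULL;

    buf = malloc(len);
    /* Configure the frame geometry, then let read() arm the DMA transfer;
     * read() returns once the whole frame has been stored in buf. */
    if (buf == NULL ||
        ioctl(fd, HYP_PPI_SET_PIXELS_PER_LINE, (unsigned long)pixels) < 0 ||
        ioctl(fd, HYP_PPI_SET_LINES_PER_FRAME, (unsigned long)lines) < 0 ||
        read(fd, buf, len) != (ssize_t)len) {
        free(buf);
        buf = NULL;
    }
    close(fd);
    return buf;
}
```

A caller would pass the PPI device node created by the driver, the sensor's frame geometry, and the sample width, and would free the returned buffer after converting the image.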

The image is then converted into the PNG (Portable Network Graphics) format using libpng, which is included in the uClinux distribution. The converted image is then written to stdout. Assuming the compiled executable is called readimg, the program can be run with its standard output redirected to a file in order to save the converted image.


Audio, video, and still-image silicon products widely use an I2C-compatible Two Wire Interface (TWI) as a system configuration bus. The configuration bus allows a system master to access device-internal configuration registers, such as brightness. Usually, I2C devices are controlled by a kernel driver, but it is also possible to access all devices on an adapter from user space through the /dev interface. For example, a user-space program can write the value 0x248 into register 9 of an I2C slave device identified by I2C_DEVID.
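A minimal sketch of such a user-space register write is shown below, using the standard Linux i2c-dev interface (the I2C_SLAVE ioctl followed by a plain write). It rests on two assumptions that are not stated in the text: the device expects the register index followed by the 16-bit value with the most significant byte first, and the adapter node path is supplied by the caller. The function names are illustrative only.

```c
#include <fcntl.h>
#include <linux/i2c-dev.h>   /* I2C_SLAVE */
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Build the 3-byte message: register index, then the 16-bit value.
 * ASSUMPTION: the device expects the most significant byte first. */
void pack_reg16(uint8_t reg, uint16_t value, uint8_t msg[3])
{
    msg[0] = reg;
    msg[1] = (uint8_t)(value >> 8);
    msg[2] = (uint8_t)(value & 0xff);
}

/* Write `value` into register `reg` of the slave at 7-bit address `addr`
 * on the adapter device node `adapter`. Returns 0 on success, -1 on error. */
int i2c_write_reg16(const char *adapter, int addr, uint8_t reg, uint16_t value)
{
    uint8_t msg[3];
    int rc = -1;
    int fd = open(adapter, O_RDWR);

    if (fd < 0)
        return -1;
    pack_reg16(reg, value, msg);
    /* Select the slave, then send register index and data in one transfer. */
    if (ioctl(fd, I2C_SLAVE, addr) >= 0 &&
        write(fd, msg, sizeof msg) == (ssize_t)sizeof msg)
        rc = 0;
    close(fd);
    return rc;
}
```

With an adapter exposed as, say, /dev/i2c-0 (an assumed node name), the write described above would be i2c_write_reg16("/dev/i2c-0", I2C_DEVID, 9, 0x248).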

The power of Linux lies in the vast number of applications released under various open source licenses that can be cross-compiled to run on the embedded uClinux system. Cross compiling can sometimes be a little tricky, which is why it is discussed here.

Cross compiling
Linux or UNIX is not a single platform; there is a wide range of choices. Most programs distributed as source code come with a so-called 'configure' script. This is a shell script that must be run to detect the current system configuration so that the correct compiler switches, library paths, and tools are used.

When there isn’t a configure script, the developer can manually modify the Makefile to add target-processor-specific changes, or can integrate the package into the uClinux distribution. Detailed instructions can be found here [18]. The configure script is usually large and takes quite a while to execute. When this script has been created by a recent autoconf release, it will work for Blackfin/uClinux with minor or no modifications.

The configure shell script inside a source package can be executed for cross compilation using the following command line:

CC='bfin-uclinux-gcc -O2 -Wl,-elf2flt' ./configure --host=bfin-uclinux --build=i686-linux

Alternatively:

./configure --host=bfin-uclinux --build=i686-linux LDFLAGS='-Wl,-elf2flt' CFLAGS=-O2

At least two conditions can stop the running script: (1) some of the files used by the script are too old, or (2) tools or libraries are missing. If the supplied scripts are too old to execute properly for bfin-uclinux, or they don't recognize bfin-uclinux as a possible target, the developer needs to replace config.sub with a more recent version (e.g., from an up-to-date gcc source directory). Only in very few cases is cross compiling not supported by the configure.in script, which is written manually by the author and used by autoconf. In this case, the latter file can be modified to remove or change the failing test case.

Network Oscilloscope Demo
The Network Oscilloscope Demo shown in Figure 2 below is one of the sample applications included in the Blackfin/uClinux distribution, alongside the VoIP Linphone application and the Networked Audio Player. The purpose of the Network Oscilloscope project is to demonstrate a simple remote GUI (Graphical User Interface) mechanism for sharing access and data over a TCP/IP network. Furthermore, it demonstrates the integration of several open source projects and libraries as building blocks into a single application.

For instance, gnuplot, a portable, command-line-driven interactive data file and function plotting utility, is used to generate graphical data plots, while thttpd, a CGI (Common Gateway Interface) capable web server, services incoming HTTP requests. CGI is typically used to generate dynamic web pages. It is a simple protocol for communication between web forms and a specified program. A CGI script can be written in any language, including C/C++, that can read stdin, write to stdout, and read environment variables.

The Network Oscilloscope works as follows. A remote web browser contacts the HTTP server running on uClinux, where the CGI script resides, and asks it to run the program. Parameters from the HTML form, such as sample frequency, trigger settings, and display options, are passed to the program through the environment. The called program samples data from an externally connected Analog-to-Digital Converter (ADC) using a Linux device driver (adsp-spiadc.c).
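To illustrate how form parameters reach a CGI program through the environment, here is a small sketch in C. It reads the standard CGI variable QUERY_STRING and extracts one field. The parameter name sample_freq and both function names are hypothetical and not taken from the demo's source, and URL decoding is left out for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy the value of `key` from a query string such as "a=1&b=2" into `out`.
 * Returns 0 on success, -1 if the key is absent or the value does not fit.
 * URL decoding (%XX and '+') is deliberately omitted in this sketch. */
int query_param(const char *qs, const char *key, char *out, size_t outlen)
{
    size_t klen = strlen(key);

    while (qs != NULL && *qs != '\0') {
        const char *end = strchr(qs, '&');
        size_t len = end ? (size_t)(end - qs) : strlen(qs);

        if (len > klen && strncmp(qs, key, klen) == 0 && qs[klen] == '=') {
            size_t vlen = len - klen - 1;
            if (vlen >= outlen)
                return -1;
            memcpy(out, qs + klen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        qs = end ? end + 1 : NULL;
    }
    return -1;
}

/* Emit a minimal CGI response: header block, blank line, then the body. */
void cgi_respond(FILE *out)
{
    char freq[32];
    /* For GET requests the web server places the form data here. */
    const char *qs = getenv("QUERY_STRING");

    fputs("Content-Type: text/html\r\n\r\n", out);
    if (query_param(qs, "sample_freq", freq, sizeof freq) == 0)
        /* Print the parsed number rather than echoing raw user input. */
        fprintf(out, "<p>Sample frequency: %ld Hz</p>\n", strtol(freq, NULL, 10));
    else
        fputs("<p>No sample frequency given.</p>\n", out);
}
```

A CGI executable built from this would simply call cgi_respond(stdout) from main; the web server forwards whatever the program writes to stdout back to the browser.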

Incoming samples are preprocessed and stored in a file. The CGI program then starts gnuplot as a process and requests that it generate a PNG or JPEG image based on the sampled data and form settings. The web server takes the output of the CGI program and passes it through to the web browser. The web browser displays the output as an HTML page, including the generated image plot.
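One way a C program can start gnuplot as a process and ask it for a PNG is to pipe a few commands into it with popen(). The sketch below is an assumption about how this could be done, not the demo's actual code. The function names are illustrative, and the file names are assumed to be chosen by the program itself (never taken from user input), since they are inserted into the gnuplot script unescaped.

```c
#include <stdio.h>

/* Build gnuplot commands that plot `datafile` as lines into the PNG `image`.
 * Returns 0 on success, -1 if the script does not fit into `buf`. */
int build_plot_script(char *buf, size_t len,
                      const char *datafile, const char *image)
{
    int n = snprintf(buf, len,
                     "set terminal png\n"
                     "set output '%s'\n"
                     "plot '%s' with lines notitle\n",
                     image, datafile);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Start gnuplot as a child process and feed it the script on its stdin.
 * Returns 0 if gnuplot ran and exited successfully, -1 otherwise. */
int run_gnuplot(const char *datafile, const char *image)
{
    char script[512];
    FILE *gp;

    if (build_plot_script(script, sizeof script, datafile, image) < 0)
        return -1;
    gp = popen("gnuplot", "w");
    if (gp == NULL)
        return -1;
    fputs(script, gp);
    /* pclose() waits for gnuplot to finish and returns its exit status. */
    return pclose(gp) == 0 ? 0 : -1;
}
```

After run_gnuplot() returns successfully, the CGI program can emit an HTML page that references the generated image file.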

Figure 2

Real-time capabilities of uClinux
Since Linux was originally developed for server and desktop usage, it has no hard real-time capabilities, like most other operating systems of comparable complexity and size. Nevertheless, Linux, and in particular uClinux, has excellent so-called “soft real-time” capabilities. This means that while Linux or uClinux cannot guarantee a certain interrupt or scheduler latency, they show very favorable performance characteristics compared with other operating systems of similar complexity. If one needs a so-called “hard real-time” system that can guarantee scheduler or interrupt latency, there are a few ways to achieve such a goal:

1) Provide the real-time capabilities in the form of an underlying minimal real-time kernel such as RT-Linux (http://www.rtlinux.org) or RTAI (http://www.rtai.org). Both solutions use a small real-time kernel that runs Linux as a real-time task with lower priority. Programs that need predictable real-time behavior are designed to run on the real-time kernel and are specially coded to do so. All other tasks and services run on top of the Linux kernel and can use everything that Linux provides. This approach can guarantee deterministic interrupt latency while preserving the flexibility that Linux provides.

2) Provide the real-time capabilities using Xenomai [19]. Xenomai is a real-time development framework that cooperates with the Linux kernel to provide pervasive, interface-agnostic, hard real-time support to user-space applications, seamlessly integrated into the GNU/Linux environment. It is based on an abstract RTOS core, usable for building any kind of real-time interface, over a nucleus that exports a set of generic RTOS services. Any number of RTOS personalities, called “skins”, can then be built over the nucleus, each providing its own specific interface to applications by using the services of a single generic core. Aside from its own native and POSIX interfaces, Xenomai also provides emulators for the VxWorks, VRTX, pSOS+, and uITRON personalities. Readers interested in learning more about this project can refer to the online documentation [21].

For the initial Blackfin port, included in Xenomai v2.1 [20], the worst-case scheduling latency observed so far with user-space Xenomai threads on a Blackfin BF533 is slightly lower than 50 us under load, with an expected margin of improvement of 10-20 us in the future.

Xenomai and RTAI use Adeos [22] as an underlying Hardware Abstraction Layer (HAL). Adeos is a real-time enabler for the Linux kernel. To this end, it enables multiple prioritized O/S domains to exist simultaneously on the same hardware, connected through an interrupt pipeline.

Both Xenomai and Adeos have been ported to the Blackfin architecture by Philippe Gerum, who leads both projects. This development has been significantly sponsored by Openwide, a specialist in embedded and real-time solutions for Linux [23].

Nevertheless, in most cases hard real time is not needed, particularly for consumer multimedia applications, in which the time constraints are dictated by the user's ability to recognize glitches in audio and video. Those perceptible constraints normally lie in the range of milliseconds, which is no big problem on fast chips like the Blackfin processor. In Linux kernel 2.6.x, the new stable kernel release, those qualities have been improved further with the introduction of the new O(1) scheduler.

Figures 3 and 4 below show the context switch time for a default Linux 2.6.x kernel running on Blackfin/uClinux:

Figure 3

Figure 4

Context switch time was measured with lat_ctx from lmbench. The processes are connected in a ring of Unix pipes. Each process reads a token from its pipe, possibly does some work, and then writes the token to the next process. As the number of processes increases, the benefit of the cache diminishes. For 10 processes, the average context switch time is 16.2 us with a standard deviation of 0.58 us, and 95% of the time it is under 17 us.

Conclusion
Blackfin processors offer a good price/performance ratio (800 MMAC @ 400 MHz for less than $5/unit in quantities), advanced power management functions, and small mini-BGA packages. This represents a very low-power, cost-efficient, and space-efficient solution. The Blackfin’s advanced DSP and multimedia capabilities qualify it not only for audio and video appliances, but also for all kinds of industrial, automotive, and communication devices.

Development tools are well tested and documented, and include everything necessary to get started and to finish successfully on time. Another advantage of the Blackfin processor in combination with uClinux is the availability of a wide range of applications, drivers, libraries, and protocols, often as open source or free software. In most cases, only basic cross compilation is necessary to get that software up and running.

Combine this with such invaluable tools as Perl, Python, MySQL, and PHP, and developers have the opportunity to develop even the most demanding feature-rich applications in a very short time frame, often with enough processing power left for future improvements and new features.

Since obtaining his MSc (Computer Based Engineering) and Dipl.-Ing. (FH) (Electronics and Information Technologies) degrees from Reutlingen University, Michael Hennerich has worked as a design engineer on a variety of DSP-based applications. Michael now works as a DSP Applications and Systems Engineer at Analog Devices Inc. in Munich.

This article is excerpted from a paper of the same name presented at the Embedded Systems Conference Silicon Valley 2006. Used with permission of the Embedded Systems Conference. For more information, please visit www.embedded.com/esc/sv.

References
[1] Analog Devices, Inc. Blackfin Processors
[2] uClinux Project Page
[3] The Linux Kernel Archives
[4] The Blackfin/uClinux Project Page
[5] Busybox Project Page
[6] Linuxdevices
[7] Context Switching and IPC Performance Comparison between uClinux and Linux on the ARM9 based Processor, by Hyok-Sung Choi, Hee-Chul Yun
[8] Linux Test Project (LTP)
[9] DejaGnu - GNU Project - Free Software Foundation (FSF)
[10] Blackfin/uClinux Documentation DokuWiki
[11] ADSP-BF537 STAMP Board Support Package (BSP)
[12] GCC Home Page - GNU Project - Free Software Foundation (FSF)
[15] Cooperative Linux Project Page
[16] Das U-Boot - Universal Bootloader Project Page
[17] GCC Code-Size Benchmark Environment (CSiBE), Department of Software Engineering, University of Szeged
[18] Blackfin/uClinux Documentation DokuWiki
[19] Xenomai Project Page
[20] Xenomai Download
[21] Xenomai Documentation
[22] Adeos Project Page
[23] Openwide
