Every embedded system has the potential to reach a limit where requirements cannot be implemented without compromise to the design, or perhaps at all. When that time comes, the company's management and technical staff must agree on what to do. This usually reduces to two choices.
The first is to make the minimal necessary changes to maintain the stability and momentum of the development. This may work for a while, but will likely force the company to embrace the other choice. That choice is to retire part or all of the system and rebuild it on a foundation that will, at the minimum, meet the current and near-future requirements, but optimistically, be a profitable foundation for a significant time into the future.
This paper describes the journey taken by engineers at Intellibot Robotics to port the functionality of an autonomous robotic floor scrubber (referred to simply as “the robot” for the rest of the paper) from one hardware platform without an operating system to another running Linux. Our intention is to share the hands-on knowledge gained and to provide examples useful for making informed business decisions about using Linux for an embedded system. Knowing the capabilities that Linux has will guide this choice. This paper illustrates some of the challenges we faced in discovering those capabilities.
Background on “the robot”
The development project that produced the robot dates back to the early 1990s. To appreciate the migration, an understanding of the old robot is necessary first, as this is the platform that we replaced with one running Linux. A 68000 main processor and a handful of peripheral microcontrollers controlled the machine. To drive down cost and add some headroom for more features, later generations migrated to a 68332.
The architecture was fairly typical of small microcontroller-based systems of the time: it had no operating system, the code base was relatively small, and for what it had to do, it was adequately fast. It didn't use an operating system mainly because there wasn't a need for one. The code was single-threaded, and any OS-type services needed were easily written without concern for what the rest of the system was doing.
Although the product was technically functional, it lacked features to satisfy market needs. Over time, the purpose of the robot grew from simply being an intelligent floor cleaner to becoming an autonomous machine that needed even less human intervention. This growth directly translated to more technical requirements: there were more sensors to manage, more data to collect, and more computations to complete in the same amount of time.
With the first few generations of the robot, the addition of new features implemented in software was largely successful. In late 1999, however, the requirements put on the machine began to exceed what the platform could provide. There followed an episode involving the introduction of a large amount of software to this system to meet these new requirements.
Not long into that effort, we discovered that the processor could not run the new software and still meet the requirements of controlling the robot in real time. In embedded systems development, this is what we like to call a “dismal failure”. Product development stalled on that front; entrance into that particular new market was put on hold.
After that, requirements were changed to enhance how the machine performed in existing markets, since there was no time to improve the hardware. This new feature set kept the robot a viable product while our engineering team worked at squeezing more functionality out of the existing hardware, but that effort was quickly approaching the processing limits of the platform.
Making the transition
In late 2003, with the company under new ownership, the requirement to enter new markets became a priority again. Several discussions of the current capabilities versus the desired capabilities quickly showed the necessity of migrating the robot to a new platform.
Not knowing whether this would be a triumph or a disaster, our management simply stated that the product should be moved to Linux. Much of the robotic development in academia is happening on Linux, and our owner wanted to leverage this. Though we were familiar with Linux on workstations and servers, we hadn't used Linux as an RTOS.
There were many questions we had to answer to make the transition successful. However, with the direction now specified by our management and with the appropriate financial backing, the decision to proceed with Linux was made without any of the research that would typically occur when making such a significant engineering change.
There was no comparison with or evaluation of other RTOSes. There was no investigation into whether Linux was even appropriate from a GNU licensing point of view. It was known that Linux was free, but that's all. Our management was comfortable taking the risk that Linux would meet our needs.
Though we can gladly report that the migration was a success, we recommend that a complete investigation by both engineering and management be undertaken to answer these questions in advance. The rest of this paper exposes some of these questions and the answers we found while attending the School of Hard Knocks.
Finding the right hardware platform
Shortly after the decision was made to move to Linux, we needed to find a suitable hardware platform capable of running Linux that would also fit the needs of the robot. The robot's current motherboard was an inexpensive custom design ideally suited to the robot, but it lacked the support needed to run Linux.
Therefore, the search began for a low-cost single-board computer (SBC) that could run Linux. Though we preferred a PowerPC processor, the SBCs we found were far more expensive than equivalent x86-based counterparts.
Using the existing subsystems of the robot required an SBC with an SPI bus. Yet x86 processors (the target where Linux is most mature) generally do not have an SPI port. Internet searches did not yield anything. By chance, we spotted an ad in Circuit Cellar that showed an x86 board with an SPI port, which the vendor claimed ran Linux. The cost of the board was about half that of our existing custom motherboard, too. The migration was now on its way.
Looking back, here are some things to investigate when looking for a platform to run Linux. If the hardware expertise does not exist in-house to design a cheaper target board, then an off-the-shelf SBC is probably the most cost-effective choice.
Also, look for a vendor who not only advertises that Linux runs on their products, but also has the technical support available to help get your application working with Linux. The vendor we chose offers unlimited free technical support, a bootable Linux file system on a CompactFlash, and extensive downloads and support on their web site. Early on in powering up the system, we discovered a BIOS problem, and their tech support quickly fixed it. Without this support, we would have faced a significant delay in getting our problems resolved.
A nice side effect of the vendor we chose is that they now offer SBCs using a faster ARM processor at a lower cost than the x86. With the product now ported to Linux, the migration to that board, or any other SBC that supports Linux, should entail little more than a recompilation: a significant benefit not yet realized.
Because the SBC had Linux pre-installed, it was simply a matter of applying power and connecting its serial port to a computer. The software support on the stock Linux file system was reasonably complete, and it readily provided a login prompt on one of the SBC's serial ports, as well as card support and network drivers for the PCMCIA wireless Ethernet adapter.
In very little time, the SBCs were communicating with other computers on both wired and wireless networks. The next important step was to block out a period of time to explore the capabilities of Linux on the SBC. Understanding how it behaves in a normal situation is imperative before unfortunate situations arise.
Up to this point, the new SBC was indistinguishable from other Linux systems on the network: it responded to ping, telnet, and ftp. The file system layout was typical of a Linux system. It knew the correct time and date.
At this point, a key feature of moving to Linux became apparent, one that would become a cornerstone of the entire development process. The file system of our Linux server could be NFS-mounted onto the SBC's file system. That meant that files could be effortlessly transferred between the development server and the robot.
Though this mount worked over the wired Ethernet connection, the real benefit was that it worked over a WiFi connection, too. This marked the first time that development of the mobile robot would become untethered. The foundation had been laid, upon which the first level, a development environment, would be built.
The more development in an embedded environment resembles native development, the faster and easier development in that embedded environment will be. Cross development adds at least one step to the build process, and cross debugging is tricky and cumbersome in comparison to native development.
Despite this, we had established a workable, though tethered, cross-development and cross-debugging environment in previous generations of the robot. Moving away from this familiar environment to another was a risk. However, the cross-development and debugging system set up in Linux turned out to be equally as powerful and simpler to get working, and it was wireless.
The file system that came with the SBC fit on a 32 MB CompactFlash with some room to spare. Complete as it was for getting the system up and running, it had somewhat limited support in the way of tools and run-time libraries. The file system included no development tools.
The version of the libraries on the file system was older than that on our other Linux systems. The lack of on-board development tools was not a limiting factor, since it was not in the plan to build applications on the SBC; our Linux servers were designated for that. Programs would not have to get very big before building them on the target would be time- and space-prohibitive, and a Linux workstation provided much better facilities for editing source code.
The build process quickly reduced to building the application on the server, copying it into place on the target SBC via an NFS mount, and running it from the SBC. This was much more convenient than programming ROMs, as was done in the past.
Another difference in moving from a no-OS platform to Linux was that there were now libraries of code that could be utilized by the application. The challenge was how to use these in a resource-effective way.
Because the initial set of run-time libraries on the SBC was limited and older than that on the server, programs either had to be so simple that they used only the run-time libraries common to the SBC and the server, or they had to be statically linked.
Statically linking the application
A statically linked application can run on the SBC regardless of the libraries present on the file system, since such an application contains all the run-time support it requires. This makes even small statically linked applications huge compared to the same programs dynamically linked.
There is merit to having little or no run-time support on the file system if the system is small, with a limited number of applications and tools. In such an arrangement, all applications (including the command shell and widely used tools such as ls, cp, and mv) would have to be statically linked, each containing its own copy of the libraries it requires to run.
Resources are quickly consumed when there are many applications, each with its own copy of identical libraries, because each statically linked application loads its own run-time support into RAM when it runs. Dynamic libraries, in contrast, save RAM.
Dynamic libraries are loaded into memory once, regardless of the number of applications running at any given time. Linking against the libraries on the SBC, rather than those on the server, though possible, was not considered, because they lacked complete support for POSIX threads.
Although the robot application was single-threaded, there were already thoughts of splitting it into many threads, so POSIX-complete libraries were important. Considering all these factors, we decided to make the libraries on the SBC match what we had on our servers. This was simply a matter of copying the libraries from the server to the SBC. Since the SBC's run-time environment could easily be a subset of the server's, only the libraries that were needed were copied at first.
Mainly using ldd, a utility that determines the shared-library dependencies of a given application, we arrived at a good first guess as to the contents of this minimal set. As various required tools were put onto the file system, the accompanying run-time library support was also determined and added, until the support was so complete that it was no longer necessary to check the dependencies.
Libraries, tool chains, and building blocks
That set of libraries is what resides on the file system today, providing a very complete and convenient environment for development, debugging, and deployment. Applications can be built on the server, dynamically linked, copied to the target, and run. Debugging tools such as gdb were deployed in exactly the same way; the very same version that runs on the server also runs on the target.
Another advantage of this approach is that it was not necessary to rebuild the development tool chain in order to build applications for the target. The tool chain already installed on the server builds applications for the target just as easily as it does for itself. With this arrangement as a baseline, we could, with reasonable confidence, rebuild the tool chain on this platform or some other, and continue development from there. The vendor's initial file system layout was helpful from the beginning, as it facilitated a quick start early on and provided the basis for the configuration changes and improvements made since then. This incremental approach to modifying the layout and contents of the file system kept the number of variables low, which was key to maintaining the stability of the system. Changing one thing at a time, we developed and maintained a high level of comfort with the system as a whole.
Also beneficial in getting the development process started was having another Linux system readily available for building applications and serving as a repository for developing and storing source code. On a more capable target system, the development could be done completely natively: editing, compiling, building, and running early applications all on the target itself. In our case, the target was not sufficient to carry all those responsibilities, nor did it have to be. The server was a base camp from which many reconnaissance missions into new territory on the target were launched and successfully carried out.
Having a PCMCIA slot on the SBC meant one thing above all else: our product had the ability to enter the world of wireless networks by simply inserting a standard 802.11 card. Many embedded systems are physically stationary. For development, they sit on the workbench, and the network comes to them in the form of an RJ-45 jack.
Before the Linux port, the only line of communication with the robot was an RS-232 port, with a radio modem to break the physical connection. Breaking the physical tether was an important part of protecting a company's investment in development equipment: laptop computers in motion tend to stay in motion, regardless of any sudden stop on the part of the robot on which they're riding.
Kernels, modules, and device drivers
Linux is large compared to a typical RTOS, but it is organized so that understanding it does not require swallowing it whole. Like an onion, Linux can be peeled away in layers, and each layer can be understood without the need to dig deeper.
Unless your name frequently comes up in the Linux kernel source tree, you're quite likely still several layers up from the core. There's nothing wrong with that; we are nowhere near all the way down. The important thing is that you understand just enough to get your job done, and this is possible with Linux.
Linux is widely regarded as a complete operating system, including a kernel, a file system layout, a command shell, tools, and sometimes even a graphical user interface. This may be true, but Linux refers primarily to the innermost part, the kernel, originally written by Linus Torvalds. The kernel is what boots up the computer, makes it run, and provides the operating-system services that applications need to make the computer do useful things.
Whether or not one modifies the kernel (as we have done), porting a product to Linux at least initially involves the kernel, as it forms the basis for all other work required to get the system running. Depending upon the hardware vendor, the kernel may already be installed, which is what we recommend for a first project.
Linux is developed and released under the GNU General Public License. Its source code is freely available to everyone. This open-source nature of Linux allows anyone to view, modify, apply, and extend it for their own purposes. Chances are, if a person adds to Linux, they will want to make their work accessible under the same license, and as a result, Linux gets better.
For each piece of hardware supported by Linux, it is almost certain that somebody somewhere has written a device driver to make the hardware work with the system. Device drivers are distinct pieces of software that enable a certain hardware device to respond to a well-defined internal programming interface, hiding how the hardware device works behind that interface.
In such an arrangement, users (and applications) do not have to understand how the hardware works in order to use it. If there is specialized hardware in your system, you will have to decide whether to control that hardware completely in your application or write a device driver to do it. A later section discusses this decision at length.
Whether the goal is a device driver or not, the main (and easiest) approach to building onto the Linux kernel involves no changes to the kernel itself: making a kernel module.
Unlike an application, which performs a specific task from beginning to end, a module attaches to the kernel and registers itself with the kernel in order to serve future requests. Modules can be built completely separately from the kernel source tree. The kernel can load and unload them easily; there is a well-established mechanism inside the kernel to do this. Modules are simply a vehicle for dynamically adding functionality to the kernel.
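As a sketch of how small that vehicle can be, here is a minimal module of the kind described above. The names are ours and purely illustrative; this is kernel code, so it must be built with the kernel's kbuild system against matching kernel headers, and it cannot be compiled or run as an ordinary user program.

```c
/* hello_module.c -- minimal loadable kernel module (illustrative names).
 * Build with kbuild (obj-m += hello_module.o); this cannot be compiled
 * or executed as an ordinary user-space program. */
#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

static int __init hello_init(void)
{
        printk(KERN_INFO "hello: module loaded\n");
        return 0;               /* a nonzero return would abort the load */
}

static void __exit hello_exit(void)
{
        printk(KERN_INFO "hello: module unloaded\n");
}

module_init(hello_init);        /* run at insmod/modprobe time */
module_exit(hello_exit);        /* run at rmmod time */
```

Loading the module with insmod and removing it with rmmod exercises exactly the attach-register-serve life cycle described above.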
These two constructs, modules and device drivers, are often related, but they are not synonymous. A module may contain a device driver. A device driver may be composed of one or more modules. While a device driver may exist in source code files that form one or more modules, that device driver may also be built right into the kernel (it then becomes part of the “kernel proper”), eliminating the need to load the module after the system has booted.
Loadable modules provide flexibility and changeability at the cost of having additional files on the file system, which must be loaded before the device driver can be used. Building the device driver as part of the kernel eliminates the need to load the module, but requires a rebuild of the kernel (and reinstallation of the kernel image) when the device driver is modified.
While a device driver (or other addition to the kernel) is in development, it can take the form of loadable modules until it gains a level of maturity. This can make module and device driver development easier, since the kernel source can remain unmodified.
Writing device drivers
Not all of the old robot was to be redesigned or replaced. A self-imposed requirement was to have the new SBC communicate seamlessly with several existing peripherals on an SPI bus. New SPI peripherals would also be added. The problem of controlling this interface easily reduced to two parts: writing software to control the new processor's SPI port, and writing software to implement the communication protocols employed by the peripherals.
The latter naturally fell into the purview of the application. Protocols do (and did) change. Doing this work in the application allowed for support of existing protocols and the ability to add new ones easily. The former presented a fundamental decision: should the control of the SPI port occur in the application or in a device driver?
In an embedded system, it's common practice to roll all the code required to control the entire system into one piece of application-level software. While this is often the only choice on bare metal, when an operating system forms the basis of the software, this all-in-one approach rejects many of the benefits of building an application on that operating system. Most notably, it ignores the ability of the kernel to arbitrate demands for access to system resources, especially input/output devices such as the SPI port.
Knowing that others had successfully implemented support for this type of interface completely in application space, it was tempting to code up control of the SPI port completely within the application. Early on, this is how the hardware was tested, and from this we gained an understanding of running the SPI port on this new platform.
In the long run, however, controlling the SPI port entirely in application space would not meet the requirements. This approach did not make good use of the operating system's benefits. For instance, the original, single-threaded control system was about to become multithreaded (the current system spawns over twenty threads, several of which access the SPI).
Managing the SPI in the application meant having to develop elaborate thread-safe software to arbitrate access to the interface. In addition, since a Linux-based application cannot respond directly to hardware interrupts, it is resigned to polling, delaying, and looping. The operating system already does this low-level work, and is very efficient at it; all the programmer must do is provide, in a device driver, the details of how to operate this particular device.
With a device driver, multithreaded, thread-safe, application-level access to the device becomes effortless. Also, accessing hardware through a device driver affords the system designers a convenient separation of mechanism (working the device) from policy (using the device).
The application doesn't need to know anything about how to operate the device. The application is insulated by the operating system from changes at the level of the device. In theory, the interface, or the way it is operated, could completely change without requiring any change in (or rebuilding of) the application.
Having seen the benefits of device drivers, the time came to learn how to implement them. Rubini & Corbet's book Linux Device Drivers (O'Reilly) became required reading, and it remains our preferred reference.
First attempt at a kernel module
A first attempt at a kernel module containing a device driver for the SPI was completed in a week. In retrospect, this first module was a small step, essentially moving the application-level, polled SPI support into the kernel. Small step or giant leap, the entire robot was now being controlled using standard open(), write(), read(), and close() system calls from application space to access the hardware. The application now contained none of the nuts and bolts of handling the interface.
Though the application now had nothing to do with controlling the hardware of the SPI port, the read() and write() calls were no more efficient at run time than the application-level code they replaced. The SBC's processor, however, provides rich hardware support for SPI.
To use this support to its full potential would require redesigning the driver to exploit the ability of the processor's internal SPI hardware to interrupt the CPU. This would remove any polling and looping in the SPI driver code, freeing up more CPU cycles for the application and making SPI communication as execution-efficient as it could be.
While not especially easy, the task of rewriting the driver to fit an interrupt-driven model was made as easy as it could be, thanks again to Rubini and Corbet. Not only did they lay out all the elements necessary to make a design like this work (and work it does), they also focused attention on almost all the pitfalls we were to encounter while writing the device driver.
This information enabled us to preempt many of the typical problems accompanying development in this space, especially those involving race conditions and resource contention, which are so common when fielding multiple requests for hardware and shared memory.
This experience repeated a message that emerges again and again when one works with open-source software at any level: support for what you want to do almost certainly exists. It may take the form of a book, a HOWTO, a Google search, a newsgroup or mailing list, a phone call to a colleague, a visit to a local user group, or inspection of the source code itself. The independent variable becomes your ability and willingness to reach the information you need.
One of the biggest changes in going from the old embedded system to the SBC running Linux was powering down the machine. In the past, turning the key switch to OFF killed power to the robot, and the hardware shut down properly.
Doing the same to the new robot with Linux running on the SBC was dangerous. The file system could easily become corrupt, because the kernel would not have an opportunity to write cached data to the file system. And since the application was no longer running on bare metal, it should no longer directly control the hardware.
The safest approach is to allow the kernel to perform all of its shutdown tasks, such as unmounting file systems and taking down network interfaces, before removing power. In cases where the file system is mounted read-only, it may be safe to simply cut power.
If write caching is disabled on a read-write file system, it can be made safe, provided the file system is allowed to become quiescent before power is removed. Since the application must relinquish all control of the system long before the kernel shuts down, we needed the kernel's cooperation to solve the problem.
The Linux kernel did not disappoint. Within a day, including research (the Linux kernel source, and Rubini & Corbet again), we had written a 41-line C module that registers with the kernel to be notified upon halt or shutdown. A startup script loads the module. When notified by the kernel, the module exercises the I/O line connected to the relay, disconnecting power to the machine only after the kernel has finished all its shutdown tasks. This provides a clean shutdown of the system.
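We cannot reproduce that 41-line module here, but its shape would be roughly as follows, using the kernel's reboot-notifier interface. The relay-toggling detail is hypothetical, and like any module this must be built against kernel headers with kbuild rather than compiled as a user program:

```c
/* poweroff_notify.c -- sketch of a module like the one described above.
 * It registers for halt/power-off notification; by the time the
 * notifier runs, the kernel has finished its shutdown work. */
#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/reboot.h>

static int poweroff_event(struct notifier_block *nb,
                          unsigned long event, void *unused)
{
        if (event == SYS_HALT || event == SYS_POWER_OFF) {
                /* drop_power_relay(); -- hypothetical: toggle the I/O
                 * line wired to the relay that removes machine power */
        }
        return NOTIFY_DONE;
}

static struct notifier_block poweroff_nb = {
        .notifier_call = poweroff_event,
};

static int __init poweroff_init(void)
{
        register_reboot_notifier(&poweroff_nb);
        return 0;
}

static void __exit poweroff_exit(void)
{
        unregister_reboot_notifier(&poweroff_nb);
}

module_init(poweroff_init);
module_exit(poweroff_exit);
MODULE_LICENSE("GPL");
```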
Once the interrupt-driven SPI device driver was implemented and the application was using it to communicate with all the peripherals to control the robot, it was time to determine the character of the traffic on this multiplexed SPI bus. After all, this was a stock Linux kernel, not a flavor of Linux enhanced for optimum real-time performance.
Hard timing requirements could not be imposed as in a typical embedded system. Had the interrupt-driven model really allowed the multiplexing of the SPI bus as we had envisioned? What did the timing between SPI transfers look like? What was the overall throughput of the SPI bus? Was this arrangement going to make it easier for the application to accomplish all the communication tasks it had, quickly enough to satisfy the soft real-time requirements of the robot?
In an embedded system, one does not have to get very far down into the code before software tools become cumbersome for determining the real-time performance of the system. Gauging the efficacy of the SPI device driver at the level of clock cycles, individual transfers, and interrupt service latency solely with software tools would be like determining a person's facial features using a bowling ball as a probe.
Fortunately, much of the hardware outside the SBC was of our own design. The signals of the SPI bus were easily accessible to a logic analyzer, which handed us measurements on a platter. After adding some “instrumentation” in the form of toggling digital outputs at critical points in the application and the driver, hard evidence showed that the application was performing transactions with the peripherals through the device driver, and the driver was interleaving those transactions on the SPI bus almost as expected.
Peripherals with a microcontroller of their own required a delay between transfers to allow the microcontroller to service its transfer-complete interrupt and prepare for the next transfer. This delay is typically 200 microseconds. This magnitude of delay, we reasoned, should be an easy matter for the application: the involved thread would sleep after initiating a transfer for however long the peripheral needed, freeing up the main CPU for other things, then wake up after the delay to begin the next transfer.
We therefore expected to see a transfer with one peripheral, then a 200-microsecond delay during which other transfers could occur on the SPI bus, then another transfer with the peripheral.
What we found instead were delays on the order of 10 milliseconds! Not only did this completely go against our intention of dense, bursty utilization of the bus, but it also made transactions that ought to take 2 milliseconds (approximately ten transfers separated by delays of 200 microseconds) take over 100 ms to complete.
While it's true that our utilization of the CPU had greatly improved, the throughput of the system was abysmal; at some point, the application still had to wait for communication with the peripherals to finish before it could complete its processing, and this could not happen while meeting even the soft real-time needs.
What was causing this long delay, almost two orders of magnitude greater than we had intended? How could a usleep(200) call, which theoretically puts a thread to sleep for 200 microseconds, put off the task for a whole 10 milliseconds?
As with previous device driver work, assistance was waiting in the typical places where one finds help when working with open-source software. In this case, it was in the kernel