This tutorial presents the author's practical experience with writing Linux device drivers to control custom-designed hardware. The tutorial starts by providing an overview of the driver writing process, and describes several example drivers provided with this tutorial. The reader is encouraged to experiment with those example drivers on their own x86 system, as this provides the best learning experience.
The ability of a user-space process to transfer data from multiple PCI boards is contingent on the implementation of both the hardware and driver. The requirements of both the hardware and software are presented.
The drivers in this tutorial are written for the Linux 2.6 kernel. The drivers have been built against 2.6.9-11 (CentOS 4.1), 2.6.13, and 2.6.14 for x86 and PowerPC targets. Details that are clearly described in the book 'Linux Device Drivers', by Corbet, Rubini, and Kroah-Hartman, are not repeated in this tutorial, so the reader is encouraged to obtain a copy.
The Linux 2.6 kernel presents a number of generalized interfaces that the driver writer must first understand, and then implement for their specific driver. The best way to understand the interfaces is to write simple drivers that exercise a subset of the kernel driver interfaces. The following sections describe the interfaces used to implement character device drivers.
The file simple_module.c implements a very basic kernel module. A device driver is a kernel module, but kernel modules are also used to add features to the kernel that have nothing to do with device drivers. Welcome to your first generalized kernel interface.
The basic requirement of a kernel module is that it implements an initialization function and an exit function. Those two functions are identified by the macros module_init() and module_exit(). The example also shows how to pass load-time parameters to the module, and how to set up logging in a module.
The code sets up two logging macros: LOG_ERROR() and LOG_DEBUG(). The debug macro can be removed from the code at compile time (by not defining DEBUG), or can be compiled into the code and then enabled or disabled via the load-time parameter simple_debug. This method of adding log messages to code is easier to maintain (e.g., disable) than a series of printk() calls littered throughout the code.
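The two-level logging scheme can be tried outside the kernel. The following is a minimal user-space sketch, not the code from simple_module.c: fprintf() stands in for printk(), the variable simple_debug stands in for the load-time parameter, and log_count is a hypothetical counter added only so the behavior can be observed.

```c
#include <stdio.h>

#define DEBUG                 /* remove this line to compile LOG_DEBUG() away */

static int simple_debug = 0;  /* stand-in for the load-time parameter */
static int log_count = 0;     /* hypothetical: counts emitted messages */

/* Always emitted. In a kernel module this would wrap printk(). */
#define LOG_ERROR(...) \
    do { log_count++; fprintf(stderr, "simple: " __VA_ARGS__); } while (0)

#ifdef DEBUG
/* Compiled in, but only emitted when simple_debug is non-zero. */
#define LOG_DEBUG(...) \
    do { if (simple_debug) LOG_ERROR(__VA_ARGS__); } while (0)
#else
/* Compiled out entirely: the arguments are never evaluated. */
#define LOG_DEBUG(...) do { } while (0)
#endif
```

The first argument to either macro must be a string literal, because the prefix is attached by literal concatenation. Wrapping each macro in do { } while (0) keeps it safe to use as a single statement inside an if/else.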
The following shows the driver usage; the // marks are comments, while the $ (user) and # (root) prompts show the commands you enter (bash shell syntax).
So with the load-time parameter simple_debug set to zero, the LOG_DEBUG() message does not appear in the output. The module load and unload messages are generated using the LOG_ERROR() macro so that they are always generated.
The file simple_driver.c implements a simple device driver. What makes it a device driver, and not just a kernel module? In simple_init the driver requests a range of major and minor numbers (the numbers used to represent device nodes in /dev), then allocates memory for an array of device-specific simple_device_t structures, and then registers the character device member, cdev, of each structure in the array with the kernel.
Registration of the character device requires a set of file operations, i.e., a kernel-level implementation of the functions that get called when user-space makes system calls, e.g., open(), read(), write(), ioctl(), lseek(), select(), and mmap(). The file operations are stored as function pointers in a struct file_operations; if this code were written in C++, then this structure would be the base class, and your implementation of its functions would be a derived class.
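The base-class analogy can be illustrated with a table of function pointers in plain C. This is a simplified user-space sketch: struct demo_file_ops, struct demo_device, and the demo_* functions are hypothetical names, and the signatures are deliberately much simpler than those in the kernel's struct file_operations.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified analogue of struct file_operations. */
struct demo_file_ops {
    int  (*open)(void *dev);
    long (*read)(void *dev, char *buf, size_t count);
};

/* Hypothetical per-device state. */
struct demo_device {
    int open_count;
    const char *message;
};

static int demo_open(void *dev)
{
    struct demo_device *d = dev;
    d->open_count++;
    return 0;
}

static long demo_read(void *dev, char *buf, size_t count)
{
    struct demo_device *d = dev;
    size_t len = strlen(d->message);

    if (count < len)
        len = count;
    memcpy(buf, d->message, len);
    return (long)len;
}

/* The "derived class": this driver's implementations of the interface. */
static const struct demo_file_ops demo_fops = {
    .open = demo_open,
    .read = demo_read,
};

/* The caller sees only the table, much as the kernel sees only the
 * function pointers a driver registers. */
static long demo_dispatch_read(const struct demo_file_ops *ops, void *dev,
                               char *buf, size_t count)
{
    if (ops->open(dev) != 0)
        return -1;
    return ops->read(dev, buf, count);
}
```

Calling demo_dispatch_read(&demo_fops, &dev, buf, len) invokes demo_open() and then demo_read() without the caller naming either function.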
The file simple_driver_test.c is a user-space application that tests the functions of the driver. Install the module, type ls /dev/simple* and once you see device nodes there, run the test. After the test finishes, type dmesg to see the kernel-level messages triggered by the user-space test. Remove the driver, and reinstall it with load-time parameters, e.g.
This creates three devices, each responsible for three minor numbers (functions on the device). ls -al /dev/simple* will show the multiple devices created (and their major/minor numbers).
How did the device nodes magically appear in /dev? That's next.
Hotplug, sysfs, and udev
The simple driver initialization code, simple_init, also performs another step: it creates a kernel object, class_simple or class depending on the kernel version, that creates entries in the sys-filesystem, sysfs, in the directory /sys/class. Creation of the class object in the initialization code creates the entry /sys/class/simple_driver. Devices managed by the driver are then added to the class object (see the code), creating the device nodes under /sys/class/simple_driver. For example, if no load-time parameters are specified, the driver creates one device, and the node /sys/class/simple_driver/simple_a0 is created.
Why create these class and device 'objects'? The Linux 2.6 kernel supports the concept of hot-pluggable devices, i.e., devices that can be plugged in while the system is turned on, e.g., a USB camera. In older Linux systems, if you plugged in a camera, you'd have to look at the output of dmesg to see what the camera was detected as (if at all), and then try and figure out how to get images off the camera!
The Linux 2.6 system generates 'hot-plug' events every time a kernel object is created and destroyed, and these hotplug events trigger the execution of scripts in user-space. The (appropriately written) scripts then automatically populate the /dev entries for a device. A nice feature of these scripts is that you can decide what name to give the device, e.g., a camera detected as a USB mass-storage device might be detected as /dev/sda1 in a non-hotplug system, but with hotplug you can set up the camera name to be /dev/camera, much nicer!
The automatic creation of /dev entries relies on three related kernel infrastructures: hotplug, sysfs, and udev. The man page, man udev, gives details on how the scripts can be set up to create the /dev entries with specific permissions, and how to map a kernel name (e.g., that used when the device was added to the class object in simple_init) to a user-space defined name.
On CentOS 4.1, the udev configuration files are kept in /etc/udev/. The line udev_log=no in /etc/udev/udev.conf can be changed to udev_log=yes and hotplug events will be written to the system log. For example, as root type tail -f /var/log/messages, and then from another terminal install simple_driver.ko, and you will see the logging of the hotplug events.
The default name given to a single device created by the simple driver is /dev/simple_a0. With no udev scripts in place, the device node is created for use by root only, and is named identically to the string used in simple_init. The permissions on the device node can be changed by creating a udev script containing a single line:
This changes the permission on all nodes matching the pattern simple_* to the owner dwh, group mm, with permissions 0660. The name of the device entry can be changed, or a symbolic link to a device entry can be created, by adding another script, e.g., the following creates a symbolic link to the first device entry:
The udev man page gives more details on the options for device naming (e.g., a user-supplied program can be run to generate the device name). The automatic creation of /dev entries helps reduce the contents of /dev to just those devices installed. It also provides flexibility to user-space in the naming of device nodes.
For example, in the case of PCI devices it allows the PCI location, e.g., bus:dev.fn, to be remapped into a meaningful slot number, e.g., instead of say a device named /dev/board_00:0c.0, the user-space name can be mapped to /dev/board2.
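The remapping from a PCI location to a slot-based name could be done by a small user-supplied helper. The following is a sketch under stated assumptions: the slot_table contents are hypothetical (the real mapping depends on the chassis backplane), and the function names are illustrative only.

```c
#include <stdio.h>

/* Hypothetical table: which PCI device number sits in which chassis slot. */
struct slot_map { unsigned dev; int board; };
static const struct slot_map slot_table[] = {
    { 0x0a, 0 }, { 0x0b, 1 }, { 0x0c, 2 }, { 0x0d, 3 },
};

/* Parse "bus:dev.fn" (hexadecimal fields). Returns 0 on success, -1 on error. */
static int parse_pci_location(const char *s, unsigned *bus,
                              unsigned *dev, unsigned *fn)
{
    return sscanf(s, "%x:%x.%x", bus, dev, fn) == 3 ? 0 : -1;
}

/* Write "board<N>" into name for a known location.
 * Returns 0 on success, -1 if the string is malformed or not in the table. */
static int pci_location_to_name(const char *loc, char *name, size_t len)
{
    unsigned bus, dev, fn;
    size_t i;

    if (parse_pci_location(loc, &bus, &dev, &fn) != 0)
        return -1;
    for (i = 0; i < sizeof(slot_table) / sizeof(slot_table[0]); i++) {
        if (slot_table[i].dev == dev) {
            snprintf(name, len, "board%d", slot_table[i].board);
            return 0;
        }
    }
    return -1;
}
```

With this hypothetical table, the location 00:0c.0 (device number 0x0c) maps to the name board2.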
The class_simple interface, as described in the Linux Device Drivers book, was removed from the kernel (according to the ChangeLog for that kernel), and the API changed again slightly. The parallel port user-space driver, ppdev.c, is a nice small (easily understandable) driver that uses the class interface. A diff of different kernel versions of this driver can be used to determine the usage of any API changes (e.g., whether a new argument can be assigned NULL).
The driver simple_timer.c implements a single device that uses two different kernel mechanisms for delaying the calls read(), write(), and select(). The test program simple_timer_test.c tests the driver. The driver demonstrates the usage of timers and events.
The driver simple_irq.c implements a single device that uses the parallel port on an x86 PC. To test this driver, you might need to first remove the printer driver and parallel port driver, i.e., modprobe -r lp, modprobe -r parport_pc. The driver creates a kernel timer that fires every second.
The timer handler writes a low and then high to all the data lines on the parallel port. If a data line, one of pins 2 through 9, is jumpered to the interrupt line, pin 10, then an IRQ will be generated every second. The IRQ handler unblocks a blocked read(), write(), or select().
If a data line is not jumpered to the IRQ line, then the blocked calls will time out (2s) and continue anyway. The test program simple_irq_test.c tests the driver. The driver demonstrates the usage of timers, IRQs, and events with timeouts.
The driver simple_buffer.c implements a single device that also uses the parallel port on an x86 PC (so you will need to remove simple_irq to test it). This driver is similar to simple_irq.c with the change that IRQs write a time-stamp to an internal buffer, user-space write() writes to that buffer, and read() reads from the buffer. The following are some tests that can be performed using standard command-line tools:
1) Connect the parallel port IRQ to a data line. Install the driver with insmod simple_buffer.ko. Once the /dev/simple node is valid, type cat /dev/simple. A UTC timestamp will be printed every second.
2) Remove the parallel port jumper. Remove the driver. Install the driver and disable the timer and timeout as follows:
On one terminal type cat /dev/simple, and on another type echo "Hello" > /dev/simple. (You can also leave the timer enabled and it will just write messages to the log file.)
3) Combine the first two tests (remove and re-install the driver without any load-time parameters); the IRQ will add a complete timestamp message every second, while write will add a complete string (whenever the user triggers a write). No messages will be interrupted, since each procedure locks the internal buffer.
The test shows that the driver works as one would expect; however, take a look at the source for the details. The internal buffer is a resource that is shared between read() (e.g., one process), write() (e.g., another process), and the IRQ handler (interrupt context).
The driver uses a spin-lock to protect access to the buffer (and its associated buffer count and pointers). Without this protection, an IRQ could interrupt a write, and insert a timestamp into the middle of the string echoed into the driver. Of course in a real driver, the results could be more disastrous.
If the resource (buffer) being protected by the driver was only ever accessed by processes, then a semaphore could be used to protect it. Semaphores can be used to block a process, causing it to sleep while waiting for a resource. Spin-locks are not quite so forgiving.
You are not allowed to sleep, or call a function that might sleep, while holding a spin-lock. Make sure to build your driver development kernel with CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP enabled, and the kernel will give you a nice reminder if you try to do something bad (e.g., calling kmalloc with GFP_KERNEL while holding a lock).
The write() and read() operations of the driver need to copy data from (or to) user-space to (or from) a kernel buffer. However, a copy_from/to_user can sleep, so there is no way to copy directly to the spin-lock protected buffer!
There's also the following write sequencing issue: to write data into the buffer, you first need to check whether there is space. However, the spin-lock needs to be held to check the buffer state, so ideally you would hold the lock, check for space, release the lock, and then copy a matching amount of user-data to the kernel. But, since you are not holding the lock, an IRQ can come along and use up your space!
The solution, shown in the driver code, is to first copy all the user data into a kernel buffer, and then hold the lock while checking for space. This allows the (sleepable) copy and allocation calls to be performed before holding the lock. Of course in the case of a full buffer and non-blocking write, the allocation and copy from user-space was a waste of time.
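The ordering described above (copy first, then lock, then check and commit) can be sketched in user-space. This is not the code from simple_buffer.c: malloc() and memcpy() stand in for kmalloc() and copy_from_user(), a pthread mutex stands in for the spin-lock, and demo_write() and DEMO_BUF_SIZE are hypothetical names.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUF_SIZE 8   /* hypothetical; small so the full case is easy to hit */

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER; /* spin-lock stand-in */
static char   demo_buf[DEMO_BUF_SIZE];
static size_t demo_count;

/* Non-blocking write: returns bytes written, or a negative errno value. */
static long demo_write(const char *user_data, size_t len)
{
    long ret;
    char *tmp;

    if (len == 0)
        return 0;

    /* Step 1: allocate and copy while NOT holding the lock. In the kernel
     * both of the equivalent calls may sleep, so they must come first. */
    tmp = malloc(len);
    if (tmp == NULL)
        return -ENOMEM;
    memcpy(tmp, user_data, len);

    /* Step 2: take the lock, then check for space and commit in one
     * critical section, so nothing can use up the space in between. */
    pthread_mutex_lock(&demo_lock);
    if (len > DEMO_BUF_SIZE - demo_count) {
        ret = -EAGAIN;               /* full: the copy above was wasted */
    } else {
        memcpy(demo_buf + demo_count, tmp, len);
        demo_count += len;
        ret = (long)len;
    }
    pthread_mutex_unlock(&demo_lock);

    free(tmp);
    return ret;
}
```

The cost noted in the text is visible in the -EAGAIN branch: the temporary buffer was allocated and filled even though nothing could be stored.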
The code that holds the spin-lock, checks for a condition, and then goes to sleep on a wait-queue if the condition is not met, should look eerily familiar to anyone who has programmed with Pthreads; it is the same pattern of code as used with a mutex and condition variable.
A mutex is used to protect a resource, while a condition variable is used to put a thread to sleep while waiting for some other thread to signal it that the condition has changed. The nice thing about this analogy is that you can write pthreads code to simulate driver buffering operations to 'figure it out' outside of the kernel.
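As an illustration of that analogy, here is a minimal pthreads simulation of a blocking buffer. It is a sketch, not a translation of the driver: struct ring and the ring_* functions are hypothetical, the producer thread stands in for a process calling write(), and the consumer loop stands in for read().

```c
#include <pthread.h>
#include <stddef.h>

#define RING_SIZE 4   /* hypothetical capacity */

struct ring {
    pthread_mutex_t lock;       /* plays the role of the spin-lock */
    pthread_cond_t  not_empty;  /* plays the role of the read wait-queue */
    pthread_cond_t  not_full;   /* plays the role of the write wait-queue */
    int    data[RING_SIZE];
    size_t head, count;
};

static void ring_init(struct ring *r)
{
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->not_empty, NULL);
    pthread_cond_init(&r->not_full, NULL);
    r->head = r->count = 0;
}

static void ring_put(struct ring *r, int value)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == RING_SIZE)               /* re-check after every wake-up */
        pthread_cond_wait(&r->not_full, &r->lock);
    r->data[(r->head + r->count) % RING_SIZE] = value;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
}

static int ring_get(struct ring *r)
{
    int value;

    pthread_mutex_lock(&r->lock);
    while (r->count == 0)                       /* sleep until data arrives */
        pthread_cond_wait(&r->not_empty, &r->lock);
    value = r->data[r->head];
    r->head = (r->head + 1) % RING_SIZE;
    r->count--;
    pthread_cond_signal(&r->not_full);
    pthread_mutex_unlock(&r->lock);
    return value;
}

/* Producer thread: writes the values 1..10. */
static void *ring_producer(void *arg)
{
    struct ring *r = arg;
    int i;

    for (i = 1; i <= 10; i++)
        ring_put(r, i);
    return NULL;
}

/* Run one producer against one consumer; returns the sum consumed, or -1. */
static int ring_demo(void)
{
    struct ring r;
    pthread_t tid;
    int i, sum = 0;

    ring_init(&r);
    if (pthread_create(&tid, NULL, ring_producer, &r) != 0)
        return -1;
    for (i = 0; i < 10; i++)
        sum += ring_get(&r);
    pthread_join(tid, NULL);
    return sum;
}
```

The analogy has one limit worth keeping in mind: pthread_cond_wait() sleeps, so ring_put() models a process-context writer only. An interrupt handler cannot sleep, and would have to drop data or report an overrun when the buffer is full.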
The buffering used in the simple buffer driver is a bit contrived in that there are two 'producers' writing to the buffer, and one 'consumer'. A more likely scenario for a driver would be to have a buffer contended for by a single producer (say the receive IRQ), and a single consumer (say read), and another separate buffer for a single producer (write) and consumer (transmit IRQ).
But even in this situation, you can run into problems if the read from the buffer takes an excessive amount of time, blocking new data from the receive IRQ. One solution to this issue is to use two buffers for each producer-consumer pair; e.g., the receive IRQ is initialized to point to an empty buffer, and receive IRQs fill the buffer until a read is issued, at which point the IRQ buffer is passed to read, and the IRQ gets the second empty buffer.
Once read has consumed the contents of the first buffer, if the second buffer in use by the IRQ has new data, then the buffers are swapped again. In this scheme, the lock only needs to be held to swap the buffers, and since read does not hold the lock once it has a valid buffer, a copy to user-space from the kernel buffer is allowed, removing the need to use an intermediate buffer as shown in the simple buffer driver. The kernel tty layer uses this form of buffering scheme and refers to it as flip-buffering (see linux/tty.h).
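The two-buffer swap can be sketched in user-space as follows. This is an illustration of the idea only, not the tty layer's implementation: struct flip, flip_push(), and flip_take() are hypothetical names, and a pthread mutex again stands in for the spin-lock.

```c
#include <pthread.h>
#include <stddef.h>

#define FLIP_SIZE 16  /* hypothetical per-buffer capacity */

struct flip {
    pthread_mutex_t lock;        /* held only for an append or a swap */
    char   buf[2][FLIP_SIZE];
    size_t count[2];
    int    active;               /* index of the buffer the producer fills */
};

static void flip_init(struct flip *f)
{
    pthread_mutex_init(&f->lock, NULL);
    f->count[0] = f->count[1] = 0;
    f->active = 0;
}

/* Producer side (IRQ stand-in): append one byte. Returns 0, or -1 if full. */
static int flip_push(struct flip *f, char c)
{
    int rc = -1;

    pthread_mutex_lock(&f->lock);
    if (f->count[f->active] < FLIP_SIZE) {
        f->buf[f->active][f->count[f->active]++] = c;
        rc = 0;
    }
    pthread_mutex_unlock(&f->lock);
    return rc;
}

/* Consumer side: swap the buffers under the lock and hand back the filled
 * one. The caller reads *data without the lock, and must finish with it
 * before calling flip_take() again. Returns the number of bytes available. */
static size_t flip_take(struct flip *f, const char **data)
{
    size_t n;
    int filled;

    pthread_mutex_lock(&f->lock);
    filled = f->active;
    f->active = 1 - filled;
    f->count[f->active] = 0;     /* the other buffer was consumed earlier */
    n = f->count[filled];
    pthread_mutex_unlock(&f->lock);

    *data = f->buf[filled];
    return n;
}
```

Unlike the scheme in the text, this sketch swaps on every flip_take(), even when no new data has arrived (it then returns 0). The point it demonstrates is that the critical section covers only the swap, so the slow copy out of the returned buffer happens with the lock released.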
The simple buffer driver has (at least) two practical applications. If you install it and "cat" the timer-generated time stamps into a file, a plot of the difference between consecutive time stamps, minus 1 second, will show the error in the kernel's ability to generate a 1 second delay.
Running some tests
In a test on an HP Omnibook 6100 PIII 1GHz laptop, the error was approximately -130µs (i.e., each period was slightly less than 1 second). The test was started on a 1 second boundary, and over the space of 10 minutes, the timer was firing 100ms earlier than a 1 second boundary.
The second test determines how well NTP operates. Install the driver with the timer and timeout disabled. Connect up the 1pps tick from your NTP server's GPS unit to the parallel port interrupt of your PC, make sure your PC NTP daemon is running, and cat the IRQ-generated timestamps.
The observed error of the measured timestamp relative to that same timestamp rounded to the nearest second was about ±0.5ms. If the test PC (laptop) had its ethernet cable disconnected, or the NTP daemon was stopped, the error of the logged timestamps relative to the GPS 1pps tick would gradually increase (100 to 200µs over 10 minutes). If you had a method of generating a higher-frequency square-wave that was also locked to GPS, then you could determine the interrupt latency, and interrupt handling overhead, of the kernel by hammering the IRQ pin at a few kilohertz.
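The two error measures used in these tests are simple to compute once the timestamps have been captured. The sketch below assumes the timestamps have already been parsed into integer microseconds (the driver's text output format is not reproduced here), and the function names are hypothetical.

```c
#include <stddef.h>

/* Timestamps are held as integer microseconds to avoid floating-point rounding. */
#define USEC_PER_SEC 1000000LL

/* Timer test: error of one period, (curr - prev) - 1 s, in microseconds. */
static long long period_error_us(long long prev_us, long long curr_us)
{
    return (curr_us - prev_us) - USEC_PER_SEC;
}

/* Mean period error over n timestamps (requires n >= 2), in microseconds. */
static long long mean_period_error_us(const long long *t_us, size_t n)
{
    long long sum = 0;
    size_t i;

    for (i = 1; i < n; i++)
        sum += period_error_us(t_us[i - 1], t_us[i]);
    return sum / (long long)(n - 1);
}

/* 1pps test: offset of a timestamp from the nearest whole second, in the
 * range [-0.5 s, +0.5 s), in microseconds. Assumes t_us >= 0. */
static long long offset_from_second_us(long long t_us)
{
    long long rem = t_us % USEC_PER_SEC;

    if (rem >= USEC_PER_SEC / 2)
        rem -= USEC_PER_SEC;
    return rem;
}
```

For example, two timestamps spaced 999870µs apart give a period error of -130µs, and a timestamp that lands 400µs after a second boundary gives an offset of +400µs.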
A 'real-world' PCI driver
The experience presented in this document was gained during the development of the Caltech-OVRO Broadband Reconfigurable Array (COBRA) Correlator System. The hardware developed is documented at www.ovro.caltech.edu/~dwh/correlator.
The hardware is currently in use on several radio astronomy projects, e.g., the SZ Array (http://astro.uchicago.edu/sza/) and the CARMA array (http://www.mmarray.org). The cPCI digitizer and correlator boards used in the correlator system contain a PLX9054 PCI interface, a Texas Instruments DSP, Altera FLEX10K FPGAs, and on the digitizer, 1GHz analog-to-digital converters.
The digitizer output routes to the FPGAs on the digitizer board, where data is digitally filtered, delayed, and routed to front-panel high-speed connectors. The data travels over LVDS cabling (Ultra-SCSI cables) to the correlator boards, where FPGAs cross-correlate and average the data.
The on-board DSPs retrieve auto- and cross-correlation results from the FPGAs, perform FFTs and further corrections, and average the data for 100ms to 500ms. Data is then transferred to a Linux host.
The system uses a GPS-based NTP server with a 1pps output. The 1pps signal is used to derive a hardware heartbeat, so that the 100ms and 500ms transfers are aligned with real-time. The Linux hosts run NTP pointing to the NTP server, and check that data from boards arrives within a 50ms window relative to a 100ms or 500ms boundary.
The Linux driver used in the COBRA system is shown graphically in Figure 1, below. The driver implements several character device interfaces to the board: a terminal-like interface with standard-input, output, and error, a read/write control interface, a read-only data interface, and a read-only monitoring interface.
The choice of multiple devices, rather than a complex scheme of I/O control, was determined by the usage of the driver. For example, one objective was to enable the use of standard command line tools like cat, od (octal dump), echo, and dd. These tools know nothing of I/O control calls, so they need to be directed to a device node of a specific 'personality'.
Figure 1: COBRA device driver block diagram. The block diagram shows the relationship between the /dev nodes accessed by user-space applications and the files that implement the driver.
The COBRA control system code controls up to 20 boards in a single sub-system, and data must be collected from each board at about the same time. The standard method for dealing with multiple sources of data is to use the select() call, which uses file descriptors. So by separating out the data device and monitor device functionality at the driver level, a user-space server can run a thread containing a select() call that collects all the data from all boards, and serves that data up to clients. Then another thread, or another process even, can run a monitor server containing a thread calling select() on all the monitor file descriptors.
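A server thread of this kind is built around a single select() call over all the board descriptors. The following is a minimal sketch, not the COBRA server code: wait_for_boards() is a hypothetical helper, and the descriptors are assumed to have been obtained elsewhere by opening each board's data device node.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_ms for any of the nfds descriptors to become readable.
 * On success, ready[i] is 1 if fds[i] is readable and 0 otherwise.
 * Returns the number of ready descriptors, 0 on timeout, or -1 on error. */
static int wait_for_boards(const int *fds, int nfds, int timeout_ms, int *ready)
{
    fd_set rfds;
    struct timeval tv;
    int i, n, maxfd = -1;

    FD_ZERO(&rfds);
    for (i = 0; i < nfds; i++) {
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
        ready[i] = 0;
    }
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* select() modifies rfds in place, so it is rebuilt on every call. */
    n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;

    for (i = 0; i < nfds; i++)
        ready[i] = FD_ISSET(fds[i], &rfds) ? 1 : 0;
    return n;
}
```

A data-server thread would call this in a loop, read() from each descriptor flagged in ready[], and treat a timeout as a board that missed its transfer window. A monitor server can run the same loop over the monitor descriptors, because the two sets of device nodes are independent.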
Dr. David Hawkins, Senior Scientist at the California Institute of Technology, is currently involved with the design and development of high-speed digital correlator systems for Caltech, U. Chicago, and the CARMA (Caltech, Berkeley, U. Illinois, and U. Maryland) radio observatories.
This article is excerpted from a paper of the same name presented at the Embedded Systems Conference Silicon Valley 2006. Used with permission of the Embedded Systems Conference. For more information, please visit www.embedded.com/esc/sv.
J. Corbet, A. Rubini, and G. Kroah-Hartman. Linux Device Drivers. O'Reilly, 3rd edition, 2005.
D. Hawkins. COBRA device driver. Caltech-OVRO documentation, 2004. (www.ovro.caltech.edu/~dwh/correlator/pdf/cobradriver.pdf).
D. Hawkins. PLX-9054 PCI Performance Tests. Caltech-OVRO documentation, 2004. (www.ovro.caltech.edu/~dwh/correlator/pdf/pciperformance.pdf).
D. Hawkins. Linux driver design source code. Caltech-OVRO documentation, 2005. (www.ovro.caltech.edu/~dwh/correlator/software/driverdesign.tar.gz).