Device drivers in user space
This article covers the benefits and some caveats of running data-path applications in user space, and discusses Linux's UIO framework.
Editor's Note: this article was first published in the International Journal of Information and Education Technology
Traditionally, packet-processing or data-path applications in Linux have run in the kernel space due to the infrastructure provided by the Linux network stack. Frameworks such as netdevice drivers and netfilters have provided means for applications to directly hook into the packet-processing path within the kernel.
However, a shift toward running data-path applications in the user-space context is now occurring. The Linux user space provides several advantages for applications, including more robust and flexible process management, a standardized system-call interface, simpler resource management, and a large number of libraries for tasks such as XML and regular-expression parsing. It also makes applications more straightforward to debug by providing memory isolation and independent restart. In addition, while kernel-space applications need to conform to General Public License guidelines, user-space applications are not bound by such restrictions.
User-space data-path processing comes with its own overheads. Since the network drivers run in kernel context and use kernel-space memory for packet storage, there is an overhead of copying the packet data from kernel-space to user-space memory and vice versa. Also, user/kernel-mode transitions usually impose a considerable performance overhead, which works against the low-latency and high-throughput requirements of data-path applications.
In the rest of this article, we shall explore an alternative approach to reduce these overheads for user-space data-path applications.
Mapping memory to user space
As an alternative to the traditional I/O model, the Linux kernel provides a user-space application with the means to directly map memory available to the kernel into a user-space address range. In the context of device drivers, this can give user-space applications direct access to the device memory, which includes register configuration and I/O descriptors. All accesses by the application to the assigned address range end up directly accessing the device memory.
Several Linux system calls allow this kind of memory mapping, the simplest being the mmap() call. The mmap() call allows the user application to map a physical device address range one page at a time, or a contiguous range of physical memory in multiples of the page size.
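As a minimal sketch of this kind of mapping, the helper below opens a file or device node and maps a page-aligned range of it read/write. The names map_region() and unmap_region() are illustrative and not part of any library; error handling is reduced to returning NULL.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `len` bytes of the file or device node at `path`, starting at `offset`.
 * Both values should be multiples of the page size. Returns NULL on failure.
 * (Hypothetical helper, written for illustration only.) */
static volatile uint32_t *map_region(const char *path, off_t offset, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    close(fd);                      /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Remove a mapping created by map_region(). */
static void unmap_region(volatile uint32_t *base, size_t len)
{
    munmap((void *)base, len);
}
```

When the path is /dev/mem, the offset is interpreted as a physical address, so passing a device's register base maps its registers into the process. That requires root privileges and a kernel configured to allow it.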
Other Linux system calls for mapping memory include vmsplice(), which allows a kernel buffer (a pipe) to be read from or written to from user space, and tee(), which copies data between two such kernel-space buffers without access from user space.
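Both calls operate on pipes, which act as the kernel-side buffers. The sketch below hands a user buffer to one pipe with vmsplice(), duplicates it into a second pipe with tee(), and reads the duplicate back. The function name splice_and_tee() is hypothetical, and the length is assumed to fit within the pipe's capacity so that no call blocks.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hand `len` bytes at `buf` to pipe A with vmsplice(), duplicate them into
 * pipe B with tee(), then read the duplicate back into `out`.
 * Returns the number of bytes read from pipe B, or -1 on error.
 * Assumes `len` is smaller than the pipe capacity. */
static ssize_t splice_and_tee(const void *buf, size_t len, void *out)
{
    int a[2], b[2];
    ssize_t n = -1;

    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

    /* User pages -> pipe A, then pipe A -> pipe B inside the kernel. */
    if (vmsplice(a[1], &iov, 1, 0) == (ssize_t)len &&
        tee(a[0], b[1], len, 0) == (ssize_t)len)
        n = read(b[0], out, len);

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return n;
}
```

Note that tee() does not consume the data in the source pipe, so the original bytes remain readable from pipe A after the duplicate has been made.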
The mapping between physical memory and user-space memory is typically done using translation lookaside buffer (TLB) entries. The number of TLB entries in a given processor is limited, so the Linux kernel uses them as a cache. The size of the memory region mapped by each entry is typically restricted to the minimum page size supported by the processor, which is commonly 4 KB.
Linux maps the kernel memory using a small set of TLB entries that are fixed at initialization time. For user-space applications, however, the number of TLB entries is limited, and each TLB miss can result in a performance hit. To avoid such penalties, Linux provides the concept of a Huge-TLB, which allows user-space applications to map pages larger than the default minimum page size of 4 KB. This mapping can be used not only for application data but for the text segment as well.
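One way an application can request huge pages for its data is the MAP_HUGETLB flag to mmap(). The sketch below tries a huge-page mapping first and falls back to ordinary pages if none are available. The function name alloc_packet_pool() is hypothetical, and the length is assumed to be a multiple of the system's huge page size, which varies by platform.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Allocate `len` bytes of anonymous memory, preferring huge pages.
 * `*used_huge` is set to 1 if the huge-page mapping succeeded, else 0.
 * Returns NULL if both attempts fail. */
static void *alloc_packet_pool(size_t len, int *used_huge)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    /* First attempt: back the region with huge pages. This fails when the
     * system has no huge pages reserved or does not support them. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   flags | MAP_HUGETLB, -1, 0);
    *used_huge = (p != MAP_FAILED);

    /* Fallback: ordinary pages of the default size. */
    if (p == MAP_FAILED)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

The fallback keeps the application working on systems without reserved huge pages, at the cost of more TLB entries for the same region.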
Several efficient mechanisms have been developed in Linux to support zero-copy transfers between user space and kernel space, based on memory mapping and other techniques. These can be used by data-path applications while continuing to leverage the existing kernel-space network-driver implementation. However, these mechanisms still consume CPU cycles, and the per-packet processing cost remains relatively high. Direct access to the hardware from user space can eliminate the need for any mechanism to transfer packets back and forth between user space and kernel space, thus reducing the per-packet processing cost.
Linux provides a standard UIO (User I/O) framework for developing user-space-based device drivers. The UIO framework defines a small kernel-space component that performs two key tasks:
a. Indicate device memory regions to user space.
b. Register for device interrupts and provide interrupt indication to user space.
The kernel-space UIO component then exposes the device via a device node such as /dev/uioX and a set of sysfs entries under /sys/class/uio/uioX. The user-space component searches for these entries, reads the device address ranges, and maps them to user-space memory.
The user-space component can perform all device-management tasks including I/O from the device. For interrupts, however, it needs to perform a blocking read() on the device entry, which results in the kernel component putting the user-space application to sleep and waking it up once an interrupt is received.
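The user-space side of this flow can be sketched with three small helpers. The function names are hypothetical; the conventions they rely on are those of the UIO framework: the size of mapping N is published as a hexadecimal string in /sys/class/uio/uioX/maps/mapN/size, mapping N is selected by passing N times the page size as the mmap() offset, and each read() on the device node returns a 32-bit interrupt count.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Read a mapping size from a sysfs file such as
 * /sys/class/uio/uio0/maps/map0/size. Returns 0 on failure. */
static size_t uio_map_size(const char *size_path)
{
    char buf[32];
    FILE *f = fopen(size_path, "r");
    if (!f)
        return 0;
    if (!fgets(buf, sizeof buf, f))
        buf[0] = '\0';
    fclose(f);
    return (size_t)strtoul(buf, NULL, 0);   /* base 0 accepts the 0x prefix */
}

/* Map UIO memory region `index` of the open device `fd`. */
static void *uio_map(int fd, unsigned index, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                   (off_t)index * sysconf(_SC_PAGESIZE));
    return p == MAP_FAILED ? NULL : p;
}

/* Sleep until the next interrupt; `*count` receives the total number of
 * interrupts seen so far. Returns 0 on success, -1 on error. */
static int uio_wait_irq(int fd, uint32_t *count)
{
    return read(fd, count, sizeof *count) == (ssize_t)sizeof *count ? 0 : -1;
}
```

A driver built on these helpers would typically open the device node, map each region once during initialization, and then loop on uio_wait_irq(), servicing the device after each wake-up.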
User-space network drivers
The memory required by a network device driver can be of three types:
a. Configuration space: this refers to the common configuration registers of the device.
b. I/O descriptor space: this refers to the descriptors that the driver and the device use to exchange data.
c. I/O data space: this refers to the actual I/O data accessed from the device.
Taking the case of a typical Ethernet device, the above can refer to the common device configuration (including MAC configuration), buffer-descriptor rings, and packet data buffers.
In the case of kernel-space network drivers, all three regions are mapped to kernel space, and any access to them from user space is typically abstracted through either ioctl() calls or write() calls, from where a copy of the data is provided to the user-space application.
User-space network drivers, on the other hand, map all three regions directly to user-space memory. This allows the user-space application to directly drive the buffer descriptor rings from user space. Data buffers can be managed and accessed directly by the application without the overhead of a copy.
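To illustrate what driving a descriptor ring from user space can look like, the sketch below polls a receive ring that is assumed to be already mapped into the process. The descriptor layout, the BD_EMPTY flag value, and all function names are hypothetical and not taken from any particular controller; a real driver would use the layout defined by its hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: set while the device still owns the descriptor. */
#define BD_EMPTY 0x8000u

/* Hypothetical receive descriptor, shared with the device. */
struct rx_bd {
    volatile uint16_t status;
    volatile uint16_t length;    /* bytes written by the device */
    volatile uint32_t buf_off;   /* buffer offset within the mapped pool */
};

struct rx_ring {
    struct rx_bd *bd;            /* mapped descriptor ring */
    uint8_t      *pool;          /* mapped packet-buffer pool */
    unsigned      size;          /* number of descriptors */
    unsigned      next;          /* next descriptor to inspect */
};

/* Return the next received frame and its length, or NULL if the device
 * has not filled the next descriptor yet. No copy is made. */
static const uint8_t *rx_poll(struct rx_ring *r, uint16_t *len)
{
    struct rx_bd *d = &r->bd[r->next];
    if (d->status & BD_EMPTY)
        return NULL;
    *len = d->length;
    return r->pool + d->buf_off;
}

/* Hand the current descriptor back to the device and advance. */
static void rx_release(struct rx_ring *r)
{
    struct rx_bd *d = &r->bd[r->next];
    __sync_synchronize();        /* finish reading the frame first */
    d->status = (uint16_t)(d->status | BD_EMPTY);
    r->next = (r->next + 1) % r->size;
}
```

The application processes the frame in place between rx_poll() and rx_release(), which is where the saving over a copy-based interface comes from.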
Taking the specific example of a user-space network driver for the eTSEC Ethernet controller on a Freescale QorIQ P1020 platform, the configuration space is a single region of 4 KB, which is page-boundary aligned. This contains all the device-specific registers, including controller settings, MAC settings, and interrupts. Besides this, the MDIO region also needs to be mapped to allow configuration of the Ethernet PHY devices. The eTSEC provides up to eight individual buffer descriptor rings, each of which is mapped onto a separate memory region to allow simultaneous access by multiple applications. The data buffers referenced by the descriptor rings are allocated from a single contiguous memory block, which is allocated and mapped to user space at initialization time.
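A single contiguous block keeps buffer address handling simple. Assuming the descriptors carry physical addresses while the application works with virtual ones, the two differ by a constant offset across the whole block. The sketch below shows that bookkeeping; the structure, field names, and functions are hypothetical and not taken from the eTSEC driver.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous block carved into `count` buffers of `buf_size` bytes. */
struct buf_pool {
    uint8_t  *virt_base;   /* where the block is mapped in this process */
    uint64_t  phys_base;   /* physical address of the same block */
    size_t    buf_size;
    size_t    count;
};

/* Virtual address of buffer `i`, or NULL if `i` is out of range. */
static uint8_t *pool_buf(const struct buf_pool *p, size_t i)
{
    return i < p->count ? p->virt_base + i * p->buf_size : NULL;
}

/* Address to place in a descriptor for a buffer the application holds. */
static uint64_t pool_virt_to_phys(const struct buf_pool *p, const void *v)
{
    return p->phys_base + (uint64_t)((const uint8_t *)v - p->virt_base);
}

/* Address the application can dereference for a descriptor's buffer. */
static void *pool_phys_to_virt(const struct buf_pool *p, uint64_t phys)
{
    return p->virt_base + (phys - p->phys_base);
}
```

With scattered allocations this translation would need a lookup table per buffer; with one contiguous block it is a single addition or subtraction.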