CMP EMBEDDED.COM

Implementing Network Protocols and Drivers with Streams

This article examines Streams and the framework's usefulness in implementing and porting multiple networking protocols and drivers for real-time embedded applications.

Streams is a framework for implementing high-performance networking protocols and device drivers, and a solution for integrating multiple networking protocols. It has been a part of traditional computer operating systems for years and is now available as an option from several of the popular embedded RTOS vendors. One might wonder why people are putting a seemingly complex "big government" mechanism into efficient and small RTOSs. I will attempt to answer this question with specifics about performance and overhead associated with Streams.

Most of my experience has been in one form or another of electronic imaging. I can think of many examples in an historical context that will illustrate how Streams has crept into our consciousness to become part of the OS requirements for sophisticated embedded applications. Over the years, OS requirements for embedded systems have changed. At first the OSs were asked to support very specialized applications, such as gathering data from a single specific scanner and presenting it in summary form on a single screen, or taking control inputs from a single set of switches. Now, embedded equipment must present the data in real time over a communication network to PCs or other equipment for analysis. Often embedded applications are in equipment that must have a presence on a heterogeneous network. These embedded applications are in equipment such as gateways, routers, or sophisticated office equipment (document printers and multi-functional print/scanning/faxing devices). In these applications, the protocols must run simultaneously with application software that has hard real-time requirements for device control and interrupt latency.

In this article, I will examine Streams and its usefulness in implementing and porting multiple networking protocols and drivers for real-time embedded applications. Viewing Streams in relation to real-time and multi-threaded operating systems is important because Streams was originally designed for Unix, which is not a real-time system. Therefore, we should be aware of issues specific to embedded RTOSs when developing or porting Streams modules and device drivers. Also, when working with Streams in real-time embedded systems, there are several important techniques for multiplexing, flow control, interrupt handling, and avoiding deadlock and race conditions.

What is Streams? Why use it?

Streams has its origin in the Unix OS. Originally, the device driver mechanism in Unix provided only the standard user-level driver entry points and a way to register an interrupt handler with the OS. The traditional Unix device driver was called a character driver and was intended to handle a flow of individual characters from a serial device, such as an ASCII terminal. The traditional driver didn't provide mechanisms for buffer management or flow control, which are almost universally required in device drivers and especially in networking protocols. Only a relatively small part of a device driver is hardware specific and specific to a given application. Every programmer had to provide his/her own redundant buffer and queue management capability. Writers of device drivers could borrow boiler plate code from each other to avoid redundant effort, but it was clear that a standard mechanism would be beneficial. Unix OSs had block drivers that would extend the character driver concept by providing buffer management for disks and other devices with file systems, but these are not well suited for full-duplex communications I/O. A standard common facility for communications device drivers and protocols was needed. Streams was developed to fit this need.

Streams can be beneficial because it provides a standard framework to support implementation of services for high-performance networks. By providing a common mechanism for buffering and I/O processing, it reduces duplicated code and buffer space. This often allows the reuse of drivers or modules or simplifies the writing of new ones. Also, Streams encourages reuse because it provides a standard underlying framework for communications protocols. Just about every existing networking protocol or device driver was supported in Unix at one time or another, so the porting of existing modules or device drivers to a real-time embedded application becomes easier. In my experience, this situation has often allowed us to purchase sources for the communications protocols we needed, leaving a relatively small porting effort to bring those protocols up in our real-time embedded application.

Streams provides a full-duplex data path between device driver and application. It supports simultaneous control and data messages with a mechanism for prioritizing messages. Streams also provides a mechanism to layer or "push" modules on top of each other at run time. This provision makes it an excellent choice for implementation of networking protocols that have a layered architecture, such as those that follow the OSI architecture. Streams supports multiplexing, which is also a common requirement for communication protocols. A single networking protocol can use the Streams multiplexing capability to run simultaneously on two different physical layers. For example, we were able to make TCP/IP run simultaneously on two physical networks by having DLPI (Data Link Provider Interface, a specification for what occurs between the data link and network layers) compliant drivers for both Ethernet and Token Ring. Streams also allows the sharing of data buffers between two or more messages. This allowance reduces overhead when running multiplexed networking protocols by allowing two protocols to process the same message block simultaneously, thereby enhancing performance by reducing the copying of data.

History of Streams

Streams was originally developed by Dennis Ritchie at Bell Labs in 1983. It was released as part of AT&T SVR3 (System Five release 3) Unix in the mid '80s. Streams was augmented with several new features as part of SVR4 (System Five release 4) in the late '80s. It has been available as a feature in one real-time OS, LynxOS, since approximately 1993. Streams is now available from two other major RTOS vendors: ISI and Wind River.

Implementation in real-time operating systems

Streams is best suited for operating systems that utilize the microprocessor's MMU with separate user and kernel context. It is designed to hide the message passing mechanism from the application program, allowing application programs to be backwards compatible with those using a simpler I/O mechanism. Streams was originally designed for Unix and provides an API compatible with the familiar Unix I/O interface consisting of the routines open(), close(), read(), write(), and ioctl(). An application that is reading from an I/O stream does not need to know whether the underlying implementation is Streams unless it is taking advantage of the additional user-level functions, such as putmsg() or getmsg() provided by Streams.

Streams contains a scheduler to sequence the servicing of the module and driver queues. In an RTOS, the scheduler is implemented as one or more separate kernel threads. The OS provides semaphore locking for the Streams message queues because they are accessed by both the scheduler thread and other threads in the drivers or modules. In many OSs, possibly for efficiency reasons, the queue protection is not coded into putq(), the procedure provided to place messages in a queue.

Typical protocol stacks implemented with Streams

Typical network communication protocols are organized in layers according to the OSI 7-layer model. Streams was designed to facilitate the implementation of these protocols. The seven layers of the OSI model, starting from the bottom, are:

  • 1. Physical
  • 2. Data Link
  • 3. Network
  • 4. Transport
  • 5. Session
  • 6. Presentation
  • 7. Application

Protocol suites can contain multiple transports or complex multiplexed network layers, but typical common networking protocol suites usually only will require that Layers 1 through 3 or 4 be implemented in the kernel. Streams is well suited to implement these protocols because of its ability to place or "push" modules on top of each other. Each separate module services the same data stream, adding value as the data travels through the pipe. For example, for incoming data in a networking protocol, each layer may strip data from the beginning or end of an input packet. In the case of outgoing data, each module will add data to the packet.
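The header handling described above can be sketched in a few lines of C. This is a simplified user-space model, not real Streams code: demo_msg_t and the demo_ functions are hypothetical names, and only the read and write pointers of a message are modeled. It assumes the buffer was allocated with enough headroom for the headers of all the layers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a Streams message: only the fields needed to
 * show header handling. This is not the real mblk_t. */
typedef struct demo_msg {
    unsigned char  buf[64];   /* data buffer */
    unsigned char *b_rptr;    /* first valid byte */
    unsigned char *b_wptr;    /* one past the last valid byte */
} demo_msg_t;

/* Start with an empty message, leaving headroom for later headers. */
void demo_init(demo_msg_t *m, size_t headroom)
{
    assert(headroom <= sizeof m->buf);
    m->b_rptr = m->b_wptr = m->buf + headroom;
}

/* Append payload bytes at the write pointer. Returns 0 on success. */
int demo_append(demo_msg_t *m, const void *data, size_t len)
{
    if ((size_t)(m->buf + sizeof m->buf - m->b_wptr) < len)
        return -1;
    memcpy(m->b_wptr, data, len);
    m->b_wptr += len;
    return 0;
}

/* Outgoing path: a layer prepends its header by moving b_rptr back. */
int demo_push_header(demo_msg_t *m, const void *hdr, size_t len)
{
    if ((size_t)(m->b_rptr - m->buf) < len)
        return -1;
    m->b_rptr -= len;
    memcpy(m->b_rptr, hdr, len);
    return 0;
}

/* Incoming path: a layer strips its header by advancing b_rptr. */
int demo_strip_header(demo_msg_t *m, size_t len)
{
    if ((size_t)(m->b_wptr - m->b_rptr) < len)
        return -1;
    m->b_rptr += len;
    return 0;
}

/* Number of valid bytes currently in the message. */
size_t demo_len(const demo_msg_t *m)
{
    return (size_t)(m->b_wptr - m->b_rptr);
}
```

Because each layer only moves a pointer, the payload itself is never copied as the message travels up or down the stack.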

Because the TCP/IP protocol suite is usually bundled in with the OSs provided by real-time vendors, and because it is common and relatively well understood, a Streams implementation of TCP/IP will serve as a good example here. TCP/IP's layered organization is as follows:

  • IP: Network layer
  • UDP: Transport layer
  • TCP: Transport layer
  • 802.3 LLC: Data link layer
  • Ethernet device driver: Physical layer

Figure 1 illustrates how Streams multiplexing can be used to organize the TCP/IP protocol where there is both upper and lower multiplexing. The LLC module is a 1-to-n multiplexer and the IP module is an n-to-1 multiplexer. Also, as illustrated, TCP/IP contains two transport layers: UDP, for connectionless communication, and TCP, which provides a connection-oriented transport.

Declarations for a multiplexing module

Using a typical multiplexing module as an example, I will illustrate how the data structures should be declared. The declarations for a multiplexing module are fairly straightforward. The streamtab structure uniquely defines a module or driver, and the qinit structures have the procedures and information used to initialize each queue. In this particular module, all four queues are declared in the streamtab structure: the upper read side, upper write side, lower read side, and lower write side queues.

Streamtab is the only structure that needs to be global. The other routines are only called indirectly by accessing their pointers through the queue_t structures to be described later. For accuracy, I should mention that the lower multiplexing actually requires a data structure to keep track of the queue in the module or driver linked below the multiplexing driver.

The fact that all four queues are used in this multiplexing module indicates that the module is intended to support lower multiplexing. Lower multiplexing is done by linking the multiplexing module to the module or driver below. Upper multiplexing is done by pushing multiple modules on top of a single module; it requires only the upper two queues be maintained. Of course, it is the responsibility of the module to keep track of the message routing because this routing tends to be specific to each networking protocol.
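As a rough sketch, the declarations for such a four-queue multiplexing module might look like the following. The structure layouts are simplified stand-ins so that the fragment compiles on its own; a real module takes them from the vendor's Streams headers, where the open and close procedures have longer argument lists. The procedure names (mopen, muwput, mursrv, and so on), the module name, and the limits in minfo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the types in the vendor's Streams headers. */
typedef struct queue queue_t;
typedef struct msgb  mblk_t;

struct module_info {
    unsigned short mi_idnum;    /* module ID number */
    const char    *mi_idname;   /* module name */
    long           mi_minpsz;   /* minimum packet size */
    long           mi_maxpsz;   /* maximum packet size */
    unsigned long  mi_hiwat;    /* high-water mark */
    unsigned long  mi_lowat;    /* low-water mark */
};

struct qinit {
    int (*qi_putp)(queue_t *, mblk_t *);   /* put procedure */
    int (*qi_srvp)(queue_t *);             /* service procedure */
    int (*qi_qopen)(queue_t *);            /* arguments simplified */
    int (*qi_qclose)(queue_t *);           /* arguments simplified */
    struct module_info *qi_minfo;
};

struct streamtab {
    struct qinit *st_rdinit;     /* upper read side */
    struct qinit *st_wrinit;     /* upper write side */
    struct qinit *st_muxrinit;   /* lower read side */
    struct qinit *st_muxwinit;   /* lower write side */
};

/* Hypothetical procedures; the bodies are placeholders. */
static int mopen(queue_t *q)              { (void)q; return 0; }
static int mclose(queue_t *q)             { (void)q; return 0; }
static int muwput(queue_t *q, mblk_t *mp) { (void)q; (void)mp; return 0; }
static int mursrv(queue_t *q)             { (void)q; return 0; }
static int muwsrv(queue_t *q)             { (void)q; return 0; }
static int mlrput(queue_t *q, mblk_t *mp) { (void)q; (void)mp; return 0; }
static int mlrsrv(queue_t *q)             { (void)q; return 0; }
static int mlwsrv(queue_t *q)             { (void)q; return 0; }

/* Illustrative ID, name, packet sizes, and water marks. */
static struct module_info minfo = { 99, "mux", 0, 1500, 8192, 2048 };

static struct qinit urinit = { NULL,   mursrv, mopen, mclose, &minfo };
static struct qinit uwinit = { muwput, muwsrv, NULL,  NULL,   &minfo };
static struct qinit lrinit = { mlrput, mlrsrv, NULL,  NULL,   &minfo };
static struct qinit lwinit = { NULL,   mlwsrv, NULL,  NULL,   &minfo };

/* The only symbol that needs to be global. */
struct streamtab muxinfo = { &urinit, &uwinit, &lrinit, &lwinit };
```

A module that supports only upper multiplexing would leave the last two streamtab entries NULL.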

Streams messages and queues

In addition to the streamtab structure shown above, the message and queue data structures discussed are the core of Streams. An understanding of these two data structures is essential for a thorough understanding of how Streams drivers and modules behave.

The Streams message block contains the list of packets waiting for servicing. As shown in Figure 2, the Streams message structure consists of a message block, a data block, and the actual data buffer. The message block contains a pointer to the data block b_datap, as well as read and write pointers b_rptr and b_wptr, which keep track of the positions in the data buffer. The b_cont pointer can be used to chain message blocks for multi-part messages.

The data block contains the pointer to the base of the data buffer, db_base, and a pointer to the end of the data buffer, db_lim. The data block also contains the reference count db_ref, which is the number of message blocks pointing to this data block. This information is used in the sharing of data between two messages. The reference count is incremented each time a message is duplicated by calling dupmsg() or dupb() to cause a new reference to the data block. It is decremented when freemsg() or freeb() is called to free a message that points to the data block.
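The reference counting can be illustrated with a small user-space model. The demo_ types and functions below are hypothetical and use malloc() in place of the Streams buffer pools; they only mimic the behavior of allocb(), dupb(), and freeb() as described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified model of a data block. */
typedef struct demo_datab {
    unsigned char *db_base;   /* start of the data buffer */
    unsigned char *db_lim;    /* end of the data buffer */
    int            db_ref;    /* message blocks pointing at this block */
} demo_dblk_t;

/* Simplified model of a message block. */
typedef struct demo_msgb {
    struct demo_msgb *b_cont;    /* next block of a multi-part message */
    unsigned char    *b_rptr;    /* read pointer */
    unsigned char    *b_wptr;    /* write pointer */
    demo_dblk_t      *b_datap;   /* the (possibly shared) data block */
} demo_mblk_t;

/* Allocate a message block, a data block, and a buffer of 'size' bytes. */
demo_mblk_t *demo_allocb(size_t size)
{
    demo_mblk_t   *mp  = malloc(sizeof *mp);
    demo_dblk_t   *dp  = malloc(sizeof *dp);
    unsigned char *buf = malloc(size ? size : 1);

    if (mp == NULL || dp == NULL || buf == NULL) {
        free(mp);
        free(dp);
        free(buf);
        return NULL;
    }
    dp->db_base = buf;
    dp->db_lim  = buf + size;
    dp->db_ref  = 1;
    mp->b_cont  = NULL;
    mp->b_rptr  = mp->b_wptr = buf;
    mp->b_datap = dp;
    return mp;
}

/* Duplicate the descriptor only; the data block and buffer are shared. */
demo_mblk_t *demo_dupb(demo_mblk_t *mp)
{
    demo_mblk_t *np = malloc(sizeof *np);

    if (np == NULL)
        return NULL;
    *np = *mp;
    np->b_cont = NULL;
    np->b_datap->db_ref++;
    return np;
}

/* Free one descriptor; release the buffer when the last reference goes. */
void demo_freeb(demo_mblk_t *mp)
{
    demo_dblk_t *dp = mp->b_datap;

    if (--dp->db_ref == 0) {
        free(dp->db_base);
        free(dp);
    }
    free(mp);
}
```

In this model the buffer outlives the first demo_freeb() call as long as a duplicate still refers to it, which is the property that lets two protocols process the same data without a copy.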

The data buffers contain the actual data and are available in several standard sizes ranging from four bytes to 4K bytes. The buffers are allocated by the allocb() call and can be assigned one of three priorities. A means of recovering from buffer allocation failure also exists. When allocb() fails, a callback can be registered by passing a function pointer to bufcall(). The function gets control later, when a buffer of the requested size is likely to be available, so it can retry the allocation.
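The recovery pattern can be modeled as follows. This is a toy sketch under stated assumptions: the pool holds a single buffer, only one callback can be pending, and the demo_ functions take simplified arguments that do not match the real allocb() and bufcall() signatures. The demo_state structure and demo_send_reply() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer pool standing in for the Streams allocator. */
static int   demo_free_bufs = 1;
static void (*demo_pending)(void *);   /* at most one pending callback */
static void *demo_pending_arg;

/* Model of allocb(): 0 on success, -1 when the pool is empty. */
int demo_allocb(void)
{
    if (demo_free_bufs == 0)
        return -1;
    demo_free_bufs--;
    return 0;
}

/* Model of bufcall(): remember a function to call when a buffer is
 * freed. Returns nonzero on success, 0 if the request was not recorded. */
int demo_bufcall(void (*func)(void *), void *arg)
{
    if (demo_pending != NULL)
        return 0;
    demo_pending     = func;
    demo_pending_arg = arg;
    return 1;
}

/* Model of freeb(): return a buffer, then run the pending callback. */
void demo_freeb(void)
{
    void (*func)(void *) = demo_pending;

    demo_free_bufs++;
    demo_pending = NULL;
    if (func != NULL)
        func(demo_pending_arg);
}

/* Hypothetical module state: replies built so far, and whether one is
 * still waiting for a buffer. */
struct demo_state {
    int sent;
    int waiting;
};

/* Try to build a reply; on allocation failure, ask to be called back. */
void demo_send_reply(void *arg)
{
    struct demo_state *st = arg;

    if (demo_allocb() == 0) {
        st->sent++;
        st->waiting = 0;
    } else {
        st->waiting = demo_bufcall(demo_send_reply, st);
    }
}
```

The callback simply re-enters the routine that failed, so the retry logic lives in one place.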

The Streams queue structure actually stitches the modules and drivers together. As can be guessed from the module declarations above, a module or driver contains two to four queues. A simple driver contains two queues (one for the read side and one for the write side), while a module capable of upper and lower multiplexing would contain four queues. The list of queues is built when modules are pushed, and it is dismantled when modules are popped. The Streams scheduler traverses this list, looking for queues with non-empty message lists. When it finds one, the scheduler executes the associated service routine. From the queue structure, Streams can find all the information relevant for maintaining and servicing the associated module or driver.

The queue contains the head pointer to the list of message blocks awaiting servicing and it also contains limits and information about the queue. The queue contains the limits for maintaining the queue's flow control q_hiwat and q_lowat, as well as the minimum and maximum packet size q_minpsz and q_maxpsz, for this particular queue. The queue state flags in q_flag indicate whether the queue is full and whether it is enabled for scheduling, and the q_count contains the count of characters on the message list.

The Streams queue also has a pointer, q_qinfo, to the qinit structure containing the data used to initialize the queue and pointers to the module or driver procedures. Streams sets q_qinfo from the streamtab structure described above in the declarations. Generally, the developer places a pointer to the driver's or module's "private data structure" in q_ptr as shown in the figure below. The field q_ptr is referenced from the queue pointer (typed queue_t*) passed to each of the put and service procedures, as shown in the declarations above. Therefore, any globally useful private data can be accessed from all of the service and put procedures without needing a lot of trouble-prone global data in each driver or module. Also, the Streams queues are protected by a mutex to prevent the queue service routines from accessing queues as they are being updated from elsewhere in this module or even in other modules. The mutex may be implemented within Streams or the OS vendor may leave this important detail to the driver or module writer.
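The q_ptr convention can be sketched like this. The queue type is reduced to the single field of interest, and demo_private, demo_open(), demo_rput(), and demo_wput() are hypothetical names with simplified arguments; a real open procedure would typically allocate the private structure rather than receive it.

```c
#include <assert.h>
#include <stddef.h>

/* Queue reduced to the one field this sketch needs. */
typedef struct demo_queue {
    void *q_ptr;   /* module's private data */
} demo_queue_t;

/* Hypothetical per-stream state kept by the module. */
struct demo_private {
    unsigned long rx_msgs;
    unsigned long tx_msgs;
    int           link_up;
};

/* Open: attach the same private structure to both queues of the pair. */
void demo_open(demo_queue_t *rq, demo_queue_t *wq, struct demo_private *pd)
{
    pd->rx_msgs = 0;
    pd->tx_msgs = 0;
    pd->link_up = 1;
    rq->q_ptr = pd;
    wq->q_ptr = pd;
}

/* Read-side put procedure: reaches its state through q_ptr. */
int demo_rput(demo_queue_t *q)
{
    struct demo_private *pd = q->q_ptr;

    assert(pd != NULL);
    pd->rx_msgs++;
    return 0;
}

/* Write-side put procedure: sees the same state as the read side. */
int demo_wput(demo_queue_t *q)
{
    struct demo_private *pd = q->q_ptr;

    assert(pd != NULL);
    if (!pd->link_up)
        return -1;
    pd->tx_msgs++;
    return 0;
}
```

Since every procedure receives a queue pointer, none of them needs a global variable to find the state of the stream it is serving.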

Flow control

As discussed above, Streams provides a scheduling mechanism consisting of a list of message queues, each of which has high and low limits for the number of bytes in the queue. Each of these queues has a service procedure that gets scheduled for execution whenever the queue contains data. The low and high water marks in the queue are used to maintain flow control.

The mechanism for flow control is straightforward. A queue is marked FULL when its count exceeds the high-water mark; the service procedures in queues behind the current queue can call canput() to see if the queue's FULL flag is set. They can use putbq() to put a message back on their own queue and "keep" it there until the queue ahead reaches the low-water mark and the FULL flag is unset.
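The water-mark behavior can be modeled with a byte counter and a flag. The sketch below follows the description in this article (FULL is set when the count exceeds the high-water mark and cleared when it falls to the low-water mark); real implementations may differ in the exact comparisons. The demo_ names are hypothetical, and messages are represented only by their sizes.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QFULL 0x01u

/* Queue reduced to its flow-control fields. */
typedef struct demo_queue {
    size_t   q_count;   /* bytes currently queued */
    size_t   q_hiwat;   /* high-water mark */
    size_t   q_lowat;   /* low-water mark */
    unsigned q_flag;    /* DEMO_QFULL when flow controlled */
} demo_queue_t;

/* Enqueue a message of nbytes; mark the queue FULL above q_hiwat. */
void demo_putq(demo_queue_t *q, size_t nbytes)
{
    q->q_count += nbytes;
    if (q->q_count > q->q_hiwat)
        q->q_flag |= DEMO_QFULL;
}

/* Dequeue a message of nbytes; clear FULL at or below q_lowat. */
void demo_getq(demo_queue_t *q, size_t nbytes)
{
    assert(nbytes <= q->q_count);
    q->q_count -= nbytes;
    if (q->q_count <= q->q_lowat)
        q->q_flag &= ~DEMO_QFULL;
}

/* What a queue behind this one would ask before passing a message on. */
int demo_canput(const demo_queue_t *q)
{
    return (q->q_flag & DEMO_QFULL) == 0;
}
```

The gap between the two marks provides hysteresis: once a queue is FULL it stays FULL until it has drained well below the high-water mark, which keeps the flag from toggling on every message.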

Implementing Streams modules and drivers in multi-threaded and real-time systems

Design for efficiency while maintaining awareness of concurrency issues is important. Some common techniques allow you to make maximum use of the OS's facilities for efficiency, but some potential pitfalls for real-time implementations exist as well. Streams modules and drivers will usually have multiple threads, even within a single module. For example, a module will contain the main thread, invoked when the application makes a system call to the driver entry points such as ioctl(), open(), close(), getmsg(), or putmsg(). Also, the implementer must remember that the Streams scheduler as implemented in the OS is a separate thread. This distinction is necessary because the scheduler must run asynchronously to schedule the service procedures for the queues whenever they have data available. In addition to these two threads, each module may have one or more timer threads. A driver may also need an interrupt thread that is awakened when the hard interrupt service routine signals an incoming packet. Because of the inherently multi-threaded nature of Streams, it is advisable to pay careful attention to concurrency issues.

When designing a driver, kernel threads should be used for most of the interrupt processing. Interrupt threads are commonly used in real-time operating systems because hard interrupt routines can't be preempted, and allowing priorities to be monitored and managed by the OS mechanism is more efficient in terms of CPU utilization. If your RTOS offers this capability, it is always best to do most of the processing in an interrupt thread. The driver should be written so the hard interrupt handler does very little; it should only turn off interrupts, grab copies of hardware registers, and trigger the semaphore to wake up an interrupt thread, which does most of the work to complete the interrupt processing.

Minimize the copying of data

Copying the data in a driver as few times as possible is always best. Streams facilitates this economy because DMA or direct I/O can be done between the Streams buffer in memory and the hardware. This arrangement ensures that data is copied by software only once, when it is moved between the Streams buffer and the application's buffer during the processing of the read() or write() system call. Generally, all the message processing is done by moving pointers to the message blocks. A routine to copy messages, copymsg(), is provided in the Streams API, but Streams itself never copies the data. Usually, the only time data is copied is when the protocol requires forming a return or acknowledgment message containing the same data as the received packet. Sometimes when there is a need to share data between two multiplexed modules, it will be necessary to copy the data. However, if the data is not going to be destroyed or altered by either of the modules, dupmsg() can establish a new message descriptor pointing to the same list of data blocks. In short, copying data is expensive in a real-time system, and Streams provides mechanisms to minimize this overhead.

Using flow control can help real-time performance

Drivers and modules should make maximum use of the Streams flow control mechanism. For example, a driver may receive an interrupt when a packet is in the ring buffer of the interface card. The driver then needs to form this packet into a Streams message ready to pass to the read side of the upstream module. If the Streams mechanisms are used to advantage, flow control is used to avoid bottlenecks in message processing. The driver is implemented with an upper read service routine, and putq() is used to place the message on the driver's own queue. Streams schedules the service routine once the queue holds data, and the routine passes messages upstream only while the next queue has room. Typically, the upper read service routine is implemented as follows (and a similar technique can be used on the write side):

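static int mursrv(queue_t *q)
{
    mblk_t *mp;

    /* Drain our own queue until it is empty or upstream is full. */
    while ((mp = getq(q)) != NULL) {
        if (!canput(q->q_next)) {
            putbq(q, mp);   /* keep the message until upstream drains */
            break;
        }
        putnext(q, mp);
    }
    return 0;
}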

This simple mechanism can improve reliability and performance, with fewer lost and dropped packets during peak usage.

Kernel threads are used to supplement the timers necessary for resending and connection management. Network protocols have what's known as a "reliable connection mode" that requires that packets be sequenced so that any missing packets can be resent. OSs provide timer mechanisms for this kind of purpose. Most RTOS vendors implement the timeout facility by calling the timeout functions directly from the timer's hard interrupt service routines, and they may neglect to document this important detail. This timer interrupt level processing can steal CPU cycles from other kernel and user functions because they cannot be scheduled or prioritized, so it is best to do as much time-dependent processing as possible in a separate kernel thread. This timer thread sleeps on a semaphore waiting to be awakened by a timer interrupt. The typical timeout (interrupt level timer function) can be implemented very simply as follows. Substitute your OS vendor's calls below:

void mytimer(void)
{
    ssignal(timer_sem);
}

Therefore most of the complexity necessary to implement the protocol is in a separate thread, which is implemented as follows:

int my_time_thread(void)
{
    for (;;) {
        swait(timer_sem);   /* sleep until the timeout routine signals */

        /*...processing...*/
    }
}

Avoiding race conditions and deadlocks

Concurrency is inherent in most applications using multi-threaded RTOSs, and each device driver implementation should be looked at as if it were running on a multi-processor system. Of course, you should always take care to make sure critical sections of code are protected where there are possible problems due to simultaneous access. Some concurrency and deadlock mistakes are specific to Streams drivers and modules; in particular, it is easy to forget that the Streams scheduler that executes your read and write service routines is a separate thread.

The most common problem occurs when the list of queues is corrupted by simultaneous access to the queues with putq() or putnext() at the same time the Streams scheduler is processing the list. This problem can be eliminated by doing all of the processing of the messages in the service routines rather than processing the messages in the put procedures. The service routines are intended to be used this way, and also, as described above, it is an essential part of the flow control mechanism inherent to Streams. In a few cases, implementing the driver or module this way may not be possible, and for some simpler drivers, it may introduce unwanted added complexity. Some RTOSs don't automatically protect the queues from within the Streams putq() procedures, and your OS vendor may have forgotten to document this fact. If this is the case, whenever putq() or putnext() is called outside of a service procedure, the call must be protected from the Streams scheduling thread with a mutex or semaphore. Also, it is necessary to remember that putnext(q, mp) passes the message to the put procedure of q->q_next, and q_next usually points to a queue in the next module or driver up or down stream. A failure to protect the putnext() call can therefore cause a lock-up in an entirely different module or driver. Another common mistake is calling the queue put procedures from a hard interrupt or timer routine. Because a common internal data structure may be damaged, these problems can sometimes show up as side effects in seemingly unrelated modules or drivers elsewhere in the system.

Porting from UNIX to an embedded OS

It goes without saying that it is preferable for time-to-market reasons to meet a requirement with a port rather than a rewrite. The availability of Streams should be a factor in selecting an embedded RTOS for a communications application and can be a point in considering the usual buy vs. build decision. Most existing networking protocols, whether LAN or WAN, have been implemented in Streams at some point or another. The Unix SVR4 variants from several vendors, which of course include Streams, have been prevalent for some time and these systems are in common use in telecommunications. Often requirements can be met by porting an existing network protocol stack. Although some important modifications will be needed, if the RTOS is compliant with the Streams API, the modules and drivers should port fairly easily.

Another consideration is the ability to dynamically load device drivers. A port will go more easily if the OS supports dynamic loading of Streams modules and drivers. This allows changes and debugging to proceed without relinking, installing, and rebooting the OS each time.

Most existing Unix driver code depends on manipulation of the hardware processor execution level (for example, the calls splstr() and splx() on Suns) to protect internal data structures against simultaneous access from interrupt service routines. In an RTOS with a preemptable multi-threaded kernel, semaphores are provided as a mechanism for explicit synchronization. All explicit and implicit dependencies on the processor execution level should be replaced with protection by explicit semaphores. The semaphores will have to be added to the network protocol's data structures, or they can be placed in global memory.

Interrupt service routines should be made as short as possible. Most processing associated with the interrupt can be done in a kernel thread. As I've described above, a kernel thread can wait on a semaphore that is signaled when the hardware interrupt is received. The threads will then do most of the interrupt-related processing, freeing the OS to distribute CPU activity according to thread priority.

Most OSs provide a timeout mechanism in the kernel. The timeout routines that execute when the timer expires are extensions of the timer's interrupt service routine because they run at hardware interrupt level. A kernel thread should be coded to wait on a semaphore that is signaled when the timer expires. The timeout routine should merely signal the semaphore and return, leaving most of the processing for the thread.

All code should be checked for all the calls to putq() and putnext() because these routines change the state of the queues. You can't assume that queue put procedures are safe because in most Streams implementations these queue procedures do not intrinsically include mutex protection of the queue data structures. Explicit mutexing is required where these calls are made from interrupt threads or timer threads. Also, if putq() or putnext() is called from within the context of the module's or driver's put(), open(), or close() procedures, the queues should be protected. These procedures don't need protection if they are called from the module or driver's service procedures because the service procedures are called from the Streams scheduler's own context.
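One way to apply this rule is to route every put made outside a service procedure through a small locking wrapper. The sketch below is a user-space model: a POSIX mutex stands in for whatever mutex or semaphore the RTOS provides, demo_putq() models a vendor putq() that does no locking of its own, and all demo_ names are hypothetical. It assumes the Streams scheduler thread takes the same lock before it touches the queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Message reduced to its list linkage. */
typedef struct demo_msg {
    struct demo_msg *b_next;
} demo_msg_t;

/* Queue reduced to its message list plus the lock that guards it. */
typedef struct demo_queue {
    demo_msg_t      *q_first;
    demo_msg_t      *q_last;
    size_t           q_nmsgs;
    pthread_mutex_t *q_lock;   /* stand-in for the RTOS mutex */
} demo_queue_t;

/* Model of a putq() that does not protect the queue itself. */
void demo_putq(demo_queue_t *q, demo_msg_t *mp)
{
    mp->b_next = NULL;
    if (q->q_last != NULL)
        q->q_last->b_next = mp;
    else
        q->q_first = mp;
    q->q_last = mp;
    q->q_nmsgs++;
}

/* Wrapper for calls made from interrupt threads, timer threads, or the
 * put(), open(), and close() procedures. */
void demo_putq_locked(demo_queue_t *q, demo_msg_t *mp)
{
    pthread_mutex_lock(q->q_lock);
    demo_putq(q, mp);
    pthread_mutex_unlock(q->q_lock);
}
```

Keeping the lock pointer in the queue model makes it harder to take the wrong lock by accident; whether the lock belongs per queue, per module, or globally is a design decision that depends on the RTOS.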

A final assessment

Streams is an excellent choice for many embedded systems applications, particularly if an application involves network protocols or if you anticipate the system expanding to require networking. Also, Streams is a good choice for parallel, serial, or any specialized character stream I/O application. If the device driver is written as a Streams driver, additional modifications can often be done by writing an additional module and "pushing" it on top of the driver. The availability of Streams could be an important factor in selecting a real-time embedded OS for your next application.

Tom Herbert is an independent software engineering consultant with CH Communications Inc. Before working for CH Communications, he was a lead engineer working with embedded operating systems technology at Xerox Corporation working with all aspects of embedded operating systems including strategy, design and implementation. Before Xerox, Tom worked at Eastman Kodak Company designing and developing advanced embedded applications. He holds a patent in pattern recognition in embedded applications.

References

AT&T. Unix System V Release 3.2 Streams Programmer's Guide. Englewood Cliffs, NJ: Prentice Hall, 1989.

Ritchie, D.M., "A Stream Input-Output System," AT&T Bell Laboratories Technical Journal, Oct. 1984.

Saxena, S., Peacock, J.K., Verma, V., and Krishnan, M., "Pitfalls in Multithreading SVR4 Streams and Other Weightless Processes," Proceedings of the Winter 1993 USENIX Technical Conference, Jan. 1993.

Unix System Laboratories. Streams Modules and Drivers, Unix SVR4.2, Englewood Cliffs, NJ: Prentice-Hall.

Vahalia, Uresh. Unix Internals, The New Frontiers. Englewood Cliffs, NJ: Prentice Hall, 1996.
