Implementing Network Protocols and Drivers with Streams
This article
examines Streams and the framework's usefulness
in implementing and porting multiple networking protocols and
drivers for real-time embedded applications.
Several popular real-time operating system (RTOS) vendors now
offer Streams as an option. Streams is a framework for implementing
high-performance networking protocols and device drivers, and a
solution for integrating multiple networking protocols. It has been
a part of traditional computer operating systems for years. One might
wonder why people are putting a seemingly complex "big government"
mechanism into small, efficient RTOSs. I will attempt to answer this
question with specifics about the performance and overhead associated
with Streams.
Most of my experience has been in one form or another of electronic
imaging. I can think of many examples in an historical context
that will illustrate how Streams has crept into our consciousness
to become part of the OS
requirements for sophisticated embedded
applications. Over the years, OS requirements for embedded systems
have changed. At first the OSs were asked to support very specialized
applications, such as gathering data from a single specific scanner
and presenting it in summary form on a single screen, or taking
control inputs from a single set of switches. Now, embedded equipment
must present the data in real time over a communication network
to PCs or other equipment for analysis. Often embedded applications
are in equipment that must have a presence on a heterogeneous
network. These embedded applications are in equipment such as
gateways, routers, or sophisticated office equipment (document
printers and multi-functional print/scanning/faxing devices).
In these applications, the protocols must run simultaneously with
application software that has hard real-time requirements for
device control and interrupt latency.
In this article, I will examine Streams and its usefulness in
implementing and porting
multiple networking protocols and drivers
for real-time embedded applications. Viewing Streams in relationship
to real-time and multi-threaded operating systems is important
because Streams was originally designed for Unix, which is not
a real-time system. Therefore, we should be aware of issues specific
to embedded RTOSs when developing or porting Streams modules and
device drivers. Also, when working with Streams in real-time embedded
systems, there are several important techniques for multiplexing,
flow control, interrupt handling, and avoiding deadlock and race
conditions.
What is Streams? Why use it?
Streams has its origin in the Unix OS. Originally, the device
driver mechanism in Unix provided only the standard user-level
driver entry points and a way to register an interrupt handler
with the OS. The traditional Unix device driver was called a character
driver and was intended to handle a flow of individual characters
from a serial device, such as an ASCII terminal. The traditional
driver
didn't provide mechanisms for buffer management or flow
control, which are almost universally required in device drivers
and especially in networking protocols. Only a relatively small
part of a device driver is specific to the hardware or to a given
application, yet every programmer had to provide his or her own
redundant buffer and queue management. Writers of device drivers
could borrow boilerplate code from each other to avoid redundant
effort, but it was clear that a standard mechanism would be
beneficial. Unix OSs had block drivers that extended the
character driver concept by providing buffer management for disks
and other devices with file systems, but these are not well suited
for full-duplex communications I/O. A standard common facility
for communications device drivers and protocols was needed. Streams
was developed to fit this need.
Streams can be beneficial because it provides a standard framework
to support implementation of services for high-performance networks.
By providing a
common mechanism for buffering and I/O processing,
it reduces duplicated code and buffer space. This often allows
the reuse of drivers or modules or simplifies the writing of new
ones. Also, Streams encourages reuse because it provides a standard
underlying framework for communications protocols. Just about
every existing networking protocol or device driver was supported
in Unix at one time or another, so porting existing modules
or device drivers to a real-time embedded application becomes
easier. In my experience, this situation has often allowed us to
purchase sources for the communications protocols we needed,
leaving a relatively small porting effort to bring those protocols
up in our real-time embedded application.
Streams provides a full-duplex data path between device driver
and application. It supports simultaneous control and data messages
with a mechanism for prioritizing messages. Streams also provides
a mechanism to layer or "push" modules on top of each
other at run time.
This provision makes it an excellent choice
for implementation of networking protocols that have a layered
architecture, such as those that follow the OSI architecture.
Streams supports multiplexing, which is also a common requirement
for communication protocols. A single networking protocol can
use the Streams multiplexing capability to run simultaneously
on two different physical layers. For example, we were able to
make TCP/IP run simultaneously on two physical networks by having
DLPI (Data Link Provider
Interface, a specification for what occurs
between the data link and network layers) compliant drivers for
both Ethernet and Token Ring. Streams also allows the sharing
of data buffers between two or more messages. This sharing reduces
overhead when running multiplexed networking protocols because
two protocols can process the same data buffer simultaneously,
which enhances performance by reducing the copying of data.
History of Streams
Streams was originally developed by Dennis Ritchie at Bell Labs
in 1983. It was released as part of AT&T SVR3 (System V
Release 3) Unix in the mid '80s. Streams was augmented with several
new features as part of SVR4 (System V Release 4) in the late
'80s. It has been available as a feature in one real-time OS,
LynxOS, since approximately 1993. Streams is now available from
two other major RTOS vendors: ISI and Wind River.
Implementation in real-time operating systems
Streams is best suited for operating systems that utilize the
microprocessor's MMU with separate user and kernel context. It
is designed to hide the message passing mechanism from the application
program, allowing application programs to be backwards compatible
with those using a simpler I/O mechanism. Streams was originally
designed for Unix and provides an API compatible with the familiar
Unix I/O interface consisting of the routines open(), close(),
read(), write(), and ioctl(). An application that is reading from
an I/O stream does not need to know whether the
underlying implementation
is Streams unless it is taking advantage of the additional user-level
functions, such as putmsg() or getmsg() provided by Streams.
Streams contains a scheduler to sequence the servicing of the
module and driver queues. In an RTOS, the scheduler is implemented
as one or more separate kernel threads. The OS provides semaphore
locking for the Streams message queues because they are accessed
by both the scheduler thread and other threads in the drivers
or modules. In many OSs,
possibly for efficiency reasons, the
queue protection is not coded into putq(), the procedure provided
to place messages in a queue.
Typical protocol stacks implemented with Streams
Typical network communication protocols are organized in layers
according to the OSI 7-layer model. Streams was designed to facilitate
the implementation of these protocols. The seven layers of the
OSI model, starting from the bottom, are:
1. Physical
2. Data Link
3. Network
4. Transport
5. Session
6. Presentation
7. Application
Protocol suites can contain multiple transports or complex multiplexed
network layers, but common networking protocol suites usually
require only that Layers 1 through 3 or 4 be implemented
in the kernel. Streams is well suited to implement these protocols
because of its ability to place or "push" modules on
top of each other. Each separate module services the same data
stream, adding value as the data travels through the
pipe. For
example, for incoming data in a networking protocol, each layer
may strip data from the beginning or end of an input packet. In
the case of outgoing data, each module will add data to the packet.
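The header handling described above can be sketched in ordinary C. This is only an illustrative user-space model, not kernel code: pkt_t is a cut-down stand-in for a Streams message block, and strip_header(), prepend_header(), and msg_len() are hypothetical helper names. It assumes an inbound layer removes its header by advancing a read pointer, and an outbound layer adds one by chaining a separate header block in front of the payload.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for a Streams message block (assumption, not the real type). */
typedef struct pkt {
    struct pkt    *b_cont;  /* next block of a multi-part message */
    unsigned char *b_rptr;  /* first valid byte */
    unsigned char *b_wptr;  /* one past the last valid byte */
} pkt_t;

/* Inbound: a layer strips its header by advancing the read pointer.
 * Returns 0 on success, -1 if the block is shorter than the header. */
static int strip_header(pkt_t *mp, size_t hdr_len)
{
    assert(mp != NULL);
    if ((size_t)(mp->b_wptr - mp->b_rptr) < hdr_len)
        return -1;
    mp->b_rptr += hdr_len;  /* no data is copied */
    return 0;
}

/* Outbound: a layer adds its header by linking a header block in front. */
static pkt_t *prepend_header(pkt_t *hdr, pkt_t *payload)
{
    assert(hdr != NULL);
    hdr->b_cont = payload;
    return hdr;
}

/* Total number of valid bytes in a chained message. */
static size_t msg_len(const pkt_t *mp)
{
    size_t n = 0;
    for (; mp != NULL; mp = mp->b_cont)
        n += (size_t)(mp->b_wptr - mp->b_rptr);
    return n;
}
```

In both directions the payload bytes stay where they are; only pointers change, which is the property that makes layering inexpensive.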
Because the TCP/IP protocol suite is usually bundled in with the
OSs provided by real-time vendors, and because it is common and
relatively well understood, a Streams implementation of TCP/IP
will serve as a good example here. TCP/IP's layered organization
is as follows:
- IP: Network layer
- UDP: Transport layer
- TCP: Transport layer
- 802.3 LLC: Data link layer
- Ethernet device driver: Physical layer
Figure 1 illustrates how Streams multiplexing can be used to organize
the TCP/IP protocol where there is both upper and lower multiplexing.
The LLC module is a 1-to-n multiplexer and the IP module is an
n-to-1 multiplexer. Also, as illustrated, TCP/IP contains two
transport layers: UDP, for connectionless
communication, and TCP,
which provides a connection-oriented transport.
Declarations for a multiplexing module
Using a typical multiplexing module as an example, I will illustrate
how the data structures should be declared. The declarations for
a multiplexing module are fairly straightforward. The streamtab
structure uniquely defines a module or driver, and the qinit structures
have the procedures and information used to initialize each queue.
In this particular module, all four queues are
declared in the
streamtab structure: the upper read side, upper write side, lower
read side, and lower write side queues.
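A minimal sketch of such declarations follows. It is not taken from any vendor's source: the structure definitions are simplified stand-ins for those in the system's Streams header (field names follow SVR4 conventions, with the open and close arguments reduced to a queue pointer), and every procedure name other than mursrv, as well as the module name "mux", is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the types in the vendor's Streams header. */
typedef struct queue queue_t;   /* opaque in this sketch */
typedef struct msgb  mblk_t;    /* opaque in this sketch */

struct module_info {
    unsigned short mi_idnum;    /* module ID number */
    const char    *mi_idname;   /* module name */
    long           mi_minpsz;   /* minimum packet size */
    long           mi_maxpsz;   /* maximum packet size */
    unsigned long  mi_hiwat;    /* high-water mark */
    unsigned long  mi_lowat;    /* low-water mark */
};

struct qinit {
    int (*qi_putp)(queue_t *, mblk_t *);  /* put procedure */
    int (*qi_srvp)(queue_t *);            /* service procedure */
    int (*qi_qopen)(queue_t *);           /* open (arguments simplified) */
    int (*qi_qclose)(queue_t *);          /* close (arguments simplified) */
    struct module_info *qi_minfo;         /* initialization values */
};

struct streamtab {
    struct qinit *st_rdinit;    /* upper read side */
    struct qinit *st_wrinit;    /* upper write side */
    struct qinit *st_muxrinit;  /* lower read side */
    struct qinit *st_muxwinit;  /* lower write side */
};

/* Hypothetical procedures; the bodies are placeholders. */
static int mux_open(queue_t *q)           { assert(q != NULL); return 0; }
static int mux_close(queue_t *q)          { assert(q != NULL); return 0; }
static int mursrv(queue_t *q)             { assert(q != NULL); return 0; }
static int muwput(queue_t *q, mblk_t *mp) { assert(q && mp);   return 0; }
static int muwsrv(queue_t *q)             { assert(q != NULL); return 0; }
static int mlrput(queue_t *q, mblk_t *mp) { assert(q && mp);   return 0; }
static int mlrsrv(queue_t *q)             { assert(q != NULL); return 0; }
static int mlwsrv(queue_t *q)             { assert(q != NULL); return 0; }

static struct module_info mux_minfo = { 0x4d58, "mux", 0, 4096, 8192, 1024 };

static struct qinit mux_urinit = { NULL,   mursrv, mux_open, mux_close, &mux_minfo };
static struct qinit mux_uwinit = { muwput, muwsrv, NULL,     NULL,      &mux_minfo };
static struct qinit mux_lrinit = { mlrput, mlrsrv, NULL,     NULL,      &mux_minfo };
static struct qinit mux_lwinit = { NULL,   mlwsrv, NULL,     NULL,      &mux_minfo };

/* The only symbol that has to be visible outside the module. */
struct streamtab muxinfo = { &mux_urinit, &mux_uwinit, &mux_lrinit, &mux_lwinit };
```

A module without lower multiplexing would leave the last two streamtab entries NULL.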
Streamtab is the only structure that needs to be global. The other
routines are only called indirectly by accessing their pointers
through the queue_t structures to be described later. For accuracy,
I should mention that the lower multiplexing actually requires
a data structure to keep track of the queue in the module or driver
linked below the multiplexing driver.
The fact that
all four queues are used in this multiplexing module
indicates that the module is intended to support lower multiplexing.
Lower multiplexing is done by linking the multiplexing module
to the module or driver below. Upper multiplexing is done by pushing
multiple modules on top of a single module; it requires only the
upper two queues be maintained. Of course, it is the responsibility
of the module to keep track of the message routing because this
routing tends to be specific to each networking protocol.
Streams messages and queues
In addition to the streamtab structure shown above, the message
and queue data structures discussed are the core of Streams. An
understanding of these two data structures is essential for a
thorough understanding of how Streams drivers and modules behave.
The Streams message block contains the list of packets waiting
for servicing. As shown in Figure 2, the Streams message structure
consists of a message block, a data block, and the actual data
buffer. The message
block contains a pointer to the data block
b_datap, as well as read and write pointers b_rptr and b_wptr,
which keep track of the positions in the data buffer. The b_cont
pointer can be used to chain message blocks for multi-part messages.
The data block contains the pointer to the base of the data buffer
db_base and a pointer to the end of the data buffer db_lim.
The data block also contains the reference count db_ref, which
is the number of message blocks that point to this data block.
This information is used in the sharing of data between two messages.
The reference count is incremented each time a message is duplicated
by calling dupmsg() or dupb() to cause a new reference to the
data block. It is decremented when freemsg() or freeb() is called
to free a message that points to the data block.
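The reference counting described above can be modeled in user space as follows. This is a sketch under stated assumptions rather than the kernel implementation: the two structures are reduced to the fields discussed here (the end-of-buffer pointer is called db_lim, as in SVR4 headers), and model_allocb(), model_dupb(), and model_freeb() are hypothetical stand-ins for allocb(), dupb(), and freeb() built on malloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced stand-in for the Streams data block. */
typedef struct datab {
    unsigned char *db_base;  /* start of the data buffer */
    unsigned char *db_lim;   /* one past the end of the data buffer */
    unsigned char  db_ref;   /* number of message blocks pointing here */
} dblk_t;

/* Reduced stand-in for the Streams message block. */
typedef struct msgb {
    struct msgb   *b_cont;   /* next block of a multi-part message */
    unsigned char *b_rptr;   /* read pointer */
    unsigned char *b_wptr;   /* write pointer */
    dblk_t        *b_datap;  /* the shared data block */
} mblk_t;

/* Allocate a message block, a data block, and a buffer of the given size. */
static mblk_t *model_allocb(size_t size)
{
    mblk_t *mp = malloc(sizeof *mp);
    dblk_t *dp = malloc(sizeof *dp);
    unsigned char *buf = malloc(size);

    if (mp == NULL || dp == NULL || buf == NULL) {
        free(mp); free(dp); free(buf);
        return NULL;
    }
    dp->db_base = buf;
    dp->db_lim = buf + size;
    dp->db_ref = 1;
    mp->b_cont = NULL;
    mp->b_rptr = mp->b_wptr = buf;
    mp->b_datap = dp;
    return mp;
}

/* Duplicate the descriptor only; the data block gains one reference. */
static mblk_t *model_dupb(mblk_t *mp)
{
    mblk_t *nmp = malloc(sizeof *nmp);

    if (nmp == NULL)
        return NULL;
    *nmp = *mp;
    nmp->b_cont = NULL;
    nmp->b_datap->db_ref++;
    return nmp;
}

/* Free one descriptor; the buffer goes away with the last reference. */
static void model_freeb(mblk_t *mp)
{
    dblk_t *dp = mp->b_datap;

    assert(dp->db_ref > 0);
    if (--dp->db_ref == 0) {
        free(dp->db_base);
        free(dp);
    }
    free(mp);
}
```

Each duplicate has its own read and write pointers, so two modules can walk the same buffer independently as long as neither modifies the data.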
The data buffers contain the actual data and are available in
several standard sizes ranging from four bytes to 4K bytes. The
buffers
are allocated by the allocb() call and can be assigned
one of three priorities. A means of recovering from buffer allocation
failure also exists. After allocb() fails, a callback can be registered
by passing a function pointer to bufcall(). The function gets control
when a buffer of the requested size becomes available, so the
allocation can be attempted again.
The Streams queue structure actually stitches the modules and
drivers together. As can be guessed from the module declarations
above, a module or driver contains two to four queues. A simple
driver contains two
queues (one for the read side and one for
the write side), while a module capable of upper and lower multiplexing
would contain four queues. The list of queues is built when modules
are pushed, and the list is dismantled when modules are popped.
The Streams scheduler traverses this list, looking for queues
with non-empty message lists. When found, the scheduler executes
the associated service routine. From the queue structure, Streams
can find all the information relevant for maintaining and servicing
the
associated module or driver.
The queue contains the head pointer to the list of message blocks
awaiting servicing and it also contains limits and information
about the queue. The queue contains the limits for maintaining
the queue's flow control q_hiwat and q_lowat, as well as the minimum
and maximum packet size q_minpsz and q_maxpsz, for this particular
queue. The queue state flags in q_flag indicate whether the queue
is full and whether it is enabled for scheduling, and the q_count
contains the count
of characters on the message list.
The Streams queue also has a pointer to the q_info structure containing
the data used to initialize the queue and pointers to the module
or driver procedures. Streams stuffs q_info with pointers to the
procedures obtained from the streamtab structure described above
in the declarations. Generally, the developer places a pointer
to the driver's or module's "private data structure"
in q_ptr as shown in the figure below. The field q_ptr is referenced
from the
queue pointer (typed queue_t*) passed to each of the
put and service procedures, as shown in the declarations above.
Therefore, any globally useful private data can be accessed from
all of the service and put procedures without needing a lot of
trouble-prone global data in each driver or module. Also, the
Streams queues are protected by a mutex to prevent the queue service
routines from accessing queues as they are being updated from
elsewhere in this module or even in other modules. The mutex may
be
implemented within Streams or the OS vendor may leave this
important detail to the driver or module writer.

Flow control
As discussed above, Streams provides a scheduling mechanism consisting
of a list of message queues, each of which has high and low limits
for the number of bytes in the queue. Each of these queues has
a service procedure that gets scheduled for execution whenever
the queue contains data. The low and high water
marks in the queue
are used to maintain flow control.
The mechanism for flow control is straightforward. A queue is
marked FULL when its count exceeds the high-water mark; the service
procedures in queues behind the current queue can call canput()
to see if the queue's FULL flag is set. They can use putbq()
to put a message back on their own queue and "keep" it until
the full queue drains to the low-water mark and the FULL flag is unset.
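The interplay of the two water marks can be shown with a small user-space model. It is a sketch of the rule as stated above, not of any vendor's code: fc_queue_t keeps only the fields needed for flow control, q_full stands in for the FULL bit in q_flag, and fc_enqueue(), fc_dequeue(), and fc_canput() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for a Streams queue: flow-control fields only. */
typedef struct fc_queue {
    size_t q_count;  /* bytes currently on the queue */
    size_t q_hiwat;  /* high-water mark */
    size_t q_lowat;  /* low-water mark */
    int    q_full;   /* stand-in for the FULL bit in q_flag */
} fc_queue_t;

/* Adding data marks the queue FULL once the count exceeds q_hiwat. */
static void fc_enqueue(fc_queue_t *q, size_t nbytes)
{
    q->q_count += nbytes;
    if (q->q_count > q->q_hiwat)
        q->q_full = 1;
}

/* Removing data clears FULL only when the count is down to q_lowat. */
static void fc_dequeue(fc_queue_t *q, size_t nbytes)
{
    assert(nbytes <= q->q_count);
    q->q_count -= nbytes;
    if (q->q_count <= q->q_lowat)
        q->q_full = 0;
}

/* What a neighbouring service procedure asks before passing a message on. */
static int fc_canput(const fc_queue_t *q)
{
    return !q->q_full;
}
```

The gap between the two marks gives the mechanism hysteresis: a queue that has filled up stays closed until it has drained substantially, so senders are not woken for every message removed.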
Implementing Streams modules and drivers in multi-threaded and
real-time systems
Design for efficiency while maintaining awareness of concurrency
issues is important. Some common techniques allow you to make
maximum use of the OS's facilities for efficiency, but some potential
pitfalls for real-time implementations exist as well. Streams
modules and drivers will usually have multiple threads, even within
a single module. For example, a module will contain the main thread,
invoked when the application makes a system call to the driver
entry points such as ioctl(), open(), close(),
getmsg(), or putmsg().
Also, the implementer must remember that the Streams scheduler
as implemented in the OS is a separate thread. This distinction
is necessary because the scheduler must run asynchronously to
schedule the service procedures for the queues whenever they have
data available. In addition to these two threads, each module
may have one or more timer threads. Also, a driver may need an
interrupt thread that is awakened when the hard interrupt service
routine signals, indicating an incoming packet.
Because of the
inherently multi-threaded nature of Streams, it is advisable to
pay careful attention to concurrency issues.
When designing a driver, kernel threads should be used for most
of the interrupt processing. Interrupt threads are commonly used
in real-time operating systems because hard interrupt routines
can't be preempted, and allowing priorities to be monitored and
managed by the OS mechanism is more efficient in terms of CPU
utilization. If your RTOS offers this capability, it is always
best to
do most of the processing in an interrupt thread. The
driver should be written so the hard interrupt handler does very
little; it should only turn off interrupts, grab copies of hardware
registers, and trigger the semaphore to wake up an interrupt thread,
which does most of the work to complete the interrupt processing.
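The division of labor between the hard interrupt handler and the interrupt thread might look like the following single-threaded model. All names are hypothetical, the pending counter merely stands in for the RTOS counting semaphore, and a real driver would also need the vendor's interrupt masking and memory-ordering rules, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SLOTS 8u  /* must be a power of two */

/* Mailbox shared by the hard handler and the interrupt thread. */
typedef struct irq_mailbox {
    uint32_t status[RING_SLOTS];  /* saved copies of a status register */
    unsigned head;                /* advanced only by the hard handler */
    unsigned tail;                /* advanced only by the thread */
    unsigned pending;             /* stand-in for a counting semaphore */
    unsigned dropped;             /* events lost because the ring was full */
} irq_mailbox_t;

/* Hard interrupt handler: save the register copy, signal, and return. */
static void hard_isr(irq_mailbox_t *mb, uint32_t status_reg)
{
    if (mb->head - mb->tail == RING_SLOTS) {
        mb->dropped++;            /* ring full: count the loss and leave */
        return;
    }
    mb->status[mb->head % RING_SLOTS] = status_reg;
    mb->head++;
    mb->pending++;                /* real driver: signal the semaphore */
}

/* One pass of the interrupt thread body. A real thread would block on the
 * semaphore in a loop; here the counter is drained instead. Returns the
 * number of events handled and ORs their status bits into *handled_mask. */
static unsigned interrupt_thread_pass(irq_mailbox_t *mb, uint32_t *handled_mask)
{
    unsigned n = 0;

    while (mb->pending > 0) {
        mb->pending--;
        *handled_mask |= mb->status[mb->tail % RING_SLOTS];
        mb->tail++;
        n++;                      /* the lengthy processing belongs here */
    }
    return n;
}
```

Because the handler only stores a word and bumps a counter, the time spent with interrupts disabled stays short and constant, and the expensive work runs at a priority the scheduler controls.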
Minimize the copying of data
Copying the data in a driver as few times as possible is always
best. Streams facilitates this economy because DMA or direct I/O
can be done
between the Streams buffer in memory and the hardware.
This arrangement ensures that data is copied by software only
once, when it is moved between the Streams buffer and the application's
buffer during the processing of the read() or write() system call.
Generally, all the message processing is done by moving pointers
to the message blocks. A routine to copy messages, copymsg(),
is provided in the Streams API, but Streams itself never copies
the data. Usually, the only time data is copied is when the protocol
requires forming a return or acknowledgment message containing
the same data as the received packet. Sometimes when there is
a need to share data between two multiplexed modules, it will
be necessary to copy the data. However, if the data is not going
to be destroyed or altered by either of the modules, dupmsg()
can establish a new message descriptor pointing to the same list
of data blocks. In short, copying data is expensive in a real-time
system, and Streams provides mechanisms to minimize this overhead.
Using flow control can help real-time performance
Drivers and modules should make maximum use of the Streams flow
control mechanism. For example, a driver may receive an interrupt
when a packet is in the ring buffer of the interface card. The
driver then needs to form this packet into a Streams message ready
to pass to the read side of the upstream module. If the Streams
mechanisms are used to advantage, flow control is used to avoid
bottlenecks in message processing. The driver is implemented with
an upper read service routine, and putq() is used to place the
message on its own queue. The Streams scheduler then runs the service
routine, which passes messages upstream only while the next queue has
room. Typically, the upper read service routine is implemented as
follows (and a similar technique can be used on the write side):
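static int mursrv(queue_t *q)
{
    mblk_t *mp;

    /* Drain our own queue, stopping if the next queue upstream is full. */
    while ((mp = getq(q)) != NULL) {
        if (!canput(q->q_next)) {
            putbq(q, mp);    /* keep the message until flow control lifts */
            break;
        }
        putnext(q, mp);
    }
    return 0;
}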
This simple mechanism can improve reliability and
performance
with fewer lost and dropped packets during peak usage.
Kernel threads are used to supplement the timers necessary for
resending and connection management. Network protocols have what's
known as a "reliable connection mode" that requires
that packets be sequenced so that any missing packets can be resent.
OSs provide timer mechanisms for this kind of purpose. Most RTOS
vendors implement the timeout facility by calling the timeout
functions directly from the timer's hard interrupt
service routines,
and they may neglect to document this important detail. This timer
interrupt-level processing can steal CPU cycles from other kernel
and user functions because it cannot be scheduled or prioritized,
so it is best to do as much time-dependent processing as possible
in a separate kernel thread. This timer thread sleeps on a semaphore
waiting to be awakened by a timer interrupt. The typical timeout
(interrupt level timer function) can be implemented very simply
as follows. Substitute your OS
vendor's calls below:
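/* Timeout function: runs at timer interrupt level, so it only signals. */
void mytimer(void)
{
    ssignal(timer_sem);    /* substitute your OS vendor's call */
}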
Therefore most of the complexity necessary to implement the protocol
is in a separate thread, which is implemented as follows:
int my_time_thread(void)
{
    for (;;) {
        swait(timer_sem);    /* sleep until mytimer() signals */
        /*...processing...*/
    }
    return 0;    /* not reached */
}
Avoiding race conditions and deadlocks
Concurrency is inherent in most applications using multi-threaded
RTOSs, and each device driver implementation should be looked
at as if it
were running on a multi-processor system. Of course,
you should always take care to make sure critical sections of
code are protected where there are possible problems due to simultaneous
access. Some concurrency and deadlock mistakes
are specific to Streams drivers and modules, and it is
easy to forget that the Streams scheduler that executes your read
and write service routines is a separate thread.
The most common problem occurs when the list of queues is corrupted
by
simultaneous access to the queues with putq() or putnext()
at the same time the Streams scheduler is processing the list.
This problem can be eliminated by doing all of the processing
of the messages in the service routines rather than processing
the messages in the put procedures. The service routines are intended
to be used this way, and also, as described above, it is an essential
part of the flow control mechanism inherent to Streams. In a few
cases, implementing the driver or module this way may not
be
possible, and for some simpler drivers, it may introduce unwanted
added complexity. Some RTOSs don't automatically protect the queues
from within the Streams putq() procedures, and your OS vendor
may have forgotten to document this fact. If this is the case,
whenever putq() or putnext() is called outside of a service procedure,
the call must be protected from the Streams scheduling thread
with a mutex or semaphore. Also, it is necessary to remember that
the putnext() call operates on q->q_next (it invokes that queue's
put procedure, which commonly calls putq()), and
q_next usually points to a queue in the next module or driver
upstream or downstream. A failure to protect the putnext() call can
cause a lock-up in an entirely different module or driver. Another
common mistake is calling the queue put procedures from a hard
interrupt or timer routine. Because a common internal data structure
may be damaged, these problems can sometimes show up as side effects
in seemingly unrelated modules or drivers elsewhere in the system.
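One way to add the missing protection is to wrap the queue operations in a lock, as in the sketch below. It is a user-space illustration only: a POSIX mutex stands in for whatever mutex or semaphore the RTOS provides, lk_queue_t is a reduced stand-in for a Streams queue, and raw_putq(), locked_putq(), and locked_getq() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Reduced stand-in for a message: just a link and an identifier. */
typedef struct lk_msg {
    struct lk_msg *b_next;
    int            id;
} lk_msg_t;

/* Reduced stand-in for a queue, with the lock the text calls for. */
typedef struct lk_queue {
    lk_msg_t       *q_first;
    lk_msg_t       *q_last;
    pthread_mutex_t q_lock;  /* assumption: the RTOS mutex would go here */
} lk_queue_t;

static void lk_queue_init(lk_queue_t *q)
{
    q->q_first = q->q_last = NULL;
    pthread_mutex_init(&q->q_lock, NULL);
}

/* Unprotected append, as in an implementation that leaves locking out. */
static void raw_putq(lk_queue_t *q, lk_msg_t *mp)
{
    mp->b_next = NULL;
    if (q->q_last != NULL)
        q->q_last->b_next = mp;
    else
        q->q_first = mp;
    q->q_last = mp;
}

/* Wrapper for callers outside a service procedure: open, close, put,
 * interrupt threads, and timer threads. */
static void locked_putq(lk_queue_t *q, lk_msg_t *mp)
{
    assert(q != NULL && mp != NULL);
    pthread_mutex_lock(&q->q_lock);
    raw_putq(q, mp);
    pthread_mutex_unlock(&q->q_lock);
}

/* The consumer takes the same lock before unlinking the head. */
static lk_msg_t *locked_getq(lk_queue_t *q)
{
    lk_msg_t *mp;

    pthread_mutex_lock(&q->q_lock);
    mp = q->q_first;
    if (mp != NULL) {
        q->q_first = mp->b_next;
        if (q->q_first == NULL)
            q->q_last = NULL;
    }
    pthread_mutex_unlock(&q->q_lock);
    return mp;
}
```

A blocking lock like this is suitable only in thread context; it does not make it safe to call the queue procedures from a hard interrupt or timer routine.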
Porting from UNIX to an
embedded OS
It goes without saying that it is preferable for time-to-market
reasons to meet a requirement with a port rather than a rewrite.
The availability of Streams should be a factor in selecting an
embedded RTOS for a communications application and can be a point
in considering the usual buy vs. build decision. Most existing
networking protocols, whether LAN or WAN, have been implemented
in Streams at some point or another. The Unix SVR5 variants from
several vendors, which of course include Streams,
have been prevalent
for some time and these systems are in common use in telecommunications.
Often requirements can be met by porting an existing network protocol
stack. Although some important modifications will be needed, if
the RTOS is compliant with the Streams API, the modules and drivers
should port fairly easily.
Another consideration is the ability to dynamically load device
drivers. A port will go more easily if the OS supports dynamic
loading of Streams modules and drivers. This allows
changes and
debugging to proceed without relinking, installing, and rebooting
the OS each time.
Most existing Unix driver code depends on manipulation of the
hardware processor execution level (for example the calls splstr()
and splx() on Suns) for protection of internal data structures
against simultaneous access from interrupt service routines. In
an RTOS with a preemptable multi-threaded kernel, semaphores are
provided as a mechanism for explicit synchronization. All explicit
and implicit dependencies on the processor execution level should be
replaced with protection by explicit semaphores. The semaphores
will have to be added to the
network protocol's data structures, or they can be placed in global
memory.
Interrupt service routines should be made as short as possible.
Most processing associated with the interrupt can be done in a
kernel thread. As I've described above, a kernel thread can wait
on a semaphore that is signaled when the hardware interrupt is
received. The threads will then do most of the interrupt-related
processing,
freeing the OS to distribute CPU activity according
to thread priority.
Most OSs provide a timeout mechanism in the kernel. The timeout
routines that execute when the timer expires are extensions of
the timer's interrupt service routine because they run at hardware
interrupt level. A kernel thread should be coded to wait on a
semaphore that is signaled when the timer expires. The timeout
routine should merely signal the semaphore and return, leaving
most of the processing for the thread.
All code
should be checked for all the calls to putq() and putnext()
because these routines change the state of the queues. You can't
assume that queue put procedures are safe because in most Streams
implementations these queue procedures do not intrinsically include
mutex protection of the queue data structures. Explicit mutexing
is required where these calls are made from interrupt threads
or timer threads. Also, if putq() or putnext() is called from
within the context of the module's or driver's put(), open(),
or
close() procedures, the queues should be protected. These procedures
don't need protection if they are called from the module or driver's
service procedures because the service procedures are called from
the Streams scheduler's own context.
A final assessment
Streams is an excellent choice for many embedded systems applications,
particularly if an application involves network protocols or if
you anticipate the system expanding to require networking. Also,
Streams is a good choice for parallel,
serial, or any specialized
character stream I/O application. If the device driver is written
as a Streams driver, additional modifications can often be done
by writing an additional module and "pushing" it on
top of the driver. The availability of Streams could be an important
factor in selecting a real-time embedded OS for your next application.
Tom Herbert is an independent software engineering consultant
with CH Communications Inc. Before working for CH Communications,
he was a lead engineer at Xerox Corporation, working with all aspects
of embedded operating systems technology, including strategy, design,
and implementation.
Before Xerox, Tom worked at Eastman Kodak Company designing and
developing advanced embedded applications. He holds a patent in
pattern recognition in embedded applications.
References
AT&T. Unix System V Release 3.2 Streams Programmer's Guide.
Englewood Cliffs, NJ: Prentice Hall, 1989.
Ritchie, D.M., "A Stream Input-Output System," AT&T
Bell Laboratories Technical Journal, Oct. 1984.
Saxena, S., Peacock, J.K., Verma, V., and Krishnan, M., "Pitfalls
in Multithreading SVR4 Streams and Other Weightless Processes,"
Proceedings of the Winter 1993 USENIX Technical Conference, Jan.
1993.
Unix System Laboratories. Streams Modules and Drivers, Unix SVR4.2,
Englewood Cliffs, NJ: Prentice-Hall.
Vahalia, Uresh. Unix Internals, The New Frontiers. Englewood Cliffs,
NJ: Prentice
Hall, 1996.