An Introduction to I2O
by Larry Mittag
The Intelligent I/O architecture has the potential to improve throughput significantly on some systems. The question is, what will it do for your system?
Bringing in help for someone who has too much work to do is a simple management technique. The difficult part is deciding how to split up the work so that the two spend most of their time working rather than discussing the work.
Apply these concepts to the state of personal computing. The CPU has become an overworked employee, with today's complex hardware, operating systems, and applications. State-of-the-art CPUs implement multiple levels of cache and instruction pipelining in an attempt to keep up with the demand for executing instructions as fast as possible. This system works well until that modem session you're running in the background causes an interrupt on the serial port. Suddenly, the CPU has to drop what it's doing, dump the pipeline, and go off and run an interrupt service routine. This scenario is roughly equivalent to having the CEO of your company answer the telephone at the front desk.
Making peripherals more intelligent, then, will offload this type of processing from the CPU. In the case of the serial port, increasing the buffering in the 16550 peripheral chip over the 16450 was effective. The CPU didn't have to take the interrupt hit as often, resulting in increased serial port throughput and decreased CPU usage, definitely a win-win situation.
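To put rough numbers on the buffering benefit, here is a minimal sketch that estimates receive interrupts per second for the two chips. It assumes 10 bits on the wire per character (8N1 framing), a 16450 that interrupts on every character, and a 16550 whose 16-byte FIFO is programmed to its 14-byte trigger level. The 28,800bps line rate and the function name are illustrative assumptions, not figures from any measurement.

```python
def rx_interrupts_per_second(bits_per_second, bytes_per_interrupt, bits_per_char=10):
    """Estimate receive interrupts per second on a saturated serial line.

    bits_per_char=10 assumes 8N1 framing (start bit + 8 data bits + stop bit).
    """
    chars_per_second = bits_per_second / bits_per_char
    return chars_per_second / bytes_per_interrupt


# Assumed line rate for illustration only.
LINE_RATE_BPS = 28_800

# 16450: no FIFO, so one interrupt per received character.
rate_16450 = rx_interrupts_per_second(LINE_RATE_BPS, bytes_per_interrupt=1)

# 16550: FIFO assumed to be set to interrupt after 14 buffered characters.
rate_16550 = rx_interrupts_per_second(LINE_RATE_BPS, bytes_per_interrupt=14)

print(f"16450: {rate_16450:.0f} interrupts/s")
print(f"16550: {rate_16550:.0f} interrupts/s")
```

Under these assumptions the interrupt load drops by a factor of 14, which is the whole point: the work per character is unchanged, but the CPU is disturbed far less often.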
But what if the problem is a performance bottleneck on a RAID disk subsystem attached to a server running Windows NT and SQL Server? Complex environments like this demand much smarter peripherals, and as a result, the use of coprocessors in such devices has grown exponentially in the last few years. Unfortunately, the number of proprietary interface definitions for these smart devices has grown just as fast.
The I2O Solution
This is where the Intelligent Input/Output (I2O) architecture comes in. This architecture was designed by the members of the I2O Special Interest Group (I2OSIG) to create a way of distributing the processing that is involved in I/O operations. It defines a way of interfacing to smart peripherals that can offload much of the interrupt processing and other low-level hardware handling from the CPU. The difference between this approach and today's most common setup is shown in Figure 1.
Today's device drivers essentially consist of two interfaces, as shown in the left part of Figure 1. The upper interface coordinates with the operating system, while the lower half handles the hardware. The angled split between the two indicates the fuzziness that sometimes exists between these roles. Sometimes the two interfaces aren't as clear-cut as they probably should be.
The I2O approach defines a communications layer that enforces that split. The I/O Processor (IOP) handles everything on the bottom of the communications layer, which is basically all of the hardware and other processing that doesn't depend on the resources of the host operating system.
We should note a couple of key points here. Two more interfaces than before are now defined, so a certain amount of overhead is added to the system. Compare this to the intra-office discussion with the new employee I mentioned earlier. If this extra overhead subtracts more from the throughput equation than the overlap in processing between the host CPU and the IOP adds to it, then the architecture has produced a net drag on performance. For example, a character-driven terminal emulation application running on a serial port would probably see a net loss in performance due to an architecture shift like this. The key is to apply this type of architecture to I/O operations that have significant processing to be done on the data that can be offloaded. For example, if that same serial port were used as a PPP link to a TCP/IP stack, offloading much of the lower protocol layers on to the IOP might be possible. This type of application shifts the equation back to a net gain for the system.
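The throughput equation described above can be written down as a toy model. This is only a sketch: the function name is hypothetical, and every microsecond figure below is an invented illustration, not a measurement of any real I2O system.

```python
def net_gain_us(offloaded_work_us, messaging_overhead_us):
    """Host CPU time saved per I/O operation by offloading to an IOP.

    A negative result means the messaging overhead costs more than the
    offloaded work was worth, i.e. a net drag on performance.
    """
    return offloaded_work_us - messaging_overhead_us


# Invented cost of passing one request through the communications layer.
MESSAGE_OVERHEAD_US = 20

# Character-at-a-time terminal emulation: almost no work to offload.
terminal_gain = net_gain_us(offloaded_work_us=2,
                            messaging_overhead_us=MESSAGE_OVERHEAD_US)

# PPP framing for a TCP/IP stack: substantial per-frame work to offload.
ppp_gain = net_gain_us(offloaded_work_us=150,
                       messaging_overhead_us=MESSAGE_OVERHEAD_US)

print(f"terminal emulation: {terminal_gain:+d} us per operation")
print(f"PPP link:           {ppp_gain:+d} us per operation")
```

The model ignores latency and assumes the host has other useful work waiting, but it captures the sign of the answer: the same fixed overhead sinks the terminal case and is easily absorbed by the PPP case.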
The second key point is that the communications layer definition does not depend on the underlying hardware implementation of the device. The portion of the device driver that resides within the host operating system can therefore be very generic. The same code could be used for a disk subsystem, whether that disk had an IDE or a SCSI interface attached to it.
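The hardware independence of the host side can be illustrated with a toy message-passing layer. This is a sketch of the idea only. It is not the I2O message format, and every class, function, and field name here is hypothetical. Two "IOPs" stand in for an IDE and a SCSI disk, and the same host routine talks to both without knowing which is which.

```python
from collections import deque


class MessageQueuePair:
    """A pair of queues standing in for the communications layer."""

    def __init__(self):
        self.inbound = deque()   # requests travelling host -> IOP
        self.outbound = deque()  # replies travelling IOP -> host


class ToyIop:
    """Stands in for an IOP that owns all hardware-specific handling."""

    def __init__(self, queues, bus_name, blocks):
        self.queues = queues
        self.bus_name = bus_name  # only marks which hardware path was used
        self.blocks = blocks      # block number -> data

    def service(self):
        # Drain pending requests and post one reply per request.
        while self.queues.inbound:
            request = self.queues.inbound.popleft()
            data = self.blocks[request["block"]]
            self.queues.outbound.append(
                {"tag": request["tag"], "data": data, "via": self.bus_name})


def host_read_block(queues, iop, block, tag=1):
    """Generic host-side read: identical code for any disk interface."""
    queues.inbound.append({"op": "read", "block": block, "tag": tag})
    iop.service()  # on real hardware the IOP would run concurrently
    return queues.outbound.popleft()


disk_image = {0: b"boot", 1: b"data"}
ide_queues, scsi_queues = MessageQueuePair(), MessageQueuePair()
ide_reply = host_read_block(ide_queues, ToyIop(ide_queues, "IDE", disk_image), 1)
scsi_reply = host_read_block(scsi_queues, ToyIop(scsi_queues, "SCSI", disk_image), 1)
print(ide_reply["data"], scsi_reply["data"])
```

Everything that differs between the two disks lives inside the IOP object. The host routine sees only requests and replies, which is what lets the upper half of the driver be written once.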
This point represents an important advantage for peripheral vendors. In the past, they have had to be experts at writing (or getting someone to write) device drivers for all of the popular operating systems. The problem with that is having to determine well in advance which operating systems will be popular with potential customers. Beyond that, they had to get early releases of these OSs to be able to write the software.
With the I2O architecture, the peripheral vendor simply has to support the bottom half of the device driver. The upper half now becomes the responsibility of the OS vendor, be that Microsoft, Novell, IBM, or some Linux hacker. This situation is potentially a huge advantage for these peripheral vendors.
Legal Entanglements
A valid response to these philosophies behind the architecture might be, "Gee, that's a great idea! Why didn't someone else think of that first?" The answer is that someone did. This communication problem was also evident in the mainframe days of old, and similar steps were taken to offload those beleaguered CPUs as well. As with many good ideas, this led to patents being filed. When people in the I2OSIG discovered the advantages of this particular distribution of processing power, they found patents and serious lawyers standing in the way of implementing and distributing it.
I don't want to get into the details of the discussions that ensued (so to speak), but the bottom line is that agreements were reached such that I2O can now be made available for use. Unfortunately, it isn't freely and widely available.
The Web site http://www.I2Osig.org has the information necessary to obtain the specification, but you must pay a nontrivial fee for that information ($250 before you can download). This isn't likely to be a major deterrent to most companies looking to use the architecture, but it certainly discourages curious engineers working on their own.
Why Do We Care?
At this point, you may be wondering why this architecture is being covered in this particular magazine. After all, this isn't PC Week, or another magazine that primarily covers the happenings of the PC world. This is Embedded Systems Programming, where we often program right down to the bare metal like Real Programmers, unlike those sissy PC guys (insert the sound of a hairy chest being thumped here).
For me, the interesting part of this architecture is below the communication layer. Here I2O defines an architecture with a set of real-time capabilities that will allow me to write hardware control systems with hard real-time deterministic response capability. I2O defines a set of RTOS calls that fulfill the basic needs of a multitasking system running on the IOP. The IOP itself is defined generically, but the first implementation of a CPU that supports the architecture is the i960 RP/RD from Intel, certainly a CPU that belongs in the embedded systems sphere of influence.
The OS itself also has a respectable implementation. Wind River has created a special subset of VxWorks, named IxWorks, that meets the interface requirements defined in the I2O specification. Therefore, buying the pieces to implement an I2O architecture system is now possible, and the word I've heard is that other vendors will soon be providing their versions of both pieces of the puzzle. I2O is more than just Intel, Wind River, and a few of their buddies getting together to define another closed system.
The really interesting thing about the I2O architecture is the fact that the host isn't necessarily defined as an Intel-based PC. The communications layer interface is based primarily on the PCI bus, so a generic device driver that can talk to an I2O queue should be possible for any host CPU that uses that bus. See Figure 2 for details.
In other words, the architecture works just as well on a PC based on a Digital Equipment Alpha chip as it does on an Intel-based PC. For that matter, it will work on any system that is based on the PCI bus. Have you heard about Sun Microsystems' newly supported PCI bus architecture by any chance? Or the Macintosh support of the PCI architecture? For that matter, the PC104+ architecture is basically a PCI bus with connectors friendly to the embedded world.
The bottom line is that I2O adds a layer to the PCI specification that makes it possible to use a variety of smart interface peripherals without having to write specific device drivers for those devices. It also allows a board developer to create a smart embedded system that can address both the wide PC market and the more narrow specific embedded niches. The I2O specification may have been developed to make file servers go faster, but nothing is stopping us from using it in the next generation of set-top boxes or other high-end embedded applications.
The IOP Environment Definition
The I2O architecture is formally defined as open, but it does favor the initial implementation in both hardware and software. The hardware side assumes many of the attributes of the Intel i960, and the software side builds on the capabilities of the IxWorks version of the VxWorks RTOS. A strict definition would require quoting the specification, which would be a violation of the non-disclosure agreement, but a broader description of the hardware and software that make up the embedded side of I2O follows.
IOP
The IOP has to be a CPU that does I/O well: it must have DMA capability and very quick interrupt response capability. A hardware message queuing mechanism is also highly desirable because the message queues are critical to the I2O architecture.
The interface to the system bus is also highly important. The initial implementation of I2O is based on the PCI bus, and the i960 RP has a built-in PCI-PCI bridge that can extend the number of PCI slots in a system. This can be very useful for the more complex system configurations that can be built on the I2O architecture, some of which involve multiple layers of IOPs.
There is a grocery list of other features of the i960 RP that make it easy to use as an IOP. It has scatter-gather DMA capability, which allows a single DMA operation to assemble a communications packet from portions scattered around memory, for example. It also has a boot sequence that allows firmware to be loaded from a host.
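For readers who haven't worked with scatter-gather DMA, the gather half can be modeled in a few lines. This is a software illustration of the concept only, not the i960 RP's descriptor format. The descriptor layout (offset, length pairs), the buffer contents, and the function name are all assumptions made for the example.

```python
def gather(memory, descriptors):
    """Assemble one contiguous packet from scattered memory regions.

    `descriptors` is an ordered list of (offset, length) pairs, a
    simplified stand-in for a hardware DMA descriptor chain.
    """
    packet = bytearray()
    for offset, length in descriptors:
        packet += memory[offset:offset + length]
    return bytes(packet)


# A toy 64-byte memory with header, payload, and trailer stored apart.
memory = bytearray(64)
memory[0:4] = b"HDR:"
memory[32:39] = b"payload"
memory[60:63] = b"CRC"

packet = gather(memory, [(0, 4), (32, 7), (60, 3)])
print(packet)
```

In hardware, the DMA engine walks the descriptor list itself, so the CPU never has to copy the pieces into one staging buffer before transmission.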
The hardware side of the I2O specification isn't overly tied to the i960, which is both a strength and a weakness as far as I'm concerned. The problem is that if different CPUs are used for IOPs, then the low-level device driver must be compiled for multiple architectures. As much as I dislike the thought of ceding yet another near-monopoly on CPU architecture to Intel, it would be much simpler if the instruction set were a known quantity across all I2O systems.
Note, however, that this is only a concern if the IOP is resident on the motherboard of the host system. The specification details multiple arrangements of IOPs. One arrangement has the IOP on the motherboard, shared across multiple I/O devices, while another has the IOP resident directly on the I/O peripheral. The latter case is much more of a traditional embedded system, and the programmer is working with a known quantity, as far as the CPU is concerned.
IRTOS
The I/O Real-Time Operating System (IRTOS) is similarly defined. As with the hardware, it tends to be defined by the initial IxWorks implementation, but there are also some interesting wrinkles. To begin with, the set of IRTOS system calls is split into multiple groups: the I2O Shell API, the I2O Core API, and the embedded kernel services API. The first two have more to do with the interface that the I2O subsystem presents to the host system, while the last is the set of services that a traditional RTOS would provide in an embedded system. The nice thing is that the API is defined as far as these calls are concerned, so the specification theoretically doesn't lock the user into a single-vendor solution. Of course, the reality is that IxWorks is currently the only implementation of the IRTOS architecture, so as a practical matter, Wind River currently has a monopoly. I suspect that could change quickly if I2O takes off and does well.
The set of capabilities defined in the I2O specification for the IRTOS tends to be a superset of the typical RTOS. The usual multitasking and real-time interrupt response are specified, as are a message-queuing system and semaphore operations. But there are also requirements as to the owning of objects within the system, allowing device drivers to be loaded and unloaded without leaking resources. The IRTOS must also support hardware message passing, creating a more efficient means of locking access than the more traditional spin locks in multiprocessor architectures.
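The object-ownership requirement is easiest to appreciate with a small model. The sketch below is not the IRTOS API (which I can't quote). The class and method names are hypothetical, and it shows only the general technique of tagging every allocation with its owning driver so that unloading the driver reclaims everything it held.

```python
class ResourceRegistry:
    """Tracks which driver owns which resources (hypothetical model)."""

    def __init__(self):
        self._owned = {}  # owner name -> list of resources

    def allocate(self, owner, resource):
        # Every allocation is recorded against the driver that asked for it.
        self._owned.setdefault(owner, []).append(resource)
        return resource

    def unload(self, owner):
        # Reclaim everything the driver owned, whether or not the driver
        # remembered to free it. Returns the list of reclaimed resources.
        return self._owned.pop(owner, [])

    def live_resources(self):
        return sum(len(items) for items in self._owned.values())


registry = ResourceRegistry()
registry.allocate("raid_driver", "semaphore_0")
registry.allocate("raid_driver", "msg_queue_3")
registry.allocate("net_driver", "semaphore_1")

reclaimed = registry.unload("raid_driver")
print(reclaimed, registry.live_resources())
```

Because the bookkeeping lives in the kernel rather than in each driver, a sloppy or crashed driver can still be unloaded cleanly.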
In addition, the IRTOS must support a more sophisticated DMA model than has been used in the past. The IRTOS must allow DMA channels to be allocated and used on an as-needed basis, rather than having each channel dedicated to a specific interface or task as has happened in the past. This support allows much more flexible use of resources within the IOP.
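As-needed channel allocation amounts to treating the DMA channels as a shared pool. Here is a minimal sketch of that idea. The class name, the two-channel pool size, and the policy of returning None when no channel is free are all assumptions for illustration, and a real IRTOS would presumably block or queue the caller instead.

```python
class DmaChannelPool:
    """A shared pool of DMA channels handed out on request (toy model)."""

    def __init__(self, channel_count):
        self._free = list(range(channel_count))

    def acquire(self):
        # Hand out the lowest-numbered free channel, or None if the pool
        # is exhausted.
        if not self._free:
            return None
        return self._free.pop(0)

    def release(self, channel):
        # Return a channel to the pool so another driver can use it.
        self._free.append(channel)


pool = DmaChannelPool(channel_count=2)

disk_channel = pool.acquire()      # disk driver starts a transfer
net_channel = pool.acquire()       # network driver starts a transfer
video_channel = pool.acquire()     # no channel left right now

pool.release(disk_channel)         # disk transfer completes
video_retry = pool.acquire()       # video driver reuses the freed channel

print(disk_channel, net_channel, video_channel, video_retry)
```

With dedicated channels, the video driver in this example would need a third channel that sits idle most of the time. With a pool, two channels serve three drivers as long as their transfers don't all overlap.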
Performance Improvements
I2O sounds like a significant improvement in the architecture of high-end network servers, but is there any proof that it will be worthwhile? I've already discussed the fact that there will be benefits for the vendors, as far as operating-system independence, but what about with regard to real-world performance issues?
Performance problems can be broken down into several cases, each of which has its own unique characteristics. The first of these is the case of a requirement to move data along a linear path as fast as possible. This problem is faced by systems along the path of streaming video data, for example. The characteristics of this type of data flow are such that any bottleneck along the path will slow down the entire stream, which results in ugly, choppy video and unhappy customers.
This type of problem may or may not be amenable to an I2O-style solution. The question becomes whether or not the portion controlled by the IOP is the bottleneck in the system. For example, it would appear to make little sense to implement an I2O interface to a 28.8kbps modem line in an attempt to enable streaming video. Common sense tells us that the telephone line is the limiting factor and the IOP will spend most of its time waiting for data.
But what if that CPU could do something useful during that time? What if significant data compression were being used and the IOP had the job of decompressing the data stream? The interesting aspect of this question is that compressed video data streams are notoriously difficult to synchronize to a smooth data rate. The frames come faster when there isn't much changing in the video and slower when there is a lot of action. This forces the decoding CPU to take into account timecodes to smooth out the data delivery to the consumer.
A real-time environment like I2O would be very useful in this case. It would be difficult to do this data smoothing in the Windows environment, where determinism is at best difficult to achieve. The smooth display of video data could be done entirely in the I2O environment, given the right combination of a smart modem and a smart video card.
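The timecode-based smoothing described above can be sketched as a simple scheduling calculation. This is one common approach (a fixed buffering delay plus timecode offsets), not anything taken from the I2O specification. The function name, the 50ms delay, and the arrival and timecode values are all invented for illustration.

```python
def presentation_times(frames, buffer_delay_ms):
    """Schedule frames by timecode rather than by arrival time.

    `frames` is a list of (arrival_ms, timecode_ms) pairs in stream order.
    Each frame is shown at: first arrival + buffering delay + the frame's
    timecode offset from the first frame.
    """
    first_arrival, first_timecode = frames[0]
    return [first_arrival + buffer_delay_ms + (timecode - first_timecode)
            for _, timecode in frames]


# Bursty arrivals (two frames early, two late) with evenly spaced timecodes.
frames = [(0, 0), (10, 33), (95, 66), (100, 99)]

schedule = presentation_times(frames, buffer_delay_ms=50)
print(schedule)

# A frame can only be shown after it has arrived, so check the delay was
# large enough to absorb the jitter in this stream.
on_time = all(show >= arrival
              for show, (arrival, _) in zip(schedule, frames))
print(on_time)
```

The calculation is trivial; the hard part is hitting each scheduled time reliably, which is where a deterministic real-time environment earns its keep.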
Actually, a similar example was shown at the Spring '96 COMDEX by a company called Xpoint. They hooked up a Compaq Proliant 5000 to a dozen clients on six Ethernet segments. Each segment was controlled by an I2O network adapter that could communicate with the I2O RAID disk subsystem on the Proliant server. Ten of the clients were running disk performance benchmarks, while the other two were displaying MPEG video.
This is the type of demanding I/O environment where I2O shines. The obvious bottleneck would be the CPU in the Proliant server, and when I2O operation was disabled, that proved to be the case. This test resulted in data throughput maxing out at 23MBps, choppy video, and 91% utilization of the CPU. When I2O was enabled, the net data throughput rose to 41MBps, the video smoothed out, and the CPU utilization dropped to 25%. This is particularly important if your office has the requirement to keep a couple of managers distracted while other people get some work done.
All jokes aside, this is a legitimate I2O solution to a difficult problem. High linear data rates can be difficult to get right, and having more processing power and a deterministic environment to work with certainly helps.
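Some quick arithmetic on the Xpoint figures quoted above shows how large the combined effect is. The throughput and utilization numbers come straight from the demonstration. The "MBps per percent of CPU" efficiency metric and the variable names are my own choices for the illustration.

```python
# Figures reported for the Xpoint demonstration.
throughput_without_i2o = 23.0   # MBps
cpu_without_i2o = 91.0          # percent utilization
throughput_with_i2o = 41.0      # MBps
cpu_with_i2o = 25.0             # percent utilization

# Raw throughput improvement.
speedup = throughput_with_i2o / throughput_without_i2o

# Data moved per percent of host CPU consumed (an illustrative metric).
efficiency_without = throughput_without_i2o / cpu_without_i2o
efficiency_with = throughput_with_i2o / cpu_with_i2o
efficiency_ratio = efficiency_with / efficiency_without

print(f"throughput speedup: {speedup:.2f}x")
print(f"MBps per CPU percent: {efficiency_without:.2f} -> {efficiency_with:.2f}")
print(f"efficiency ratio: {efficiency_ratio:.1f}x")
```

The throughput gain alone is a little under 1.8x, but measured per unit of host CPU the server moves roughly six and a half times as much data, and it still has three quarters of its CPU left over for other work.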
But what if your problem requires some other type of performance profile? For example, what if it involves passing around a lot of small pieces of data very fast? I2O shines when 32K chunks are used, as they were in the above demonstration, but what if you're passing around a lot of 100-byte packets of data?
In this case, performance becomes much fuzzier. If the data can be streamed so that no latency is involved in waiting for a response, there could still be significant improvement in throughput. The theory of offloading the central CPU is only good, though, if there is something else useful for that CPU to do. If this isn't the case, then the overhead involved in passing the data to the IOP could quickly overshadow the benefit of splitting up the task.
As usual, the embedded world is a difficult one to characterize. Embedded devices that support standard office-type PC applications like file servers will probably show significant improvement with an I2O architecture. Use the same architecture with a dedicated embedded PC-based device where the CPU isn't going to be called upon to multitask user applications, and things become less clear. It's up to the engineer designing the solution to determine whether a particular option is appropriate.
A Winning Solution
I2O is by nature an improvement that operates behind the scenes of the computer world. And make no mistake, it is an improvement. There are specific instances in which an I2O implementation will make a significant improvement in the throughput of a system, most notably in the high-end server arena. And it certainly does make life easier for peripheral vendors, minimizing their non-recurring engineering costs and shortening their time-to-market. Based just on these points, I2O should be a real winner. Add in system independence and flexibility of use in those "odd" embedded systems, and I2O becomes a significant improvement in the way we make computer systems.
Larry Mittag, contributing editor for ESP, can be reached at larry@stellcom.com.
All material on this site Copyright © 2000 CMP Media Inc. All rights reserved.