CMP EMBEDDED.COM

Embedded Communication Network Pitfalls

by Alexander Dean

Should you build or buy a communications protocol to support your application? Making this decision without understanding network/system interactions will limit optimization and expansion options. Here are some key issues to consider, with CAN, LonTalk, and IEEE-1394 (FireWire) as examples.

Even though embedded network vendors provide a flood of apparently useful information about their products, crucial issues may not become apparent until later in the design cycle. You can't expect a vendor to understand the communication characteristics of your specific application. Performance numbers and charts often represent best-case performance with some unrealistic assumptions. To help the designer ask the important questions up front, this article identifies key issues that can make or break a communication system. We use CAN, LonTalk, and IEEE-1394 (FireWire) to explain the key points. After an introduction to these protocols, we discuss critical issues such as cost, message delivery time predictability, quoted bit rate vs. real performance, message serialization, and network reliability.

The heart of a network protocol is the medium access control (MAC) mechanism, which determines how the network will be shared among the nodes. The MAC's determinism, efficiency, prioritization, and protocol optimizations will determine bus traffic and system performance. Some applications will not operate consistently if an inappropriate MAC is used.

Going one step beyond the MAC reveals digital communication networks to be complex systems, making them difficult to design, optimize, and verify. A typical specification for a new bus, such as IEEE-1394 or Universal Serial Bus, requires hundreds of pages. The interactions between network and application components complicate a visualization of system behavior for important non-obvious cases. This is essentially the software crisis addressed by Frederick P. Brooks, Jr. in The Mythical Man-Month: without structured design, no person can expect to understand a complex system's behavior.1 The designers of a commercial off-the-shelf (COTS) solution typically optimize to a generic network application, which means that design tradeoffs will limit some aspect of performance. Running a COTS network protocol near its performance limits will accentuate problems because of limited visibility into the objects, which become black boxes, often without completely specified behavior. In addition to this drawback, many COTS products limit design flexibility.

Often a network's throughput is much lower than the bus bit rate because designers optimized other characteristics, creating performance bottlenecks. The designers of some commercial protocols optimize their products to specific applications. For example, CAN is tailored to provide reliable deterministic communications for automotive networks. IEEE-1394 includes isochronous data communications to simplify real-time data transmission by eliminating the need for large transmit and receive buffers. On the other hand, Echelon's LonTalk was designed to support simple network development for a broad variety of applications; it was optimized for flexibility (by defining far more than the MAC) and integration into a microcontroller. Each approach has its place. A problem arises when using a network near the edge of its performance envelope; the optimizations used can create bottlenecks which limit system performance. We examine some bottlenecks in CAN, LonTalk, and IEEE-1394 and potential solutions.

PROTOCOL OVERVIEW

We begin by discussing these three network protocols, LonTalk, CAN, and IEEE-1394, to give a frame of reference when examining network quirks and limitations. While many communication protocols exist, few have spread very far across the diverse embedded systems marketplace.2 For moderate speeds, LonTalk and CAN are de facto standards. For high-speed real-time applications, IEEE-1394 (also referred to as FireWire or High-Speed Serial Bus) may become one of the key players. These networks vary significantly in design goals, resulting in differences of speed, efficiency, development support, and features. Figure 1 shows how these three protocols span the network design space.


Figure 1

LonTalk

The Echelon Corp. developed LonWorks as a complete, full-featured solution for generic control networks. The Neuron communication controller implements many services needed for networked applications. LonTalk, Echelon's communications protocol stack, supports a hierarchy of buses, a variety of message transmission methods, encryption, and authentication services, which simplify network design and installation. At the physical layer, LonTalk can run on twisted pair wiring, powerlines, and radio. Echelon offers a variety of network interface transceivers and a useful development system. Third-party vendors provide a variety of LonWorks-related products and consulting services.

As mentioned earlier, the medium access control (MAC) protocol is the heart of a network. The MAC protocol will determine available bandwidth, message delivery times (through priorities), and robustness.2 LonTalk uses a MAC similar to that in Ethernet, which is commonly used to connect personal computers. LonTalk's MAC is a form of carrier sense multiple access with collision avoidance (CSMA/CA) and is shown in Figure 2. In this MAC, a node ready to transmit on a bus waits for it to become idle before sending a message. If multiple nodes begin transmitting nearly simultaneously, the messages will collide. If the bus is not idle or a collision occurs, the transmitting node backs off and retries transmission after waiting for a random number of time slots. This process continues until a message is successfully transmitted.

Figure 2

Increasing the number of random slots from which nodes choose reduces the probability of collisions. LonTalk's collision avoidance algorithm predicts network traffic and adjusts the number of collision avoidance slots accordingly. If a node predicts that bus traffic will increase, it increases the number of slots from which it chooses. This algorithm reduces collisions but does not eliminate the chance of two stations selecting the same time slot. An optional collision-detection circuit aborts communication early when collisions are detected, which improves network performance.
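The effect of the slot count can be illustrated with a deliberately simplified model. It is an assumption made here for illustration, not Echelon's predictive algorithm: each ready node independently picks one slot uniformly at random, and a collision occurs when the earliest chosen slot is picked by more than one node. The function name is hypothetical.

```python
def collision_probability(ready_nodes: int, slots: int) -> float:
    """Probability that the earliest chosen slot is shared by two or more nodes.

    Simplified model (an assumption, not the LonTalk algorithm): every ready
    node picks one of `slots` slots uniformly and independently.
    """
    if ready_nodes < 2:
        return 0.0  # a lone transmitter cannot collide
    p_clean = 0.0
    for s in range(1, slots + 1):
        # Exactly one node picks slot s and all the others pick a later slot.
        later = (slots - s) / slots
        p_clean += ready_nodes * (1.0 / slots) * later ** (ready_nodes - 1)
    return 1.0 - p_clean


if __name__ == "__main__":
    # More slots lower the collision probability but never bring it to zero.
    for slots in (16, 32, 64):
        print(slots, round(collision_probability(8, slots), 3))
```

Under this model, two ready nodes choosing among 16 slots collide with probability 1/16; adding slots shrinks that figure but never eliminates it, which matches the behavior described above.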

LonTalk Pitfalls

The drawbacks to LonTalk include the following: CSMA/CA is non-deterministic. Probabilistic, collision-based protocols aren't appropriate for systems with tight timing deadlines. The randomness of the protocol makes it impossible to calculate message delivery time bounds.

During heavy traffic conditions, multiple transmitters will be ready to send, increasing the likelihood of collisions. Each collision increases the number of ready transmitters, further congesting the network, so recovery may be very slow. These protocols are most efficient if the bus traffic is low and nodes are not synchronized.

LonTalk comes standard with many services implemented as "black boxes," making it difficult to understand the behavior of the protocol chip. Neurons cannot send messages as quickly as one would expect, despite the special 8-bit microcontroller used. The message processing is implemented in software, so each node needs significant time to finish processing a packet before the next arrives.

As shown in Figure 3, the Neuron becomes a bottleneck as network speeds rise above 150Kbps. Although the bus can handle about 600 messages per second, each node is much slower. A single Neuron on a 1.25Mbit/s network can generate at most 326 unacknowledged messages per second.3 In fact, the discrepancy is much worse, as the first rate is for long messages (64 data bytes) while the second is for short messages (one data byte). Increasing system reliability by adding acknowledgments cuts throughput from 326 to 69 messages per second. These limits make LonTalk unsuitable for many control applications.


Figure 3

Controller Area Network

Robert Bosch GmbH designed the CAN protocol for use in automotive control networks.4 Both Mercedes and BMW use CAN in their luxury automobiles; other manufacturers use similar protocols. CAN offers fast, deterministic, prioritized performance with short messages and extensive error detection. With its low-cost components and built-in fault detection capability, this protocol is being applied to a wide variety of non-automotive applications. Adoption of this protocol by Allen-Bradley and Honeywell for their industrial control device networks has helped CAN gain world-wide acceptance. One reason for CAN's success is its simplicity; CAN controllers can be thought of as advanced UARTs offering a basic set of efficient services. The system designer is free to design additional services to meet the application needs, optimizing as needed.

CAN uses the binary countdown method to provide deterministic prioritized medium access. The medium has two states; the dominant state wins out over the recessive. All nodes wait for the medium to become idle before transmitting a message. Each message begins with an arbitration field made of a unique message identifier. During the transmission of this identifier, each transmitting node compares the bus state with what it is attempting to send. If at any bit position the node detects a dominant bit while attempting to send a recessive bit, the node loses arbitration and aborts transmission. Because a dominant bit is represented by a logical 0, the node sending the smallest identifier value wins the bus arbitration. Figure 4 shows an example of two nodes contending for the bus. Node 5 drops out during the third bit, after receiving a dominant signal while sending a recessive signal.


Figure 4
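The arbitration rule can be sketched in a few lines. This is a minimal model, not controller firmware: the bus is treated as a wired-AND of the bits being sent, the 11-bit identifier width is that of standard CAN frames, and the function name is hypothetical.

```python
def arbitrate(identifiers, id_bits=11):
    """Return the identifier that wins bitwise arbitration.

    Models the bus as a wired-AND: a dominant bit (0) from any node
    overrides recessive bits (1) from the others.
    """
    contenders = set(identifiers)
    if not contenders:
        raise ValueError("at least one node must be ready to transmit")
    for bit in range(id_bits - 1, -1, -1):  # most significant bit first
        sent = {ident: (ident >> bit) & 1 for ident in contenders}
        bus = min(sent.values())  # 0 if any contender sends dominant
        # A node that sent recessive but sees dominant drops out here.
        contenders = {ident for ident, b in sent.items() if b == bus}
    return contenders.pop()


if __name__ == "__main__":
    print(arbitrate([5, 3, 12]))  # the smallest identifier survives
```

No bit time is wasted in this process: the winning node's identifier appears on the bus intact, and the losers simply retry once the bus is idle again.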

This medium access method is very efficient because no bandwidth is lost during arbitration. Bus throughput is high under both light and heavy traffic conditions, reaching 1,000 msgs/s at 125Kbps and 8,000 msgs/s at 1Mbps. CAN provides five error detection mechanisms, including a 15-bit cyclic redundancy check (CRC) code that detects nearly all potential message bit errors.5

CAN Pitfalls

The CAN protocol has its own limitations. Because CAN nodes must listen to the bus while transmitting, the bit length must be at least twice the propagation delay. Therefore, high speeds are only supported for short buses (500m for 125 Kbps, 100m for 500Kbps, and 50m for 1Mbps).
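The rule that a bit must last at least twice the propagation delay gives a rough upper bound on bus length. The sketch below assumes a cable propagation delay of 5 ns per meter and ignores transceiver and controller delays; both are assumptions made here, which is why the bound comes out longer than the practical lengths quoted above.

```python
def max_bus_length_m(bit_rate_bps: float, delay_ns_per_m: float = 5.0) -> float:
    """Upper bound on bus length from the rule bit time >= 2 x propagation delay.

    delay_ns_per_m is an assumed cable propagation delay; transceiver and
    controller delays are ignored, so real limits are shorter.
    """
    bit_time_ns = 1e9 / bit_rate_bps
    return bit_time_ns / (2.0 * delay_ns_per_m)


if __name__ == "__main__":
    for rate in (125e3, 500e3, 1e6):
        print(f"{rate / 1e3:.0f} Kbps: at most {max_bus_length_m(rate):.0f} m")
```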

Some applications require electrical isolation between the bus and nodes. Transformer coupling, a common approach, requires special care for bit-dominance protocols. Instead, optical isolation is used, requiring separate network power or a DC-to-DC converter. This isolation support hardware makes the interface more expensive.

CAN specifies only a basic set of network services. Additional services can be costly and tricky to implement. For example, a fragmentation algorithm is required to send messages longer than eight bytes. This algorithm must break long messages into packets with eight-byte data fragments and reconstruct them at the receiver.
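A minimal sketch of such a fragmentation scheme follows. It assumes in-order, loss-free delivery and leaves out what a real design would also need, such as fragment sequence numbers, a total-length field, and timeouts. The function names are hypothetical.

```python
CAN_MAX_DATA = 8  # data bytes per CAN packet, as stated in the text


def fragment(message: bytes) -> list[bytes]:
    """Split a message into fragments of at most eight bytes, in order."""
    return [message[i:i + CAN_MAX_DATA]
            for i in range(0, len(message), CAN_MAX_DATA)]


def reassemble(fragments: list[bytes]) -> bytes:
    """Rebuild the message, assuming in-order, loss-free delivery."""
    return b"".join(fragments)


if __name__ == "__main__":
    msg = bytes(range(20))
    frags = fragment(msg)
    print(len(frags), [len(f) for f in frags])
    print(reassemble(frags) == msg)
```

Even this simple version shows the cost: a 20-byte message becomes three packets, each carrying its own protocol overhead and each competing separately for the bus.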

IEEE-1394 (FireWire)

Apple Computer developed FireWire, which an IEEE committee adapted and standardized as IEEE-1394 ("A High Performance Serial Backplane Bus"). IEEE-1394 primarily targets the personal computer peripheral communication market with extension into consumer video and audio electronics. To support both regular and multimedia applications, it provides asynchronous and isochronous (guaranteed real-time) data communications. Figure 1 shows how the IEEE-1394 specification, like the CAN specification, defines only a few layers of protocol stack, leaving cost-sensitive optimization issues for system designers. However, the protocol also offers several types of multiple-packet transactions, two different physical layers, and configuration functions, so the specification requires a few hundred pages.

Figure 5 shows the cable and backplane physical layers of IEEE-1394. They vary in topology, bus access methods, speed, and configuration. The backplane is an electrically shared bus which uses a bitwise arbitration scheme for medium access. Bus speeds range from 12.5Mbit/s to 50Mbit/s. No configuration is needed at start-up. In the cable version, nodes are connected in a tree topology. Although the specification currently limits each cable to 4.5 meters, this can be relaxed.6 The tree is not a bus electrically; instead, it is a set of point-to-point interconnecting links. This point-to-point topology relaxes high-speed signal requirements (communications can run from 100 to 400Mbit/s) and offers hot-plugging of devices. The nodes use a deterministic hierarchical scheme for medium access.


Figure 5

Texas Instruments, Adaptec, Symbios, and Sony offer link layer controllers which use a Peripheral Component Interconnect (PCI) interface. Because many products with IEEE-1394 interfaces will use ASICs, manufacturers are integrating the link layer controller into their custom chip. Apple, Innovative Semiconductors, Macro Designs, and SICAN offer link layer controller designs for licensing. Several vendors now offer development systems to support designers. Additional information can be obtained from the 1394 trade association web page.7

FireWire Pitfalls

IEEE-1394 has its limitations too: the extremely high bandwidth of IEEE-1394 complicates network interface design. Nodes must be able to handle incoming data very quickly. Most protocol controllers use PCI, which may limit the spread of FireWire to high-performance embedded applications.

The protocol is new, so support is limited and there may be unknown shortcomings yet to be identified and solved. In addition, protocol interface chips may have bugs.

Protocol Summary

Table 1 presents performance and cost characteristics of the three protocols. Each has its specific strengths and weaknesses.

Protocol  | Max. Bit Rate (Mbit/s) | Max. Data Payload (bytes) | Max. Msgs/s | MAC Efficiency | Chip Cost | Development Support
CAN       | 1                      | 8                         | 8,000       | High           | $6        | Substantial
LonTalk   | 1.25                   | 229                       | 560         | Moderate       | $5        | Substantial
IEEE-1394 | 400                    | varies, 2,048 typically   | 800,000     | High           | $30       | Moderate

Table 1

The LonTalk protocol offers non-deterministic communication at moderate speeds. The protocol provides a complete set of communication services, thus eliminating the need to reinvent any wheels when designing an application. Echelon sells a variety of development support tools that simplify system prototyping. The disadvantage of the integrated, complete solution is that the protocol or the Neuron processor may be too slow for your application.

CAN offers fast, deterministic, prioritized performance with short messages and extensive error detection. Because CAN's specification is limited, it allows significant optimization by the system designer. Many companies offer CAN components and development tools. CAN device prices are low and will fall further as the automotive CAN market grows.

IEEE-1394 provides extremely fast, deterministic network performance. The protocol is targeted for high-volume applications so silicon prices should fall quickly as popularity increases. The extreme speed forces the use of very fast support chips or careful message scheduling. The protocol is so new that the development support is limited.

LESSONS LEARNED - SURPRISES LEFT OUT OF THE ADVERTISEMENTS

LonTalk, CAN, and FireWire reveal some risks involved with network design, including performance limitations and networks mismatched to applications. These problems can occur in any protocol.

Message Delivery Behavior: A hard real-time system may fail if any message is delivered after its deadline. Soft real-time systems are able to compensate for occasional message delays or losses. Non-real-time systems do not have specific message deadlines beyond which data is useless. The distinctions among hard real time, soft real time, and non-real time are admittedly ambiguous, given their dependence on the application's nature and robustness. A flight control computer for an inherently unstable aircraft has tighter real-time requirements than a CD player controller; both are tighter than a toaster controller's requirements. The system designer should understand the consequences of lost or late messages. This will drive the selection of the MAC.

Some MACs cannot guarantee reliable, on-time message delivery, even when operating with sufficient bandwidth and without hardware failures or bus noise. This is true of both non-deterministic and deterministic MACs. However, a polled MAC such as token-passing or TDMA does provide delivery guarantees under these conditions. A probabilistic MAC (as used by Ethernet and LonTalk) is not appropriate for a hard real-time system, as there is always a chance that a message will not be delivered on time. Because of this chance, no amount of prototyping and testing can prove that the system will operate properly in the future.

Three fundamental factors which influence message delays and losses are:

MAC Protocol: The medium access protocol plays a significant role in defining message loss and delay characteristics. Probabilistic collision-based protocols (like LonTalk) resolve contention by waiting a random length of time, resulting in message delays. If an acknowledgment scheme is not used, messages can be lost during these collisions. For collision-free protocols (for example, CAN, IEEE-1394), the situation is better. However, if a protocol is priority-based (like CAN), low-priority messages can experience long delays.

Bursty Traffic: Bursts of traffic on a network induce delays as they cause bus congestion. Many events and attributes can lead to message bursts, including noise, periodic messages, command/response messages, fragmentation of excessively long messages, and activity based on external stimuli. Even a network with low average utilization may become congested more often than would be expected.8 In collision-based protocols, message bursts increase the likelihood of multiple collisions and large delays. In collision-free protocols, burstiness places high demands on the network interface as it must transfer back-to-back messages into the main memory to avoid buffer overflow.

Bandwidth: Running a bus near its maximum throughput makes the network more sensitive to traffic bursts, as there is less slack time to handle the extra traffic. For example, a 95% loaded bus will take ten times as long to recover from a burst as one which is 50% loaded. During this recovery period, the bus will be fully loaded and will suffer from the problems mentioned above. To avoid communication congestion and delays in our applications, we design networks to support five times the expected traffic. For collision-based networks we double this margin. These margins will also accommodate future growth. The system designer should understand the communication requirements of the application and choose a MAC protocol that complements them.
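The ten-to-one figure follows from a simple model, sketched here under the assumption that normal traffic keeps arriving at a steady rate while the burst backlog drains, so only the spare capacity works off the backlog. The function names and the numbers in the usage lines are hypothetical; the margin function reads "double this margin" as ten times the expected traffic.

```python
def recovery_time(burst_msgs: float, capacity_msgs_per_s: float, load: float) -> float:
    """Seconds needed to drain a backlog while normal traffic keeps arriving.

    Only the spare capacity, capacity * (1 - load), works off the backlog.
    """
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1)")
    return burst_msgs / (capacity_msgs_per_s * (1.0 - load))


def design_capacity(expected_msgs_per_s: float, collision_based: bool = False) -> float:
    """Capacity to design for: 5x expected traffic, doubled for collision-based MACs."""
    margin = 10.0 if collision_based else 5.0
    return expected_msgs_per_s * margin


if __name__ == "__main__":
    slow = recovery_time(100, 1000, 0.95)  # 95% loaded bus
    fast = recovery_time(100, 1000, 0.50)  # 50% loaded bus
    print(slow, fast, slow / fast)
    print(design_capacity(200), design_capacity(200, collision_based=True))
```

The ratio depends only on the spare capacity: 5% spare versus 50% spare gives the factor of ten, whatever the burst size or bus capacity.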

The COTS Black-Box "Solution": It is usually much easier, faster, and more cost-effective to buy a commercial off-the-shelf solution than design it. Communication protocols are hard to design correctly. However, COTS "black-box" solutions have some risks: For applications that are sensitive to factors such as cost, weight, size, power, and expansion capability, a generalized COTS solution might be a poor fit. A generic protocol will have many features that increase product costs and reduce performance. Conversely, a protocol optimized for a specific application may not be suited for a different application. For example, a LonTalk controller has many built-in services which the application may never use. Every Neuron carries code to support the unused services, increasing cost. Every LonWorks network transmits extra bits to support unused features, wasting bandwidth.

A black-box approach hides implementation and behavioral information which affects larger system design. It is difficult to efficiently specify a complex module's behavior under all conditions. However, to design a system which operates correctly during critical times, you must know the component interactions. Software can be a special problem because high quality requires significant planning and testing. If possible, obtain the specification the engineers used to create the system; this will explain behavior in special cases missing from the user's manual.

The black-box method can constrain design and development flexibility by locking the designer into a limited suite of products and tools. For example, until recently Neuron C was the only programming option for LonTalk; assembly language simply wasn't available. In addition, no upgrade path was available beyond the Neuron for applications that needed faster execution and response times. However, Echelon recently opened the LonTalk protocol for use on other processors. Verify that the COTS solution meets all of the application's requirements. You may require access to design documents through a non-disclosure agreement with the vendor. Find the system's performance limits and determine whether they can be relaxed.

Ideal and Real Throughput: The MAC protocol and network interface can cut throughput to a fraction of the raw bus bit rate. Figure 6 shows a plot of data throughput varying with the bus bit rate, with various bottlenecks superimposed. The shaded area represents the performance envelope within which the system can operate. Notice how each element in the communication system can limit performance. The diagonals are bottlenecks related to protocol efficiency, while the horizontal lines come from message processing rates. Moving within the design space leads to different bottlenecks constraining performance.

Figure 6
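The envelope can be approximated as the minimum of two limits: one that scales with the bit rate (the diagonals) and one fixed by how fast a node can process messages (the horizontal lines). The sketch below is a simplification of the plot, with a hypothetical function name and an assumed protocol efficiency of 10% in the usage lines; the 326 messages per second figure is the Neuron rate quoted earlier.

```python
def data_throughput_bps(bit_rate_bps: float, protocol_efficiency: float,
                        node_msgs_per_s: float, data_bytes_per_msg: int) -> float:
    """Data throughput limited by the tighter of two bottlenecks."""
    # Diagonal line: grows with the bus bit rate.
    protocol_limit = bit_rate_bps * protocol_efficiency
    # Horizontal line: fixed by the node's message processing rate.
    processing_limit = node_msgs_per_s * data_bytes_per_msg * 8
    return min(protocol_limit, processing_limit)


if __name__ == "__main__":
    # At a low bit rate the protocol limit dominates; at 1.25 Mbit/s the
    # node's processing rate does, so a faster bus buys nothing.
    for rate in (10e3, 100e3, 1.25e6):
        print(rate, data_throughput_bps(rate, 0.1, 326, 1))
```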

Protocol Bottlenecks

Collisions and retries waste bandwidth. Collision detection circuitry reduces this loss, but its extra components increase system costs. Ethernet and LonTalk are two common protocols which suffer from this bandwidth penalty. CAN and FireWire use a MAC with lossless collision resolution and do not waste bandwidth this way.

Each message packet contains both data and protocol support information. For example, CAN requires eight bytes and IEEE-1394 uses at least 20 bytes of support information. This information helps implement network services such as addressing, routing, error detection, and bit synchronization. If each packet carries a significant amount of support information compared to data, the throughput will suffer. Sending packets with little data exacerbates this limitation.
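A rough calculation shows how quickly small payloads erode efficiency. This sketch uses the overhead figures given above (eight bytes for CAN, at least 20 for IEEE-1394) and ignores bit stuffing and inter-packet gaps, which is a simplifying assumption. The function name is hypothetical.

```python
def protocol_efficiency(data_bytes: int, overhead_bytes: int) -> float:
    """Fraction of each packet that is application data."""
    return data_bytes / (data_bytes + overhead_bytes)


if __name__ == "__main__":
    print(f"CAN, 8 data bytes:          {protocol_efficiency(8, 8):.1%}")
    print(f"CAN, 1 data byte:           {protocol_efficiency(1, 8):.1%}")
    print(f"IEEE-1394, 2048 data bytes: {protocol_efficiency(2048, 20):.1%}")
```

A full CAN packet is at best half data, and a one-byte CAN message is about one-ninth data, while a typical 2,048-byte IEEE-1394 payload is almost all data.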

Message Processing Bottlenecks

The network interface design is critical in networks with a data throughput mismatch between the bus and network interface. Both LonTalk (at 1.25Mbit/s) and IEEE-1394 (at any speed) have this imbalance. The microcontroller must move data from the incoming message queue into main memory fast enough to keep that queue from overflowing.

An IEEE-1394 network running at 100Mbit/s is fast enough to require hardware support to get full throughput. The 100 Mbit/sec rate translates to a 32-bit quadlet every 320ns, which is faster than most embedded microcontrollers can handle. The designers of the protocol and interface chips attempted to alleviate the node bottleneck in several ways.
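The 320ns figure is simple arithmetic, and the same calculation shows how few processor cycles are available per quadlet. In the sketch below the 20 MHz microcontroller clock is a hypothetical value chosen for illustration, as are the function names.

```python
def quadlet_interval_ns(bit_rate_bps: float, quadlet_bits: int = 32) -> float:
    """Time between successive quadlets arriving at the given bit rate."""
    return quadlet_bits / bit_rate_bps * 1e9


def cycles_per_quadlet(cpu_hz: float, bit_rate_bps: float, quadlet_bits: int = 32) -> float:
    """Processor clock cycles available to handle each incoming quadlet."""
    return cpu_hz * quadlet_bits / bit_rate_bps


if __name__ == "__main__":
    for rate in (100e6, 400e6):
        print(f"{rate / 1e6:.0f} Mbit/s: one quadlet every "
              f"{quadlet_interval_ns(rate):.0f} ns")
    # Hypothetical 20 MHz microcontroller on a 100 Mbit/s bus.
    print(f"{cycles_per_quadlet(20e6, 100e6):.1f} cycles per quadlet")
```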

  • The protocol deals with 32-bit quadlets rather than 8-bit bytes. The 32-bit bus at the link layer controller quadruples the bandwidth available.
  • The protocol features isochronous as well as asynchronous communications. Asynchronous messages can be sent at nearly any time, but isochronous messages are only sent every 125 µs, immediately after a periodic synchronization signal.
  • The link layer controller uses queues to compensate for moderate speed differences between the bus and the processor.

Bottleneck Avoidance

To avoid communication problems, you should identify and eliminate problem bottlenecks. If the bus is the bottleneck, increase the bit rate or send fewer messages. Consider using event-driven rather than periodic messages. If the network interface is the problem, the following actions can avoid bottlenecks:

  • Use a faster microcontroller to implement the protocol stack and handle messages. This can be an expensive option, as faster processor support circuitry may be needed.
  • Implement the function in fast hardware. A microcontroller's direct memory access (DMA) controller is a convenient solution.
  • Throttle back the network by including a mandatory idle time between packet transmissions. Schedule the message traffic to limit activity at each node. This may be the least expensive solution, but it requires synchronization among the nodes which may not be feasible for a given application.
  • Rely on message retransmission. When the receive buffer is full, transmit a negative acknowledgment to force message retransmission.

BUILD OR BUY?

Selecting a communication protocol to support an application requires an understanding of both the protocol and the application. Whether generic or application-specific, a commercial protocol will probably limit system optimization. This build-or-buy decision has repercussions in embedded network design because of the need to understand, optimize, and expand the system.

A hard real-time application may have short delivery deadlines which must not be missed. Non-deterministic networks are not suited for this type of system, as they cannot guarantee message delivery times. Even deterministic prioritized networks may not be appropriate, due to node starvation. Some networks suffer dramatically during bursts of traffic; these traffic bursts tend to be more common than initially expected.

Some applications have commercial protocols that are optimized specifically for them, but these applications are exceptions to the rule. For the other applications, a system designer must either develop a completely new protocol, apply a general-purpose protocol, or adapt an existing protocol to a new application. Creating a new protocol requires a significant amount of time and resources. Applying a general-purpose protocol and adapting an application-specific protocol both limit the amount of optimization possible. This optimization may be crucial in applications with tight performance, cost, size, weight, and environmental constraints.

Choosing an implementation of the protocol leads to the need to understand that implementation. COTS solutions are not fully described because of the complexity and the need to maintain a competitive advantage. This hiding of information complicates the job of the system designer, who must understand the interactions of all system components including the network implementation.

Protocol and implementation details lead to performance bottlenecks, typically reducing communication system throughput to far below the raw bus bit rate. A speed mismatch between the network and network interface will usually lead to bottlenecks, but techniques are available to address and lessen their effects.

Alexander Dean is an Assistant Research Engineer at UTRC. He has designed and analyzed embedded networks and computer architectures for Otis elevators, Pratt & Whitney jet engines, automotive systems, and wireless building systems. He received a BSEE from the University of Wisconsin and an MSEE from Carnegie Mellon University, where he is currently pursuing a PhD in Electrical and Computer Engineering. He is researching compilation techniques to eliminate real-time support hardware. He can be contacted at adean@ece.cmu.edu.

Bhargav Upender is a Research Engineer at United Technologies Research Center. He currently designs and evaluates network protocols and software architectures for distributed embedded systems. He holds a BS in electrical engineering from the University of Connecticut and an MS in electrical engineering from Cornell University. He can be contacted electronically at barg@utrc.utc.com.

REFERENCES
  • 1. Brooks, F. The Mythical Man-Month: Essays on Software Engineering. Reading, MA: Addison-Wesley, 1975.
  • 2. Upender, B., and P. Koopman, "Communication Protocols For Embedded Systems," Embedded Systems Programming, November 1994, p. 46.
  • 3. "LonWorks Engineering Bulletin: LonTalk Response Time Measurements," Palo Alto, CA: Echelon Corp., 1992.
  • 4. "Road Vehicles - Interchange of Digital Information - Controller Area Network for High-Speed Communication," International Standard Organization, ISO-11898, November 1993.
  • 5. Unruh, J., H. J. Mathony, and K. H. Kaiser, "Error Detection Analysis of Automotive Communication Protocols," SAE Paper 900699, 1990.
  • 6. Carter, A., "Longer Cables for the IEEE-P1394 High Performance Serial Bus," Cupertino, CA: Apple Computer, 1994.
  • 7. The 1394 Trade Association, The Multimedia Connection, http://www.394ta.org/index.html.
  • 8. Upender, B. and A. Dean, "Variability of CAN Network Performance," Third International CAN Conference, Paris, France, 1996.