Embedded Communication Network Pitfalls
by Alexander Dean and Bhargav Upender
Should you build or buy a communications protocol to support your application?
Making this decision without understanding network/system interactions will limit optimization and expansion options.
Here are some key issues to consider, with CAN, LonTalk, and IEEE-1394 (FireWire) as examples.
Even though embedded network vendors provide a flood of apparently useful information about their products, crucial issues may not become apparent until later in the design cycle. You can't expect a vendor to understand the communication characteristics of your specific application, and published performance numbers and charts often represent best-case performance under unrealistic assumptions. To help the designer ask the important questions up front, this article identifies key issues that can make or break a communication system, using CAN, LonTalk, and IEEE-1394 (FireWire) to explain the key points. After an introduction to these protocols, we discuss critical issues such as cost, message delivery time predictability, quoted bit rate vs. real performance, message serialization, and network reliability.
The heart of a network protocol is the medium access control (MAC) mechanism, which determines how the network will be shared among the nodes. The MAC's determinism, efficiency, prioritization, and protocol optimizations will determine bus traffic and system performance. Some applications will not operate consistently if an inappropriate MAC is used.
Going one step beyond the MAC reveals digital communication networks to be complex systems, which makes them difficult to design, optimize, and verify. A typical specification for a new bus, such as IEEE-1394 or Universal Serial Bus, runs to hundreds of pages. The interactions between network and application components make it hard to visualize system behavior in important, non-obvious cases. This is essentially the software crisis addressed by Frederick P. Brooks, Jr. in The Mythical Man-Month: without structured design, no person can expect to understand a complex system's behavior.1
The designers of a commercial off-the-shelf (COTS) solution typically optimize it for a generic network application, which means that design tradeoffs will limit some aspect of performance. Running a COTS network protocol near its performance limits accentuates problems because of limited visibility into its components, which become black boxes, often without completely specified behavior. In addition, many COTS products limit design flexibility.
Often a network's throughput is much lower than the bus bit rate because its designers optimized other characteristics, creating performance bottlenecks. The designers of some commercial protocols optimize their products for specific applications. For example, CAN is tailored to provide reliable, deterministic communications for automotive networks. IEEE-1394 includes isochronous data communications to simplify real-time data transmission by eliminating the need for large transmit and receive buffers. On the other hand, Echelon's LonTalk was designed to support simple network development for a broad variety of applications; it was optimized for flexibility (by defining far more than the MAC) and for integration into a microcontroller. Each approach has its place. A problem arises when a network is used near the edge of its performance envelope: the optimizations can then create bottlenecks that limit system performance. We examine some bottlenecks in CAN, LonTalk, and IEEE-1394, along with potential solutions.
PROTOCOL OVERVIEW
We begin by discussing the three network protocols, LonTalk, CAN, and IEEE-1394, to give a frame of reference for examining network quirks and limitations. While many communication protocols exist, few have spread very far across the diverse embedded systems marketplace.2 For moderate speeds, LonTalk and CAN are de facto standards. For high-speed real-time applications, IEEE-1394 (also referred to as FireWire or High-Speed Serial Bus) may become one of the key players. These networks vary significantly in design goals, resulting in differences in speed, efficiency, development support, and features. Figure 1 shows how these three protocols span the network design space.

Figure 1
LonTalk
The Echelon Corp. developed LonWorks as a complete, full-featured solution for generic control networks. The Neuron communication controller implements many services needed by networked applications. LonTalk, Echelon's communications protocol stack, supports a hierarchy of buses, a variety of message transmission methods, and encryption and authentication services, all of which simplify network design and installation. At the physical layer, LonTalk can run on twisted-pair wiring, power lines, and radio. Echelon offers a variety of network interface transceivers and a useful development system, and third-party vendors provide a variety of LonWorks-related products and consulting services.
As mentioned earlier, the MAC protocol is the heart of a network: it determines available bandwidth, message delivery times (through priorities), and robustness.2 LonTalk uses a MAC similar to that of Ethernet, which is commonly used to connect personal computers. LonTalk's MAC is a form of carrier sense multiple access with collision avoidance (CSMA/CA) and is shown in Figure 2. In this MAC, a node ready to transmit waits for the bus to become idle before sending a message. If multiple nodes begin transmitting nearly simultaneously, their messages collide. If the bus is not idle or a collision occurs, the transmitting node backs off and retries after waiting a random number of time slots. This process continues until the message is successfully transmitted.

Figure 2
Collisions can be made less likely by increasing the number of random slots, which reduces the probability that two nodes choose the same one. LonTalk's collision avoidance algorithm predicts network traffic and adjusts the number of collision avoidance slots accordingly: if a node predicts that bus traffic will increase, it increases the number of slots from which it chooses. This algorithm reduces collisions but does not eliminate the chance of two stations selecting the same time slot. An optional collision-detection circuit improves network performance by aborting transmission early when a collision is detected.
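The effect of the slot count can be estimated with a short calculation. The Python sketch below is not Echelon's predictive algorithm; it simply assumes that n nodes become ready at the same instant and each independently picks one of k equally likely slots, so a collision occurs when the earliest chosen slot is shared by two or more nodes.

```python
from fractions import Fraction


def collision_probability(nodes: int, slots: int) -> Fraction:
    """Probability that the earliest chosen slot is picked by two or more nodes.

    Simplified model (an assumption, not LonTalk's exact algorithm): every
    ready node independently picks one of `slots` equally likely slots, and
    the node(s) in the lowest-numbered slot transmit first.
    """
    if nodes < 1 or slots < 1:
        raise ValueError("need at least one node and one slot")
    # Sum, over each slot s, the chance that exactly one node picks s and
    # every other node picks a later slot (a unique winner, no collision).
    unique_winner = sum(
        Fraction(nodes, slots) * Fraction(slots - 1 - s, slots) ** (nodes - 1)
        for s in range(slots)
    )
    return 1 - unique_winner


if __name__ == "__main__":
    for k in (16, 32, 64):
        print(f"10 nodes, {k} slots: {float(collision_probability(10, k)):.3f}")
```

In this model, more slots lower the collision probability but never drive it to zero while two or more nodes contend; the price is a longer average wait before the winning node transmits.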
LonTalk Pitfalls
LonTalk has several drawbacks. CSMA/CA is non-deterministic, and probabilistic, collision-based protocols aren't appropriate for systems with tight timing deadlines, because the randomness of the protocol makes it impossible to calculate bounds on message delivery time. During heavy traffic, multiple transmitters will be ready to send, increasing the likelihood of collisions. Each collision delays the colliding messages while more messages become ready, increasing the number of waiting transmitters and further congesting the network, so recovery may be very slow. These protocols are most efficient when bus traffic is low and nodes are not synchronized.
LonTalk comes standard with many services implemented as "black boxes," which makes the behavior of the protocol chip difficult to understand. Neurons cannot send messages as quickly as one would expect, despite their specialized 8-bit microcontroller. Message processing is implemented in software, so each node needs significant time to finish processing one packet before the next arrives.
As shown in Figure 3, the Neuron becomes a bottleneck as network speeds rise above 150 kbit/s. Although the bus can handle about 600 messages per second, each node is much slower: a single Neuron on a 1.25 Mbit/s network can generate at most 326 unacknowledged messages per second.3 The discrepancy is actually much worse than these numbers suggest, because the first rate is for long messages (64 data bytes) while the second is for short messages (one data byte). Increasing system reliability by adding acknowledgments cuts throughput from 326 to 69 messages per second. These limits make LonTalk unsuitable for many control applications.

Figure 3
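The arithmetic behind these figures is worth spelling out. The sketch below uses only the numbers quoted above and converts them into a per-message time budget and a payload-rate comparison; it does not model anything about the Neuron's internals, and the constant and function names are illustrative.

```python
# Figures quoted in the text (Echelon measurements, reference 3).
# Nothing here models the Neuron itself; it is arithmetic on those numbers.
BUS_MSGS_PER_S = 600           # bus capacity, long messages
BUS_DATA_BYTES = 64            # data bytes per long message
NODE_UNACKED_MSGS_PER_S = 326  # one Neuron, short unacknowledged messages
NODE_ACKED_MSGS_PER_S = 69     # one Neuron, short acknowledged messages
NODE_DATA_BYTES = 1            # data bytes per short message


def per_message_ms(msgs_per_s: float) -> float:
    """Average time budget per message, in milliseconds."""
    return 1000.0 / msgs_per_s


def data_rate_ratio() -> float:
    """How many times more payload the bus carries than one Neuron generates."""
    bus_bytes_per_s = BUS_MSGS_PER_S * BUS_DATA_BYTES
    node_bytes_per_s = NODE_UNACKED_MSGS_PER_S * NODE_DATA_BYTES
    return bus_bytes_per_s / node_bytes_per_s


if __name__ == "__main__":
    print(f"unacknowledged: {per_message_ms(NODE_UNACKED_MSGS_PER_S):.2f} ms/message")
    print(f"acknowledged:   {per_message_ms(NODE_ACKED_MSGS_PER_S):.2f} ms/message")
    print(f"bus vs. node payload rate: {data_rate_ratio():.0f}x")
```

For the quoted figures this works out to about 3.1 ms per unacknowledged message and 14.5 ms per acknowledged one, while the bus can carry roughly 118 times more payload than a single Neuron generates with one-byte messages.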
Controller Area Network
Robert Bosch GmbH designed the CAN protocol for use in automotive control networks.4 Both Mercedes and BMW use CAN in their luxury automobiles; other manufacturers use similar protocols. CAN offers fast, deterministic, prioritized performance with short messages and extensive error detection. With its low-cost components and built-in fault detection capability, the protocol is being applied to a wide variety of non-automotive applications. Its adoption by Allen-Bradley and Honeywell for their industrial control device networks has helped CAN gain worldwide acceptance. One reason for CAN's success is its simplicity: CAN controllers can be thought of as advanced UARTs offering a basic set of efficient services. The system designer is free to design additional services to meet the application's needs, optimizing as needed.
CAN uses the binary countdown method to provide deterministic, prioritized medium access. The medium has two states, and the dominant state wins out over the recessive, so the bus is recessive only if every node sends recessive. All nodes wait for the medium to become idle before transmitting a message. Each message begins with an arbitration field made of a unique message identifier. While transmitting this identifier, each transmitting node compares the bus state with the bit it is attempting to send. If at any bit position a node detects a dominant bit while attempting to send a recessive bit, it loses arbitration and stops transmitting. Because a dominant bit is represented by a logical 0, the node with the smallest identifier value wins the bus arbitration. Figure 4 shows an example of two nodes contending for the bus. Node 5 drops out during the third bit, after receiving a dominant signal while sending a recessive signal.

Figure 4
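A short Python simulation shows why the lowest identifier always wins. It assumes 11-bit identifiers and nodes that start transmitting in the same bit time, and it models the bus as a wired-AND of all transmitted bits; it ignores the RTR bit, bit stuffing, and the rest of the frame. The function name is illustrative.

```python
def can_arbitrate(identifiers, width=11):
    """Simulate CAN bitwise arbitration among nodes that start transmitting together.

    Simplified model (an assumption): only the identifier bits are simulated.
    Returns (winner, dropouts), where dropouts maps each losing identifier to
    the bit index (0 = most significant bit) at which it lost arbitration.
    """
    if not identifiers:
        raise ValueError("need at least one contending identifier")
    if len(set(identifiers)) != len(identifiers):
        raise ValueError("CAN identifiers must be unique")
    if any(not 0 <= ident < (1 << width) for ident in identifiers):
        raise ValueError(f"identifiers must fit in {width} bits")

    contending = list(identifiers)
    dropouts = {}
    for i in range(width):
        shift = width - 1 - i
        # Wired-AND bus: a dominant 0 from any node overrides recessive 1s.
        bus_level = min((ident >> shift) & 1 for ident in contending)
        survivors = []
        for ident in contending:
            sent = (ident >> shift) & 1
            if sent == 1 and bus_level == 0:
                dropouts[ident] = i  # sent recessive, saw dominant: back off
            else:
                survivors.append(ident)
        contending = survivors
    # Unique identifiers leave exactly one survivor after the last bit.
    return contending[0], dropouts
```

For example, `can_arbitrate([0x065, 0x123, 0x064])` returns `(0x064, {0x123: 2, 0x065: 10})`: 0x123 loses at the third bit, and 0x065 survives until the final bit, where it differs from 0x064. No bits are wasted, because the winner's transmission is never interrupted.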
This medium access method is very efficient because the winning message continues uninterrupted, so no bandwidth is lost during arbitration. Bus throughput is high under both light and heavy traffic conditions, reaching 1,000 messages/s at 125 kbit/s and 8,000 messages/s at 1 Mbit/s. CAN provides five error detection mechanisms, including a 15-bit cyclic redundancy check (CRC) that detects nearly all potential message bit errors.5
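The quoted message rates follow from the length of a CAN frame. The sketch below counts the bits in a standard (11-bit identifier) data frame using the commonly cited CAN 2.0A frame layout, with and without worst-case bit stuffing; the function names are illustrative.

```python
def can_standard_frame_bits(data_bytes: int, worst_case_stuffing: bool = False) -> int:
    """Bits on the wire for one standard (11-bit identifier) CAN data frame.

    47 fixed bits = 44 bits of frame overhead plus the 3-bit interframe space.
    Bit stuffing inserts a complementary bit after five identical bits in the
    34 + 8n bits from start-of-frame through the CRC; because a stuff bit can
    begin the next run, the worst case is floor((34 + 8n - 1) / 4) extra bits.
    """
    if not 0 <= data_bytes <= 8:
        raise ValueError("classic CAN carries 0 to 8 data bytes")
    bits = 47 + 8 * data_bytes
    if worst_case_stuffing:
        bits += (34 + 8 * data_bytes - 1) // 4
    return bits


def max_messages_per_second(bit_rate: float, frame_bits: int) -> float:
    """Back-to-back frame rate when the bus is 100% busy."""
    return bit_rate / frame_bits
```

For an 8-byte frame this gives 111 bits without stuffing and 135 bits in the worst case, or roughly 7,400 to 9,000 frames/s at 1 Mbit/s and about 926 to 1,126 frames/s at 125 kbit/s; the figures quoted above fall within these ranges.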
CAN Pitfalls
The CAN protocol has its own limitations. Because CAN nodes must listen to the bus while transmitting, each bit time must be at least twice the propagation delay between the most distant nodes, so that every transmitter sees the other nodes' bits before moving on to the next one. Therefore, high speeds are supported only on short buses (500 m at 125 kbit/s, 100 m at 500 kbit/s, and 50 m at 1 Mbit/s).
Some applications require electrical isolation between the bus and the nodes. Transformer coupling, a common approach, requires special care with bit-dominance protocols, so optical isolation is typically used instead, which requires separate network power or a DC-to-DC converter. This isolation hardware makes the interface more expensive.
CAN specifies only a basic set of network services. Additional services can be costly and tricky to implement. For example, a fragmentation algorithm is required to send messages longer than eight bytes. This algorithm must break long messages into packets with eight-byte data fragments and reconstruct them at the receiver.
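A minimal fragmentation and reassembly sketch follows. It assumes a hypothetical scheme in which each fragment's sequence number and a last-fragment flag travel in spare identifier bits, so all eight data bytes carry payload; real higher-layer protocols define their own headers, and the names here are illustrative only. The receiver assumes fragments from one sender arrive in order and treats any sequence gap as a lost fragment, discarding the partial message.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

MAX_CAN_DATA = 8  # classic CAN frames carry at most eight data bytes


@dataclass(frozen=True)
class Fragment:
    seq: int     # fragment index; hypothetically carried in identifier bits
    last: bool   # True on the final fragment of a message
    data: bytes  # up to MAX_CAN_DATA payload bytes


def fragment(message: bytes) -> list[Fragment]:
    """Split a long message into eight-byte fragments (an empty message gets one empty fragment)."""
    chunks = [message[i:i + MAX_CAN_DATA]
              for i in range(0, len(message), MAX_CAN_DATA)] or [b""]
    return [Fragment(seq=i, last=(i == len(chunks) - 1), data=chunk)
            for i, chunk in enumerate(chunks)]


class Reassembler:
    """Rebuilds messages from in-order fragments sent by a single node."""

    def __init__(self) -> None:
        self._parts: list[bytes] = []

    def feed(self, frag: Fragment) -> Optional[bytes]:
        """Return the complete message when its last fragment arrives, else None."""
        if frag.seq == 0:
            self._parts = []      # a new message starts; drop any partial one
        elif frag.seq != len(self._parts):
            self._parts = []      # gap detected: discard the partial message
            return None
        self._parts.append(frag.data)
        if frag.last:
            message = b"".join(self._parts)
            self._parts = []
            return message
        return None
```

A 20-byte message becomes three fragments of 8, 8, and 4 bytes. Even this small sketch has to decide how to handle lost fragments and interleaved senders (here, one reassembler per sender), which is the kind of extra service the designer must build on top of CAN.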
IEEE-1394 (FireWire)
Apple Computer developed FireWire, which an IEEE committee adapted and standardized as IEEE-1394 ("A High Performance Serial Bus"). IEEE-1394 primarily targets personal computer peripheral communication, with extensions into consumer video and audio electronics. To support both regular and multimedia applications, it provides asynchronous and isochronous (guaranteed real-time) data communications. As Figure 1 shows, the IEEE-1394 specification, like the CAN specification, defines only a few layers of the protocol stack, leaving cost-sensitive optimization issues to system designers. However, the protocol also offers several types of multiple-packet transactions, two different physical layers, and configuration functions, so the specification runs to a few hundred pages.
Figure 5 shows the cable and backplane physical layers of IEEE-1394, which differ in topology, bus access method, speed, and configuration. The backplane is an electrically shared bus that uses a bitwise arbitration scheme for medium access. Bus speeds range from 12.5 Mbit/s to 50 Mbit/s, and no configuration is needed at start-up. In the cable version, nodes are connected in a tree topology. Although the specification currently limits each cable to 4.5 meters, this limit can be relaxed.6 Electrically, the tree is not a bus; instead, it is a set of point-to-point links. This point-to-point topology relaxes high-speed signaling requirements (communications can run from 100 to 400 Mbit/s) and supports hot-plugging of devices. The nodes use a deterministic hierarchical scheme for medium access.

Figure 5
Texas Instruments, Adaptec, Symbios, and Sony offer link layer controllers with a Peripheral Component Interconnect (PCI) interface. Because many products with IEEE-1394 interfaces will use ASICs, manufacturers are integrating the link layer controller into their custom chips; Apple, Innovative Semiconductors, Macro Designs, and SICAN offer link layer controller designs for licensing. Several vendors now offer development systems to support designers. Additional information can be obtained from the 1394 Trade Association web page.7
FireWire Pitfalls
IEEE-1394 has its limitations too. Its extremely high bandwidth complicates network interface design, because nodes must be able to handle incoming data very quickly. Most protocol controllers use PCI, which may restrict FireWire to high-performance embedded applications.
The protocol is new, so support is limited, and there may be shortcomings that have yet to be identified and solved. In addition, protocol interface chips may have bugs.
Protocol Summary
Table 1 presents performance and cost characteristics of the three protocols. Each has its specific strengths and weaknesses.
| Protocol | Max. Bit Rate (Mbit/s) | Max. Data Payload (bytes) | Max. Msgs/s | MAC Efficiency | Chip Cost | Development Support |
|---|---|---|---|---|---|---|
| CAN | 1 | 8 | 8,000 | High | $6 | Substantial |
| LonTalk | 1.25 | 229 | 560 | Moderate | $5 | Substantial |
| IEEE-1394 | 400 | varies, 2,048 typically | 800,000 | High | $30 | Moderate |
Table 1
The LonTalk protocol offers non-deterministic communication at moderate speeds. It provides a complete set of communication services, eliminating the need to reinvent any wheels when designing an application, and Echelon sells a variety of development support tools that simplify system prototyping. The disadvantage of this integrated, complete solution is that the protocol or the Neuron processor may be too slow for your application.
CAN offers fast, deterministic, prioritized performance with short messages and extensive error detection. Because CAN's specification is limited, it allows significant optimization by the system designer. Many companies offer CAN components and development tools. CAN device prices are low and will fall further as the automotive CAN market grows.
IEEE-1394 provides extremely fast, deterministic network performance. The protocol is targeted at high-volume applications, so silicon prices should fall quickly as its popularity increases. The extreme speed forces the use of very fast support chips or careful message scheduling. The protocol is so new that development support is limited.
LESSONS LEARNED - SURPRISES LEFT OUT OF THE ADVERTISEMENTS
LonTalk, CAN, and FireWire reveal some risks involved in network design, including performance limitations and networks mismatched to applications. These problems can occur in any protocol.
Message Delivery Behavior:
A hard real-time system may fail if any message is delivered after its deadline. Soft real-time systems can compensate for occasional message delays or losses. Non-real-time systems do not have specific message deadlines beyond which data is useless. The distinctions among hard real time, soft real time, and non-real time are admittedly ambiguous, given their dependence on the application's nature and robustness. A flight control computer for an inherently unstable aircraft has tighter real-time requirements than a CD player controller; both are tighter than a toaster controller's requirements. The system designer should understand the consequences of lost or late messages, and this understanding should drive the selection of the MAC.
Some MACs cannot guarantee on-time delivery for every message, even when operating with sufficient bandwidth and without hardware failures or bus noise. This is true of non-deterministic MACs and of some deterministic ones; a prioritized MAC such as CAN's, for example, can delay low-priority messages for long periods. However, a polled MAC such as token-passing or TDMA does provide delivery guarantees under these conditions. A probabilistic MAC (as used by Ethernet and LonTalk) is not appropriate for a hard real-time system, as there is always a chance that a message will not be delivered on time. This chance means that no amount of prototyping and testing can prove that the system will operate properly in the future.
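A back-of-the-envelope sketch shows why testing cannot close this gap. It assumes, optimistically, that each transmission attempt collides independently with a fixed probability p, so the chance that a message is still undelivered after m attempts is p^m. Under real load, collisions are correlated and the numbers get worse; the function names and example figures are hypothetical.

```python
def prob_still_undelivered(p_collision: float, attempts: int) -> float:
    """Chance that all of the first `attempts` tries collide.

    Assumption: attempts collide independently with probability p_collision.
    """
    if not 0.0 <= p_collision <= 1.0 or attempts < 0:
        raise ValueError("p_collision must be in [0, 1] and attempts >= 0")
    return p_collision ** attempts


def expected_late_messages(p_collision: float, attempts_before_deadline: int,
                           msgs_per_s: float, hours: float) -> float:
    """Expected number of messages that miss a deadline allowing only
    `attempts_before_deadline` tries, over `hours` of operation."""
    total_messages = msgs_per_s * hours * 3600.0
    return total_messages * prob_still_undelivered(p_collision, attempts_before_deadline)
```

For example, with a 5% chance of collision per attempt and a deadline that allows four attempts, a node sending 100 messages per second would still be expected to miss about 54 deadlines per day. The miss probability shrinks with each extra attempt but never reaches zero for any p > 0.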
Three fundamental factors influence message delays and losses:
MAC Protocol:
The medium access protocol plays a significant role in defining message loss and delay characteristics. Probabilistic collision-based protocols (like LonTalk) resolve contention by waiting a random length of time, resulting in message delays. If an acknowledgment scheme is not used, messages can be lost during these collisions. For collision-free protocols (for example, CAN and IEEE-1394), the situation is better. However, if a protocol is priority-based (like CAN), low-priority messages can experience long delays.
Bursty Traffic:
Bursts of traffic on a network induce delays by congesting the bus. Many events and attributes can lead to message bursts, including noise, periodic messages, command/response messages, fragmentation of excessively long messages, and activity driven by external stimuli. Even a network with low average utilization may become congested more often than would be expected.8 In collision-based protocols, message bursts increase the likelihood of multiple collisions and large delays. In collision-free protocols, burstiness places high demands on the network interface, which must transfer back-to-back messages into main memory quickly enough to avoid buffer overflow.
Bandwidth:
Running a bus near its maximum throughput makes the network more sensitive to traffic bursts, as there is less slack time to handle the extra traffic. For example, a 95% loaded bus will take ten times as long to recover from a burst as one that is 50% loaded, because only 5% of its capacity, rather than 50%, is left over to drain the backlog. During this recovery period, the bus will be fully loaded and will suffer from the problems mentioned above. To avoid communication congestion and delays in our applications, we design networks to support five times the expected traffic; for collision-based networks we double this margin. These margins also accommodate future growth. The system designer should understand the communication requirements of the application and choose a MAC protocol that complements them.
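The ten-to-one recovery figure follows from a simple fluid model, sketched below under the assumption that background traffic keeps its average load while the backlog drains through whatever capacity is left over. The same model shows the average loads implied by the design margins above; the function names are illustrative.

```python
def recovery_time_s(backlog_bits: float, bit_rate: float, utilization: float) -> float:
    """Time to drain a burst backlog using only the bus's spare capacity.

    Simple fluid model (an assumption): background traffic keeps consuming
    `utilization` of the bit rate, leaving (1 - utilization) to drain the backlog.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return backlog_bits / ((1.0 - utilization) * bit_rate)


def utilization_with_margin(margin: float) -> float:
    """Average load when the network is sized for `margin` times the expected traffic."""
    if margin < 1.0:
        raise ValueError("margin must be at least 1")
    return 1.0 / margin
```

The 95%/50% comparison gives 0.50 / 0.05 = 10 regardless of burst size or bit rate, and sizing for five or ten times the expected traffic corresponds to average loads of 20% or 10%.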
The COTS Black-Box "Solution":
It is usually much easier, faster, and more cost-effective to buy a commercial off-the-shelf solution than to design one, since communication protocols are hard to design correctly. However, COTS "black-box" solutions carry some risks. For applications that are sensitive to factors such as cost, weight, size, power, and expansion capability, a generalized COTS solution might be a poor fit. A generic protocol will have many features that increase product costs and reduce performance. Conversely, a protocol optimized for a specific application may not suit a different one. For example, a LonTalk controller has many built-in services that the application may never use. Every Neuron carries code to support the unused services, increasing cost, and every LonWorks network transmits extra bits to support unused features, wasting bandwidth.
A black-box approach hides implementation and behavioral information that affects the design of the larger system. It is difficult to specify a complex module's behavior efficiently under all conditions, yet to design a system that operates correctly during critical times, you must know how its components interact. Software can be a special problem because high quality requires significant planning and testing. If possible, obtain the specification the engineers used to create the system; it will explain behavior in special cases that the user's manual leaves out.
The black-box method can also constrain design and development flexibility by locking the designer into a limited suite of products and tools. For example, until recently Neuron C was the only programming option for LonTalk; assembly language simply wasn't available. In addition, no upgrade path was available beyond the Neuron for applications that needed faster execution and response times. However, Echelon recently opened the LonTalk protocol for use on other processors. Verify that the COTS solution meets all of the application's requirements; you may need access to design documents through a non-disclosure agreement with the vendor. Find the system's performance limits and determine whether they can be relaxed.
Ideal and Real Throughput:
The MAC protocol and network interface can cut throughput to a fraction of the raw bus bit rate. Figure 6 plots data throughput against bus bit rate, with various bottlenecks superimposed. The shaded area represents the performance envelope within which the system can operate. Notice how each element in the communication system can limit performance: the diagonals are bottlenecks related to protocol efficiency, while the horizontal lines come from message processing rates. Moving within the design space changes which bottleneck constrains performance.

Figure 6
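The shape of such an envelope can be reproduced with two ceilings. The Python sketch below is a hypothetical two-line model, not the data behind Figure 6: protocol efficiency gives a ceiling proportional to bit rate (a diagonal), and the node's message-handling rate gives a fixed ceiling (a horizontal line).

```python
def achievable_throughput_bps(bit_rate: float, protocol_efficiency: float,
                              node_msgs_per_s: float, payload_bytes: int) -> float:
    """Data throughput (bit/s) as the lower of two ceilings.

    Hypothetical two-ceiling model of a Figure 6-style envelope:
    - protocol ceiling: a fixed fraction of the raw bit rate (a diagonal line)
    - processing ceiling: messages per second the node can handle, times the
      payload carried by each message (a horizontal line)
    """
    if not 0.0 < protocol_efficiency <= 1.0:
        raise ValueError("protocol_efficiency must be in (0, 1]")
    protocol_limit = protocol_efficiency * bit_rate
    processing_limit = node_msgs_per_s * payload_bytes * 8
    return min(protocol_limit, processing_limit)


def crossover_bit_rate(protocol_efficiency: float, node_msgs_per_s: float,
                       payload_bytes: int) -> float:
    """Bit rate above which the node, not the protocol, limits throughput."""
    return node_msgs_per_s * payload_bytes * 8 / protocol_efficiency
```

For example, with 50% protocol efficiency and a node that can handle 1,000 eight-byte messages per second (hypothetical figures), `achievable_throughput_bps(125_000, 0.5, 1_000, 8)` is protocol-limited at 62.5 kbit/s, while at 1 Mbit/s the result is node-limited at 64 kbit/s; `crossover_bit_rate(0.5, 1_000, 8)` puts the corner of the envelope at 128 kbit/s, beyond which a faster bus buys no additional data throughput.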
Protocol Bottlenecks
Collisions and retries waste bandwidth. Collision detection circuitry reduces this loss, but its extra components increase system costs. Ethernet and LonTalk are two common protocols that suffer from this bandwidth penalty. CAN and FireWire use a MAC with lossless collision resolution and do not waste bandwidth this way.
Each message packet contains both data and protocol support information; for example, CAN requires eight bytes and IEEE-1394 uses at least 20 bytes of support information. This information helps implement network services such as addressing, routing, error detection, and bit synchronization. If each packet carries a significant amount of support information compared to data, throughput will suffer, and sending packets with little data exacerbates this limitation.
Message Processing Bottlenecks
The network interface design is critical in networks with a data throughput mismatch between the bus and the network interface. Both LonTalk (at 1.25 Mbit/s) and IEEE-1394 (at any speed) have this imbalance. The microcontroller must move data from the incoming message queue into main memory fast enough to keep that queue from overflowing.
An IEEE-1394 network running at 100 Mbit/s is fast enough to require hardware support to achieve full throughput. The 100 Mbit/s rate translates to a 32-bit quadlet every 320 ns, which is faster than most embedded microcontrollers can handle. The designers of the protocol and interface chips attempted to alleviate the node bottleneck in several ways.
- The protocol deals with 32-bit quadlets rather than 8-bit bytes; the 32-bit bus at the link layer controller quadruples the bandwidth available.
- The protocol features isochronous as well as asynchronous communications. Asynchronous messages can be sent at nearly any time, but isochronous messages are sent only once every 125 µs, immediately after a periodic synchronization signal.
- The link layer controller uses queues to compensate for moderate speed differences between the bus and the processor.
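The timing pressure on the node can be checked with a few lines of arithmetic. The sketch below computes how often a new word arrives at a given bit rate and how many quadlets fit in one 125 µs isochronous cycle, ignoring packet and arbitration overhead (a simplifying assumption); the function names are illustrative.

```python
def transfer_interval_ns(bit_rate: float, word_bits: int = 32) -> float:
    """Time between successive words arriving at `bit_rate`, in nanoseconds."""
    return word_bits / bit_rate * 1e9


def quadlets_per_cycle(bit_rate: float, cycle_s: float = 125e-6) -> float:
    """Raw quadlet capacity of one 125-microsecond isochronous cycle.

    Ignores packet headers, gaps, and arbitration (an assumption).
    """
    return bit_rate * cycle_s / 32
```

At 100 Mbit/s a new quadlet arrives every 320 ns, whereas a byte-wide interface would have to accept a byte every 80 ns, which is why moving to 32-bit transfers quadruples the time available per transfer. At 400 Mbit/s the quadlet interval shrinks to 80 ns.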
Bottleneck Avoidance
To avoid communication problems, you should identify and eliminate problem bottlenecks. If the bus is the bottleneck, increase the bit rate or send fewer messages. Consider using event-driven rather than periodic messages. If the network interface is the problem, the following actions can avoid bottlenecks:
- Use a faster microcontroller to implement the protocol stack and handle messages. This can be an expensive option, as faster processor support circuitry may be needed.
- Implement the function in fast hardware. A microcontroller's direct memory access (DMA) controller is a convenient solution.
- Throttle back the network by including a mandatory idle time between packet transmissions.
- Schedule the message traffic to limit activity at each node. This may be the least expensive solution, but it requires synchronization among the nodes, which may not be feasible for a given application.
- Rely on message retransmission. When the receive buffer is full, transmit a negative acknowledgment to force message retransmission.
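The throttling option can be sketched as a transmitter that refuses to start a packet until a mandatory idle gap has elapsed after its previous one. The class and helper below are hypothetical, times are in seconds, and the model covers only one sender; a receiver that hears several senders still needs the traffic scheduling described in the list.

```python
class ThrottledTransmitter:
    """Enforces a mandatory idle gap after every packet this node sends.

    Hypothetical helper for illustration; models a single sender only.
    """

    def __init__(self, packet_time_s: float, idle_time_s: float) -> None:
        if packet_time_s <= 0 or idle_time_s < 0:
            raise ValueError("packet time must be > 0 and idle time >= 0")
        self.packet_time_s = packet_time_s
        self.idle_time_s = idle_time_s
        self._next_allowed_s = 0.0

    def schedule(self, ready_time_s: float) -> float:
        """Return the start time for a packet that became ready at `ready_time_s`."""
        start = max(ready_time_s, self._next_allowed_s)
        self._next_allowed_s = start + self.packet_time_s + self.idle_time_s
        return start


def required_idle_time_s(packet_time_s: float, receiver_service_time_s: float) -> float:
    """Smallest gap that keeps a receiver needing `receiver_service_time_s`
    per packet from falling behind this sender."""
    return max(0.0, receiver_service_time_s - packet_time_s)
```

For example, if a packet occupies the bus for 1 ms and the receiver needs 3 ms to process each one, `required_idle_time_s(0.001, 0.003)` gives a 2 ms gap, and three packets that are all ready at time zero start at 0, 3, and 6 ms. The cost is visible directly: this sender now uses at most one third of the time it could otherwise occupy.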
BUILD OR BUY?
Selecting a communication protocol to support an application requires an understanding of both the protocol and the application. Whether generic or application-specific, a commercial protocol will probably limit system optimization. This build-or-buy decision has repercussions in embedded network design because of the need to understand, optimize, and expand the system.
A hard real-time application may have short delivery deadlines that must not be missed. Non-deterministic networks are not suited to this type of system, as they cannot guarantee message delivery times. Even deterministic prioritized networks may not be appropriate, because low-priority nodes can be starved. Some networks suffer dramatically during bursts of traffic, and these bursts tend to be more common than initially expected.
Some applications have commercial protocols that are optimized specifically for them, but these applications are exceptions to the rule. For other applications, a system designer must either develop a completely new protocol, apply a general-purpose protocol, or adapt an existing protocol to the new application. Creating a new protocol requires a significant amount of time and resources. Applying a general-purpose protocol and adapting an application-specific protocol both limit the amount of optimization possible. This optimization may be crucial in applications with tight performance, cost, size, weight, and environmental constraints.
Choosing an implementation of the protocol brings the need to understand that implementation. COTS solutions are not fully described, because of their complexity and the vendor's need to maintain a competitive advantage. This hiding of information complicates the job of the system designer, who must understand the interactions of all system components, including the network implementation.
Protocol and implementation details lead to performance bottlenecks, typically reducing communication system throughput to far below the raw bus bit rate. A speed mismatch between the network and network interface will usually lead to bottlenecks, but techniques are available to address them and lessen their effects.
Alexander Dean is an Assistant Research Engineer at UTRC. He has designed and analyzed embedded networks and computer architectures for Otis elevators, Pratt & Whitney jet engines, automotive systems, and wireless building systems. He received a BSEE from the University of Wisconsin and an MSEE from Carnegie Mellon University, where he is currently pursuing a PhD in Electrical and Computer Engineering. He is researching compilation techniques to eliminate real-time support hardware. He can be contacted at adean@ece.cmu.edu.
Bhargav Upender is a Research Engineer at United Technologies Research Center. He currently designs and evaluates network protocols and software architectures for distributed embedded systems. He holds a BS in electrical engineering from the University of Connecticut and an MS in electrical engineering from Cornell University. He can be contacted electronically at barg@utrc.utc.com.
REFERENCES
- 1. Brooks, F. The Mythical Man-Month: Essays on Software Engineering. Reading, MA: Addison-Wesley, 1975.
- 2. Upender, B., and P. Koopman, "Communication Protocols For Embedded Systems," Embedded Systems Programming, November 1994, p. 46.
- 3. "LonWorks Engineering Bulletin: LonTalk Response Time Measurements," Palo Alto, CA: Echelon Corp., 1992.
- 4. "Road Vehicles - Interchange of Digital Information - Controller Area Network for High-Speed Communication," International Organization for Standardization, ISO 11898, November 1993.
- 5. Unruh, J., H. J. Mathony, and K. H. Kaiser, "Error Detection Analysis of Automotive Communication Protocols," SAE Paper 900699, 1990.
- 6. Carter, A., "Longer Cables for the IEEE-P1394 High Performance Serial Bus," Cupertino, CA: Apple Computer, 1994.
- 7. The 1394 Trade Association, The Multimedia Connection, http://www.1394ta.org/index.html.
- 8. Upender, B., and A. Dean, "Variability of CAN Network Performance," Third International CAN Conference, Paris, France, 1996.