CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS

Applying distributed system concepts to embedded multiprocessor designs: Part 1
Varieties of message passing



Embedded.com
After decades of anticipation, multiprocessors are becoming a reality for the majority of software engineers, especially in embedded systems. Symmetric multiprocessors which use a number of identical processors, one operating system and a shared memory are relatively easy to program, because we can use additional processors to execute additional threads.

A different situation arises when we have processors with different instruction sets, different operating systems or without shared memory. In this case, threads running on different processors must use message passing to communicate. This situation is described as “asymmetric multi-processing”. Asymmetric multiprocessors are more difficult to program than symmetric multiprocessors. Software running on an asymmetric multiprocessor must ensure that threads have the data they need when they need it. Results may be affected by the order in which events occur on different processors.

With the advent of asymmetric multiprocessors, the software architect is challenged to design solutions that make effective use of additional processors, but without sacrificing performance and without needing extensive re-engineering when hardware changes. Asymmetric multiprocessors are effectively distributed systems and some of the tools and techniques of distributed systems can help us address these problems.

This is the first of three articles that will explore the application (and mis-application) of distributed systems techniques to asymmetric multiprocessors in embedded systems. In this, the first part, we will describe an example of a software feature that may need to be implemented on multicore. We will discuss its implementation using message passing and how the semantics of that message passing affect the implementation of the feature.

In the next article, we will discuss the value of higher levels of abstraction built on message passing, specifically remote procedure call (RPC) and distributed objects. We will show how they make software development easier rather like C and C++ are easier than assembly language programming.

In the final article, we will explore the differences between traditional distributed systems and where we work, in the depths of an embedded system where TCP/IP and Ethernet are as likely to be features as facilities. We will discuss how these differences can mandate technology choices different from traditional distributed systems.

Route Table Example
I have chosen a forwarding table in an IP router as an example of a software feature to be implemented in an embedded distributed system. I have chosen this example for two reasons. First, the functionality is realistic, but relatively easy to explain and appreciate. Second, it has some of the subtleties that make distributed processing difficult. After explaining this feature, we will discuss its implementation using different distributed processing techniques.

An IP router is a device that has many network ports, each connected to a different network. When it receives an IP packet on a port, it reads its destination IP address and uses it to look up a route table to find its next hop IP address and other information. The next hop IP address is the directly accessible IP address to which the router must send the packet. The next hop address tells implicitly which network and therefore which port the packet is to be sent out on.

The route table is central to the operation of the router. It is a collection of key/value pairs. A key consists of a prefix IP address and a bit mask indicating how many bits of the prefix are significant. The value consists of a next hop IP address and other information needed to process the packet. Route lookup consists of finding a prefix in the route table that matches a packet’s destination IP address. If multiple prefixes match, the prefix with the most significant bits is used. Next hop address lookup is specified in [1] section 5.2.4.3, in the references listed at the end of this article.

Route Table Maintenance
The route table is maintained by a routing protocol, which means that it changes while the router is running. This routing protocol talks to its peers on neighboring routers and alters the routing table piecemeal as a result of its information received. Route table entries are added and deleted on the fly. In C, the API to the route table might look something like Figure 1, below.

Figure 1: Route Table API

The parameter table identifies which route table is referenced. (A router may use more than one route table, for example, when implementing Virtual Private Networks) The parameters prefix and mask indicate the key to be added to or deleted from the route table. The parameter next_hop is the next hop address value for the corresponding key. The return value indicates success or failure.

Simple Router Architecture
The most straightforward way to build a router is to take a general purpose computer like a PC with a processor, a bus and a memory, and then add network cards and software. In this hardware architecture, a network card transfers a packet arriving at an ingress port to a memory buffer using Direct Memory Access (DMA). It then adds it to a software queue for processing and interrupts the processor. Software takes packets from the queue, examines each packet, performs the route lookup, then puts the outbound packet on a queue for transmission by the egress port network card.

Figure 2: Simple Router Architecture

Figure 2 above illustrates this process. Black arrows represent data paths. Red arrows represent a packet arriving, being processed and leaving. The blue arrow represents route table lookup and maintenance by the processor.

High Performance Router Architecture
The simple router architecture doesn’t scale very well. Every packet from every port is written to the same memory, where it is read and written by the same processor and read by the egress port. The bus connecting the processor, the memory and the network cards is a bottleneck.

In modern high performance routers, the bus is replaced by a switch fabric. A switch fabric is an NxN matrix of connections. It has N inputs and N outputs and it can be programmed to connect each output to any input. The advantage of a switch fabric is that it has a much higher aggregate bandwidth than a bus. Whereas a bus can receive or transmit one packet at a time, an NxN switch fabric can transfer up to N packets from ingress to egress simultaneously.

But for this to work, the central processor must be taken out of packet forwarding. Route lookup must be done at the ingress port on each network card. Each network card must have a processor and a replica of the route table. (The port hardware may use a combination of hardware and software to do packet processing. For simplicity, we will just consider software processing).

The route table protocol runs on the central processor and distributes updates to the replicas. This is illustrated in Figure 3, below. An arriving packet (red arrows) arrives at an ingress port. Route table lookup is done on the network card and the packet is sent via the switch fabric to a chosen egress port. The central processor maintains a master route table and notifies changes to the network card processors, so that each has an up to date replica of the master route table.

Figure 3: High performance Router Architecture

Software Complications
The high performance router is a much more challenging environment for the software architect. In the simple router, all the software ran on a single processor, one thread at a time, and all the system state was directly accessible in the same memory.

In the high performance router, most packets aren’t seen by the central processor. There are as many threads running at a time as there are processors and a thread on a given processor may have incomplete or out-of-date information about the state of the system as a whole. Different processors have different, potentially contradictory, views of the system state.

Because the route table has to be replicated at each port, we must arrange for routing protocol packets and other local traffic to be transferred between ports and the central processor. We need to worry about keeping the replicated route tables consistent and about the ordering of route table updates.

For example, when a route table entry is added, the processor on the egress port for that route must be told before the other ports, or the other ports may send it packets that it doesn’t know what to do with. Other router functionality, such as access control and statistics gathering, are also more complex in the high performance router.

A Distributed System
Fortunately, the problem of performing computation on multiple collaborating processors has been studied for many years. Coulouris, Dollimore and Kindberg [1] define a distributed system as “one in which components located at networked computers communicate and coordinate their action only by passing messages”. This description certainly applies to our High Performance Router.

In [3] pp 5-7 in the references at the end of this article, Mullender says that the possibility of failures in communication and processors is important. This may not be an issue in systems where processors share a bus and a power supply or even a chip, but it sometimes is an issue. Fault tolerant routers are built with redundant hardware that can be replaced while the system is running to provide high availability.

Continuing service in the event of processor failure affects many aspects of application design. So there is a lot to be gained by thinking of our High Performance Router as a distributed system. Treating the router as a distributed system, we can coordinate route table updates using messages between processors.

Variations on message passing
Message passing seems at first to be a straightforward proposition. A client constructs a message and sends it to a server.

(The terms client and server are sometimes used to mean slightly different things. Here, we refer to client and server in the context of passing an individual message and getting an individual response. A client sends a message to a server and the server responds to the client. The roles of client and server are transient rather than fixed. It would not be unusual, for example, for a server to send a message to another entity, becoming a client in turn, or to send a message back to the client, in which case roles are temporarily reversed.)

The server reads the message and takes some action that depends on the content of the message. Part of the action may be to send a response back to the client. Physically, the message may be passed though a variety of communication mechanisms, including shared memory, LANs, the Internet and RS232 ports.

But passing messages between processors may not be as simple as it appears. Does the sender wait for a response or run ahead? If it runs ahead, how can it collect a response? If it doesn’t run ahead, what does it do while it’s waiting? What guarantees are given about the ordering of messages? Without a response, how does the client know when messages are processed (as opposed to received)? Are messages even guaranteed to be delivered?

Weak and Strong Messaging Guarantees. Let us consider doing route table maintenance using two different message passing models and explore the consequences. The differences are extreme for the purposes of exposition and real world message passing services fall somewhere between these two extremes and make different tradeoffs between flexibility and ease of use.

In both cases, the client constructs a message in a buffer and calls a ‘send’ function with the buffer as a parameter. The server calls a ‘receive’ function which blocks until an incoming message is available. The ‘receive’ function then returns a buffer containing a copy of the message sent by the client. For route table maintenance, the message passed consists of an ‘add’ or a ‘delete’ command with appropriate parameters. The models are different, though, in the semantics of the service they provide.

Weak Guarantee: Asynchronous, Unreliable, Unordered. The first model provides weak guarantees and is rather like UDP/IP, but weaker. The send function returns as soon as possible, before the message is delivered. It minimizes copying by accessing the buffer after the send function returns. It allows the client thread to ‘fire and forget’ multiple messages, maximizing throughput if message delivery is slow.

On the server side, it delivers a sequence of messages for processing. Client and server are decoupled and both can keep busy without waiting for the other. But the model does not guarantee delivery of the message. Nor does it guarantee to deliver messages in the order that they are sent. In short, it provides asynchronous, unordered and unreliable message delivery. It also returns promptly, but may read the buffer after the send function has returned to complete message transmission.

Figure 4: Route Table Maintenance with Weak Messaging

To maintain a route table using the weak messaging model, an API like Figure 4 above must be implemented on top of the messaging service. Note that this API is in a layer above the messaging service, not the messaging service API itself. It constructs and sends messages, receives and parses responses and manages message buffers.

The messages are received by software on the server that parses messages, calls functions and constructs and sends responses. This layer of software is called middleware and we’ll discuss it in more detail in Part 2.

This API is much more complex than Figure 1 and it needs some explanation. The functions send_route_table_add and send_route_table_delete correspond directly to the functions in Figure 1. They construct and send messages using supplied parameters. They have additional parameters, which_server and context.

The parameter which_server defines the destination of the message - the processor where the route table is located. The parameter context is an opaque pointer to a user-defined context for the message.

This parameter is needed to associate a received response with its corresponding message. It points to whatever information is needed to deal with the response. The function receive_route_table_response retrieves a response to a message if any is available and returns a pointer to the response message and the context pointer that was passed with the message. All three functions return an error code if communications fail, but success of the function does not imply successful message delivery.

Clearly, maintaining a route table using asynchronous messages is more complex than calling the C API. Furthermore, the weak message passing model leaves some issues unresolved.

First, what happens if a message is lost? Correct operation of our router requires that copies of the route table at the ports match the master copy in the controller, so we must ensure that lost messages are re-transmitted. But we must distinguish between loss of a message and loss of a response.

If we send a message but get no response, we don’t know if the message was processed or not. Re-sending the message may result in the operation being done twice or more. In the cases of route table addition and deletion, repeating the operation is harmless, but most functions don’t produce correct results if they are called multiple times. Reliable message delivery eliminates this issue.

Second, what if a message is delivered out of sequence? If two messages are sent, the first to delete an entry with a given key and the second to add an entry with the same key, the affected route table should finish up with the new entry. If the messages are delivered and processed out of order, the entry finishes up deleted instead. So we must add sequence numbers to the messages and ensure they are processed in order.

A weak message passing model has hidden costs that are easy to overlook but hard to avoid. These costs are not obvious during software development and testing because failure conditions, processing delays, message loss and out of order delivery generally happen only when a system is stressed. It is necessary to test the application using a deliberately badly behaved version of the messaging service that drops, shuffles and stalls messages at high rates to make sure that the application won’t fail in the field.

Strong Guarantee: Synchronous, Reliable, Ordered. Now consider a message passing service with stronger guarantees. A synchronous service sends messages reliably in order. The client side ‘call’ function blocks until a response is received and it returns the response to another client-supplied buffer.

The server side ‘receive’ function provides a buffer for the server to place its response. So the client is blocked until the message is processed and a response received. After ‘call’ returns, the client knows that the message has been delivered, processed and answered or that an error has occurred.

Figure 5: Route Table Maintenance with Strong Messaging

Figure 5 above shows a route table maintenance API using such a strong messaging service. As in Figure 4, we have server parameters and the functions return error codes that indicate communication problems. The context pointer, however, is not needed. Nor is the separate function to receive a response.

Because messages to add and delete routes are processed in order, the state of the remote route table is known. If an add or a delete fails, the failure can be handled without maintaining context in a specially allocated data structure and without the complication of multiple outstanding messages.

The price for this simplicity is that the client is blocked until the operation is complete and a distributed system where processors spend a lot of their time idle is not very useful. For the client processor to get useful work done while waiting for a response, we need a multi-threaded operating system.

For most applications, the cost of reliable ordered message delivery and a multi-threaded operating system is well paid for in application simplicity and correctness. The higher throughput of asynchronous messaging can be achieved using multiple threads.

Messaging with weak guarantees is often favored for performance reasons, but is only practical for simple APIs. We need to make sure that like is being compared with like in performance comparisons. If the application is forced to ‘strengthen’ the messaging service and maintain message context, the result may be more expensive, less flexible and poorer in performance than adopting a well designed strong messaging service to start with.

Because a strong message passing model is easier to use and safer than a weak one, synchronous, ordered, reliable message passing with a response indicating completion of processing will be assumed in the remainder of these articles.

In Part 3, we will see that messaging models with strong guarantees provided by hardware are common in embedded systems, but first we will discuss the abstractions we can build on top of message passing and what they do to make distributed processing easier. That is the subject of Part 2.

A PDF version of Part 1: Varieties of message passing is also available.

Dominic Herity is CTO of Redplain Technology (www.redplain.com). Prior to founding Redplain, he served as Technology Leader with Silicon & Software Systems and Task Group Chair in the Network Processing Forum. He has lectured on Embedded Systems at Trinity College Dublin, where he contributed to research into Distributed Operating Systems and High Availability. He has many recent publications on various aspects of Embedded Systems design.

References
1. http://www.ietf.org/rfc/rfc1812.txt
2. “Distributed Systems Concepts and Design”, Coulouris, Dollimore and Kindberg, Addison Wesley, 2001, Third Edition, ISBN 0-201-61918-0
3. “Distributed Systems”, Sape Mullender (Editor), Addison-Wesley, 1989, ISBN 0-201-41660-3
4. “Distributed Operating Systems”, Andrew S. Tannenbaum, Prentice Hall, 1995, ISBN 0-13-143934-0

For more articles on these topics, go to More about multicores, multiprocessing and tools. Other resources and information can be obtained at the Multicore Association Web site.


1

Rate this article: Low High
Current rating
  • .
Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :