Applying distributed system concepts to embedded multiprocessor designs: Part 1 -

Applying distributed system concepts to embedded multiprocessor designs: Part 1


After decades of anticipation, multiprocessors are becoming a realityfor the majority of software engineers, especially in embedded systems.Symmetric multiprocessors which use a number of identical processors,one operating system and a shared memory are relatively easy toprogram, because we can use additional processors to execute additionalthreads.

A different situation arises when we have processors with differentinstruction sets, different operating systems or without shared memory.In this case, threads running on different processors must use messagepassing to communicate. This situation is described as “asymmetricmulti-processing”. Asymmetric multiprocessors are more difficult toprogram than symmetric multiprocessors. Software running on anasymmetric multiprocessor must ensure that threads have the data theyneed when they need it. Results may be affected by the order in whichevents occur on different processors.

With the advent of asymmetric multiprocessors, the softwarearchitect is challenged to design solutions that make effective use ofadditional processors, but without sacrificing performance and withoutneeding extensive re-engineering when hardware changes. Asymmetricmultiprocessors are effectively distributed systems and some of thetools and techniques of distributed systems can help us address theseproblems.

This is the first of three articles that will explore theapplication (and mis-application) of distributed systems techniques toasymmetric multiprocessors in embedded systems. In this, the firstpart, we will describe an example of a software feature that may needto be implemented on multicore. We will discuss its implementationusing message passing and how the semantics of that message passingaffect the implementation of the feature.

In the next article, we will discuss the value of higher levels ofabstraction built on message passing, specifically remote procedurecall (RPC) and distributed objects. We will show how they make softwaredevelopment easier rather like C and C++ are easier than assemblylanguage programming.

In the final article, we will explore the differences betweentraditional distributed systems and where we work, in the depths of anembedded system where TCP/IP and Ethernet are as likely to be featuresas facilities. We will discuss how these differences can mandatetechnology choices different from traditional distributed systems.

Route Table Example
I have chosen a forwarding table in an IP router as an example of asoftware feature to be implemented in an embedded distributed system. Ihave chosen this example for two reasons. First, the functionality isrealistic, but relatively easy to explain and appreciate. Second, ithas some of the subtleties that make distributed processing difficult.After explaining this feature, we will discuss its implementation usingdifferent distributed processing techniques.

An IP router is a device that has many network ports, each connectedto a different network. When it receives an IP packet on a port, itreads its destination IP address and uses it to look up a route tableto find its next hop IP address and other information. The next hop IPaddress is the directly accessible IP address to which the router mustsend the packet. The next hop address tells implicitly which networkand therefore which port the packet is to be sent out on.

The route table is central to the operation of the router. It is acollection of key/value pairs. A key consists of a prefix IP addressand a bit mask indicating how many bits of the prefix are significant.The value consists of a next hop IP address and other informationneeded to process the packet. Route lookup consists of finding a prefixin the route table that matches a packet’s destination IP address. Ifmultiple prefixes match, the prefix with the most significant bits isused. Next hop address lookup is specified in [1] section, inthe references listed at the end of this article.

Route Table Maintenance
The route table is maintained by a routing protocol, which means thatit changes while the router is running. This routing protocol talks toits peers on neighboring routers and alters the routing table piecemealas a result of its information received. Route table entries are addedand deleted on the fly.In C, the API to the route table might look something like Figure 1,below .

Figure1: Route Table API

The parameter table identifies which route table is referenced. (Arouter may use more than one route table, for example, whenimplementing Virtual Private Networks ) The parameters prefix and mask indicate the key to be added to or deleted from theroute table. The parameter next_hop is the next hop addressvalue for the corresponding key. The return value indicates success orfailure.

Simple Router Architecture
The most straightforward way to build a router is to take a generalpurpose computer like a PC with a processor, a bus and a memory, andthen add network cards and software. In this hardware architecture, anetwork card transfers a packet arriving at an ingress port to a memorybuffer using Direct Memory Access (DMA). It then adds it to a softwarequeue for processing and interrupts the processor. Software takespackets from the queue, examines each packet, performs the routelookup, then puts the outbound packet on a queue for transmission bythe egress port network card.

Figure2: Simple Router Architecture

Figure 2 above illustratesthis process. Black arrows represent datapaths. Red arrows represent a packet arriving, being processed andleaving. The blue arrow represents route table lookup and maintenanceby the processor.

High Performance Router Architecture
The simple router architecture doesn’t scale very well. Every packetfrom every port is written to the same memory, where it is read andwritten by the same processor and read by the egress port. The busconnecting the processor, the memory and the network cards is abottleneck.

In modern high performance routers, the bus is replaced by a switchfabric. A switch fabric is an NxN matrix of connections. It has Ninputs and N outputs and it can be programmed to connect each output toany input. The advantage of a switch fabric is that it has a muchhigher aggregate bandwidth than a bus. Whereas a bus can receive ortransmit one packet at a time, an NxN switch fabric can transfer up toN packets from ingress to egress simultaneously.

But for this to work, the central processor must be taken out ofpacket forwarding. Route lookup must be done at the ingress port oneach network card. Each network card must have a processor and areplica of the route table. (The port hardware may use a combinationof hardware and software to do packet processing. For simplicity, wewill just consider software processing ).

The route table protocol runs on the central processor anddistributes updates to the replicas. This is illustrated in Figure 3,below . An arriving packet (red arrows) arrives at an ingressport.Route table lookup is done on the network card and the packet is sentvia the switch fabric to a chosen egress port. The central processormaintains a master route table and notifies changes to the network cardprocessors, so that each has an up to date replica of the master routetable.

Figure3: High performance Router Architecture

Software Complications
The high performance router is a much more challenging environment forthe software architect. In the simple router, all the software ran on asingle processor, one thread at a time, and all the system state wasdirectly accessible in the same memory.

In the high performance router, most packets aren’t seen by thecentral processor. There are as many threads running at a time as thereare processors and a thread on a given processor may have incomplete orout-of-date information about the state of the system as a whole.Different processors have different, potentially contradictory, viewsof the system state.

Because the route table has to be replicated at each port, we mustarrange for routing protocol packets and other local traffic to betransferred between ports and the central processor. We need to worryabout keeping the replicated route tables consistent and about theordering of route table updates.

For example, when a route table entry is added, the processor on theegress port for that route must be told before the other ports, or theother ports may send it packets that it doesn’t know what to do with.Other router functionality, such as access control and statisticsgathering, are also more complex in the high performance router.

A Distributed System
Fortunately, the problem of performing computation on multiplecollaborating processors has been studied for many years. Coulouris,Dollimore and Kindberg [1] define a distributed system as “one in whichcomponents located at networked computers communicate and coordinatetheir action only by passing messages”. This description certainlyapplies to our High Performance Router.

In [3] pp 5-7 in the references at the end of this article,Mullender says that the possibility of failures in communication andprocessors is important. This may not be an issue in systems whereprocessors share a bus and a power supply or even a chip, but itsometimes is an issue. Fault tolerant routers are built with redundanthardware that can be replaced while the system is running to providehigh availability.

Continuing service in the event of processor failure affects manyaspects of application design. So there is a lot to be gained bythinking of our High Performance Router as a distributed system.Treating the router as a distributed system, we can coordinate routetable updates using messages between processors.

Variations on message passing
Message passing seems at first to be a straightforward proposition. Aclient constructs a message and sends it to a server.

(The terms client and server are sometimes used to mean slightlydifferent things. Here, we refer to client and server in the context ofpassing an individual message and getting an individual response. Aclient sends a message to a server and the server responds to theclient. The roles of client and server are transient rather than fixed.It would not be unusual, for example, for a server to send a message toanother entity, becoming a client in turn, or to send a message back tothe client, in which case roles are temporarily reversed. )

The server reads the message and takes some action that depends onthe content of the message. Part of the action may be to send aresponse back to the client. Physically, the message may be passedthough a variety of communication mechanisms, including shared memory,LANs, the Internet and RS232 ports.

But passing messages between processors may not be as simple as itappears. Does the sender wait for a response or run ahead? If it runsahead, how can it collect a response? If it doesn’t run ahead, whatdoes it do while it’s waiting? What guarantees are given about theordering of messages? Without a response, how does the client know whenmessages are processed (as opposed to received)? Are messages evenguaranteed to be delivered?

Weak and Strong MessagingGuarantees. Let us consider doing route table maintenanceusing two different message passing models and explore theconsequences. The differences are extreme for the purposes ofexposition and real world message passing services fall somewherebetween these two extremes and make different tradeoffs betweenflexibility and ease of use.

In both cases, the client constructs a message in a buffer and callsa ‘send’ function with the buffer as a parameter. The server calls a‘receive’ function which blocks until an incoming message is available.The ‘receive’ function then returns a buffer containing a copy of themessage sent by the client. For route table maintenance, the messagepassed consists of an ‘add’ or a ‘delete’ command with appropriateparameters. The models are different, though, in the semantics of theservice they provide.

Weak Guarantee: Asynchronous,Unreliable, Unordered. Thefirst model provides weak guarantees and is rather like UDP/IP, butweaker. The send function returns as soon as possible, before themessage is delivered. It minimizes copying by accessing the bufferafter the send function returns. It allows the client thread to ‘fireand forget’ multiple messages, maximizing throughput if messagedelivery is slow.

On the server side, it delivers a sequence of messages forprocessing. Client and server are decoupled and both can keep busywithout waiting for the other. But the model does not guaranteedelivery of the message. Nor does it guarantee to deliver messages inthe order that they are sent. In short, it provides asynchronous,unordered and unreliable message delivery. It also returns promptly,but may read the buffer after the send function has returned tocomplete message transmission.

Figure4: Route Table Maintenance with Weak Messaging

To maintain a route table using the weak messaging model, an APIlike Figure 4 above must beimplemented on top of the messaging service. Notethat this API is in a layer above the messaging service, not themessaging service API itself. It constructs and sends messages,receives and parses responses and manages message buffers.

The messages are received by software on the server that parsesmessages, calls functions and constructs and sends responses. Thislayer of software is called middleware and we’ll discuss it in moredetail in Part 2.

This API is much more complex than Figure 1 and it needs someexplanation. The functions send_route_table_add and send_route_table_delete correspond directly to the functions in Figure 1. They constructand send messages using supplied parameters. They have additionalparameters, which_server and context .

The parameter which_server defines the destination of themessage – the processor where the route table is located. The parametercontext is an opaque pointer to a user-defined context for themessage.

This parameter is needed to associate a received response with itscorresponding message. It points to whatever information is needed todeal with the response. The function receive_route_table_response retrievesa response to a message if any is available and returns a pointer tothe response message and the context pointer that was passed with themessage. All three functions return an error code if communicationsfail, but success of the function does not imply successful messagedelivery.

Clearly, maintaining a route table using asynchronous messages ismore complex than calling the C API. Furthermore, the weak messagepassing model leaves some issues unresolved.

First, what happens if a message is lost? Correct operation of ourrouter requires that copies of the route table at the ports match themaster copy in the controller, so we must ensure that lost messages arere-transmitted. But we must distinguish between loss of a message andloss of a response.

If we send a message but get no response, we don’t know if themessage was processed or not. Re-sending the message may result in theoperation being done twice or more. In the cases of route tableaddition and deletion, repeating the operation is harmless, but mostfunctions don’t produce correct results if they are called multipletimes. Reliable message delivery eliminates this issue.

Second, what if a message is delivered out of sequence? If twomessages are sent, the first to delete an entry with a given key andthe second to add an entry with the same key, the affected route tableshould finish up with the new entry. If the messages are delivered andprocessed out of order, the entry finishes up deleted instead. So wemust add sequence numbers to the messages and ensure they are processedin order.

A weak message passing model has hidden costs that are easy tooverlook but hard to avoid. These costs are not obvious during softwaredevelopment and testing because failure conditions, processing delays,message loss and out of order delivery generally happen only when asystem is stressed. It is necessary to test the application using adeliberately badly behaved version of the messaging service that drops,shuffles and stalls messages at high rates to make sure that theapplication won’t fail in the field.

Strong Guarantee: Synchronous,Reliable, Ordered. Now consider a message passing service withstronger guarantees. A synchronous service sends messages reliably inorder. The client side ‘call’ function blocks until a response isreceived and it returns the response to another client-supplied buffer.

The server side ‘receive’ function provides a buffer for the serverto place its response. So the client is blocked until the message isprocessed and a response received. After ‘call’ returns, the clientknows that the message has been delivered, processed and answered orthat an error has occurred.

Figure5: Route Table Maintenance with Strong Messaging

Figure 5 above shows aroute table maintenance API using such astrong messaging service. As in Figure 4, we have server parameters andthe functions return error codes that indicate communication problems.The context pointer, however, is not needed. Nor is the separatefunction to receive a response.

Because messages to add and delete routes are processed in order,the state of the remote route table is known. If an add or a deletefails, the failure can be handled without maintaining context in aspecially allocated data structure and without the complication ofmultiple outstanding messages.

The price for this simplicity is that the client is blocked untilthe operation is complete and a distributed system where processorsspend a lot of their time idle is not very useful. For the clientprocessor to get useful work done while waiting for a response, we needa multi-threaded operating system.

For most applications, the cost of reliable ordered message deliveryand a multi-threaded operating system is well paid for in applicationsimplicity and correctness. The higher throughput of asynchronousmessaging can be achieved using multiple threads.

Messaging with weak guarantees is often favored for performancereasons, but is only practical for simple APIs. We need to make surethat like is being compared with like in performance comparisons. Ifthe application is forced to ‘strengthen’ the messaging service andmaintain message context, the result may be more expensive, lessflexible and poorer in performance than adopting a well designed strongmessaging service to start with.

Because a strong message passing model is easier to use and saferthan a weak one, synchronous, ordered, reliable message passing with aresponse indicating completion of processing will be assumed in theremainder of these articles.

In Part 3, we will see that messaging models with strong guaranteesprovided by hardware are common in embedded systems, but first we willdiscuss the abstractions we can build on top of message passing andwhat they do to make distributed processing easier. That is the subjectof Part 2.

A PDF version of Part 1: Varieties of message passing is alsoavailable.

DominicHerity is CTO of Redplain Technology ( to founding Redplain, he served as Technology Leader with Silicon& Software Systems and Task Group Chair in the Network ProcessingForum. He has lectured on Embedded Systems at Trinity College Dublin,where he contributed to research into Distributed Operating Systems andHigh Availability. He has many recent publications on various aspectsof Embedded Systems design.

2. “Distributed Systems Concepts andDesign”, Coulouris, Dollimore and Kindberg, Addison Wesley, 2001, ThirdEdition, ISBN 0-201-61918-0
3. “Distributed Systems”, SapeMullender (Editor), Addison-Wesley, 1989, ISBN 0-201-41660-3
4. “Distributed Operating Systems”,Andrew S. Tannenbaum, Prentice Hall, 1995, ISBN 0-13-143934-0

For more articles on these topics, go to Moreabout multicores, multiprocessing and tools. Other resourcesand information can be obtained at the Multicore Association Website .

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.