In the first part of this series on applying distributed processing techniques to asymmetric multiprocessors, I described an example software feature (IP forwarding table maintenance) and the consequences of message passing semantics. The second part discussed the value of the higher levels of abstraction brought by RPC and distributed objects.
In this part, we will focus on the differences between multicores and traditional distributed systems. We will discuss the consequences of these differences and how to make the best use of tried and tested distributed processing techniques while avoiding pitfalls arising from their misapplication.
A distributed system on a chip
Consider building a router like the one described in Part 1 as a system on chip (SoC). The chip has a switch fabric with eight ports. Each port is serviced by a 300MHz RISC processor. The chip also contains a 300MHz RISC control processor. Each processor has separate memory, and each port processor communicates with the control processor over a separate four-lane PCI Express link. The PCI Express link has 10Gb/s bandwidth and 160ns latency. This is typical of modern chip and board interconnects and has similar properties to HyperTransport, InfiniBand and RapidIO.
We will contrast this with a more traditional distributed system with a variable number of 3GHz RISC processors on Gigabit Ethernet. What aspects of these environments are different? How can we make the best use of traditional distributed system tools and techniques without being tripped up by these dissimilarities?
Properties that differ in the two environments include latency, bandwidth, processor speed, reliability of physical communications, number of processors, and startup and shutdown scenarios. We will explore these properties in order of importance and see what the consequences are for software design.
Latency. For most network applications, such as web browsing, file transfer and video, where the volume of data transferred is large, the limiting factor is bandwidth. But distributed processing is unusual in its sensitivity to latency; hardly any other use of a network is affected by sub-millisecond latency. Distributed processing involves sending short messages and waiting for short responses. Messages can be 10 bytes or less. So distributed application performance tends to be limited by response time rather than bandwidth, and response time is limited by latency.
Minimum-sized Gigabit Ethernet frames take under 1 microsecond to transmit, but LAN equipment, which is usually built for bandwidth rather than latency, can take tens of microseconds to deliver them. If we are working across the Internet, we may have thousands of miles of propagation delay, no matter how much bandwidth we have. Because we cannot transmit messages faster than light, this means we may have a one-way latency of 100ms or more.
Chip and board level interconnects such as the one in our example deliver small messages in times of the order of 100ns, which is between 100 and 1,000,000 times less than is usual for distributed systems.
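To see how strongly latency dominates for short messages, we can estimate one-way delivery time as serialization time (message size divided by bandwidth) plus interconnect latency. This is a back-of-envelope sketch using the figures quoted above, with 10µs taken as a representative LAN switch delay:

```python
# One-way delivery time for a short message: serialization plus latency.

def delivery_time_ns(payload_bytes, bandwidth_bps, latency_ns):
    serialization_ns = payload_bytes * 8 / bandwidth_bps * 1e9
    return serialization_ns + latency_ns

# A 64-byte message (minimum-sized Ethernet frame) in each environment.
lan_ns = delivery_time_ns(64, 1e9, 10_000)   # Gigabit Ethernet, ~10us switch delay
chip_ns = delivery_time_ns(64, 10e9, 160)    # four-lane PCI Express, 160ns latency

print(f"LAN:  {lan_ns:.0f}ns")   # ~10512ns, almost all of it latency
print(f"Chip: {chip_ns:.0f}ns")  # ~211ns
```

In both environments the serialization time is a small fraction of the total, so adding bandwidth barely helps; only reducing latency shortens the response time that limits distributed applications.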
Bandwidth. Gigabit Ethernet has less bandwidth than is typically available on a chip. It is 10 times slower than our PCI Express example. So our distributed system on a chip may have much more bandwidth than a traditional distributed system.
Speed to Latency Ratio. A key metric when evaluating software overhead is the ratio of processor speed to latency. The processor in a multicore SoC may be 10 times slower than that in a traditional distributed system. This difference magnifies the differences in latency and bandwidth, which run the other way.
If we compare 300MHz processors on four-lane PCI Express with 3GHz processors on Gigabit Ethernet, the former can execute about 50 instructions (160ns at 300MHz) in the time it takes to deliver a message, while the latter can execute 30,000 (10µs at 3GHz).
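The comparison is simple enough to check directly, using the clock rates and latencies quoted in the text:

```python
# Instructions a processor can execute while one message is in flight:
# clock rate (Hz) multiplied by one-way message latency (seconds).

def instructions_per_latency(clock_hz, latency_s):
    return clock_hz * latency_s

on_chip = instructions_per_latency(300e6, 160e-9)  # 300MHz CPU, 160ns PCIe latency
lan     = instructions_per_latency(3e9, 10e-6)     # 3GHz CPU, 10us LAN latency

print(f"On chip: ~{on_chip:.0f} instructions")  # 48, i.e. about 50
print(f"LAN:     ~{lan:.0f} instructions")      # 30000
```

A message-passing layer that spends a few hundred instructions per message is invisible over the LAN, but on the chip it costs several message-latencies' worth of useful work.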
This has major consequences for message passing software overhead. An overhead that is irrelevant in a traditional distributed system would be a major limitation on a chip. Software overhead must be minimized for on-chip message passing to an extent not seen in traditional distributed processing.
Reliable Physical Layer. Networks do not typically deliver messages reliably. Packets are routinely lost, not only because of transmission errors, but also because they are discarded in a congested network. Reliability must be provided by a transport layer such as TCP, which retransmits packets until they are acknowledged.
The possible need for retransmission results in an unbounded message delivery time, which is unacceptable in a real-time environment. Real-time systems that use unreliable message delivery need specialized algorithms to accommodate message loss.
On-chip interconnects have flow control, and they deliver messages reliably as well as quickly. The availability of a reliable physical layer can make distributed processing much easier. It ensures real-time, ordered message delivery.
But the presumption of unreliable messaging runs deep in distributed systems software. Systems are deployed that run TCP/IP where message loss doesn't occur. Protocols add checksums and sequence numbers to messages whether they are needed or not, adding to software overhead.
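The per-message cost of that presumption is easy to make concrete. The layouts below are illustrative, not any protocol's exact wire format: a TCP/IP message carries roughly 40 bytes of IP and TCP headers (sequence numbers, checksums, ports and so on), while a hypothetical header for a reliable, ordered on-chip link needs little more than a destination and a length:

```python
import struct

# Illustrative header layouts, not real wire formats.
# TCP/IP: a 20-byte IP header plus a 20-byte TCP header.
tcpip_header = struct.Struct("!20s20s")
# Hypothetical on-chip header: one byte of destination port ID, one of
# payload length. Sequence numbers and checksums are omitted because the
# interconnect already guarantees reliable, ordered delivery.
onchip_header = struct.Struct("!BB")

payload = 10  # a typical short distributed-processing message

for name, hdr in (("TCP/IP", tcpip_header), ("on-chip", onchip_header)):
    print(f"{name}: {hdr.size} header bytes on {payload} payload bytes "
          f"({hdr.size / payload:.0%} overhead)")
```

For a 10-byte message, TCP/IP-style framing is 400% overhead before any software cost is counted; on a reliable link the framing can shrink to a few percent of that.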
Number of Processors, Concurrent Startup and Shutdown
In traditional distributed systems, the set of processors in the system may vary with time. The application may continue to run while processors come and go.
Processors may fail or become temporarily unreachable. Processors may go down and come back up with a different software version. The number of processors involved may be unbounded, and software designers strive for scalability in the number of processors. All these issues complicate traditional distributed processing.
By contrast, the number of processors on a chip is known in advance. They power up and down simultaneously, and they don't become unreachable. Software for all processors can be loaded from a single source and can be guaranteed consistent. Protocols for dynamically adding processors to an application, for removing them, for authenticating them, and for dealing with multiple versions of software on peer processors are not needed.
Table 1 summarizes the differences between a traditional distributed system and a distributed system on a chip.
Table 1: Differences on a Chip

| Property | Traditional distributed system | Distributed system on a chip |
| Latency | 10µs to 100ms | ~160ns |
| Bandwidth | 1Gb/s (Gigabit Ethernet) | 10Gb/s (four-lane PCI Express) |
| Processor speed | 3GHz | 300MHz |
| Instructions per message latency | ~30,000 | ~50 |
| Physical layer | Unreliable | Reliable, ordered, real time |
| Number of processors | Variable, possibly unbounded | Fixed, known in advance |
| Startup and shutdown | Independent, incremental | Simultaneous |
| Software versions | May differ across processors | Consistent |
Perhaps the most important difference is the impact of software overhead on message passing performance. Software overhead is a much greater problem on a chip than in a traditional distributed system. But we can exploit reliable real-time messaging, known processor resources and consistent software versions to help reduce software overhead.
Distributed systems research and products have much to offer the implementer of a distributed system on a chip. There is a wealth of knowledge to draw on, and there are many wheels that don't need to be reinvented. But we need to take care to choose the right wheels. Inappropriate application of technology designed for traditional distributed systems will give poor results, but good use of distributed systems technology and methods can make the transition to working with multiple processors much easier and more productive.
I would welcome any comments on these articles, good and bad. Please mail me at firstname.lastname@example.org.
Thanks to Aidan Kenny, Paul Mannion, Philip O'Carroll, John Roden, Zac Schroff and Sebastian Tyrrell, who read early drafts of this series. Their comments, suggestions and points of information led to substantial improvements. The mistakes, however, are all my own work.
For a PDF version of Part 3 in this series, go to "Unique properties of embedded multiprocessors."
Dominic Herity is CTO of Redplain Technology. Prior to founding Redplain, he served as Technology Leader with Silicon & Software Systems and Task Group Chair in the Network Processing Forum. He has lectured on Embedded Systems at Trinity College Dublin, where he contributed to research into Distributed Operating Systems and High Availability. He has many recent publications on various aspects of Embedded Systems design.