Efficient coding could have two meanings. First efficient coding couldrefer to making the best use of a developer's time when coding anddebugging and embedded application. Second efficient coding could referto developing code that has high performance using the minimum ofprocessor and memory resources when executing. In this article I willshow how using C++ classes to implement a communications protocol canimprove the developer's efficiency and discuss some techniques forimproving the execution efficiency.
C++ and Development Efficiency
The C++ object model helps speed development in two ways. First throughabstraction or information hiding, C++ allows us to divide and conquercomplex applications like communications protocols. Information hidingis one of the most powerful tools we have for dealing with complexity.The C++ object model allows a developer to explicitly define whatinformation is to be hidden through the use of private data. The C++object model is far superior to C.s module concept which is onlyenforced by programming conventions and not enforced by the languagerules. C modules only work well if everyone follows the rules.
Second, we can use C++ classes to closely model the concepts definedin a communications protocol.s specification. In the following sectionswe will see how a direct one-to-one mapping between the protocolspecification and C++ classes and methods can be achieved. By using thesame concepts and terminology in the software as is used in theprotocol specification, the code becomes easier to implement,understand and maintain.
Protocol Layer Model
Communications protocol specifications usually contain three primarycomponents: the message encapsulation or envelope, the messagesemantics and the protocol state. Figure1 , below shows a class diagram for a protocol layer.
The receiver is responsible for processing the incoming octetstream. It validates the message and removes the encapsulation. Thereceiver also acts as a de-multiplexer, passing messages to the upperlayers or translating the messages into events that are processed bythe state machine.
The state machine processes protocol events which may be specificmessages from the lower layers or other types of events from the upperor lower protocol layers. The transmitter implements actions requestedby the state machine by translating them into protocol messages. Thetransmitter also encapsulates messages from the upper protocol layersand passes the encapsulated messages to the lower protocol layers.
Implementing the Point-to-PointProtocol
Now we will look at a specific protocol, the Point-to-Point Protocol(PPP) as defined in RFC- 1661. The Point-to-Point protocol is actuallya family of protocols sharing the same encapsulation. A PPPimplementation must include the Link Control Protocol (LCP) and one ormore Network Control Protocols, one for each network protocolsupported. The most common Network Control Protocol is the IP ControlProtocol or IPCP. A PPP implementation may also include one or moreauthorization protocols.
So the actual structure of the PPP layer is a bit more complex thanthe idealized structure shown above. Figure2 below shows the structure for the PPP protocol. It include ofa single receiver and transmitter that implement the messageencapsulation and message semantics. The LCP, each NCP and eachauthorization protocol include a context and their own state.
PPP LCP Details
Implementing the complete family of protocols that make up PPP isbeyond the scope of this paper so we will concentrate on the LCP forour example. The following Table 1 shows a list of events and actions defined in RFC-1661:
RFC-1661 also defines ten states for the PPP LCP; 0 – Initial, 1 -Starting, 2 – Closed, 3 – Stopped, 4 – Closing, 5 – Stopping, 6 -Request-Sent, 7 – Ack-Received, 8 – Ack-Sent, and 9 – Opened. With theevents, actions and states defined, RFC-1661 goes on to define theprotocol's behavior shown in the state table. Below, Table 2's rows represent theevents and the columns represent the states. The actions and next stateare show for each event in each state. The empty cells are invalidevents for that state.
The State Pattern
The table shows the complexity of the protocol.s behavior and codingthis state machine as C case statements or if-else statements is adaunting task. However, the protocol state is easily implemented as anumber of simple C++ classes using the State Pattern described inDesign Patterns by Gamma, et al. shown in Figure 3 below.
The Context object in the state pattern contains a pointer to aconcrete State object. Events to be processed by the context aredeferred to a handleEvent() method in the concrete State thus all eventhandling is determined by the context.s current state.
The LcpContext and LcpState BaseClass
The class declaration for the LcpContext is shown below below. Notethat we have declared methods for every event and action defined by theprotocol. The LcpState class is declared as a friend class so that ithas access to the private methods for executing actions and to changethe LcpContext's state.
The implementation of the event methods is simple. The eventprocessing is simply deferred to the State object pointed to by _state.
The action methods are also simple, deferring the processing to atransmitter or timer object. This decouples the State objects from thePppTransmitter, PppReceiver, and LcpTimer classes so that the stateclasses can be easily used for a number of PPP applications such as PPPover Ethernet (PPPOE) or Packet Over SONET (POS):
The declaration of the base class LcpState is shown below. Again, wehave declared methods for each of the events listed in the PPP LCPspecification in RFC-1661.
Each concrete state class will override the default event methodsdefined by the LcpState base class except when a specific event isinvalid for that state. These cases are shown indicated by blankentries in the state table. In these cases, we simply log the error.
The Concrete LcpState Classes
We now go on to define concrete LcpState classes for each of the tenstates specified in RFC- 161. In each of the concrete LcpState classeswe simply define methods to handle each of the valid events that can bereceived in that state. These methods simply execute the actionsdefined in the PPP LCP State Table. For example, here is thercvConfigAck method for the AckSentLcpState class:
Now we will look at some coding techniques for execution efficiency.Many C programmers are concerned with the overhead associated withusing C++ virtual functions and inheritance so I will address thoseconcerns first. Then I will discuss some design level strategies forefficient coding in C++; minimizing context switching, lockless queues,and avoiding dynamic memory allocation.
Virtual Function Overhead
Polymorphism using virtual methods and inheritance are the hall marksof object oriented programming in C++ so it is important to examine therun time costs of using these language features. The code generated bydifferent compilers can vary; most C++ compilers use the followingstrategy for implementing virtual functions. Let's go back to ourLcpState and AckSentLcpState classes. The compiler must implement a wayto determine which version of the event methods to execute based on thetype of the LcpState object pointer, _state, in the LcpContext. This isusually done by the C++ compiler by adding a table of function pointersto the class. This table is refered to as the virtual table. If weconsider only the first three event methods in out LcpState classes thevirtual table might look like that in Table3, below :
Since the AckSentLcpState class does not implement an up method, theup methods slot in the table is filled in with the default up methoddefined in the LcpState base class.
How is the virtual table found at run time? Each object that hasvirtual functions contains a hidden data member, the virtual pointer,placed in it by the compiler. At run time, _state points to an LcpStateobject and the object.s virtual pointer points to the virtual table.The equivalent C code looks something like this:
So the added cost is the time it takes to resolve two pointers. Theother cost is in the memory required to hold the virtual table. For ourstate machine the table is quite large but there usually not manystates, less than a dozen for most protocols.
Minimize Context Switching
Context switching can incur a huge execution cost. First there is thecost of saving and restoring the current context. The cost is evengreater on a high performance processor where a context switch negatesthe effectiveness of instruction prefetching, caching, and pipelining.
Context switching can be minimized by carefully designing thethreading or tasking for the protocol stack. Traditionally protocolstacks have been designed with separate threads for separate layers.Van Jacobson, a well known network researcher and author of severalRFCs once said: “Layers are a goodway to think about communications protocols but a lousy way toimplement them .”
I would change this statement slightly. Translating a human readableprotocol specification into a programming language that is bothreadable by a computer or a human requires a lot of thinking. We uselanguages like C++ because they allow us to use higher levels ofabstraction that more closely represent the real world concepts we aretrying to implement. Since communications protocols are specified inlayers we should use the power of C++ to model this concept in ourcode. Layers are a good way to code communications protocols but alousy way to execute that code.
Different threads are really only required when there is a speedmismatch between physical processes so that software servicing thoseprocesses must be executed asynchronously. Using TFTP to download filesto a FLASH file system for example might only require three threads,one to process received packets, one to write data to FLASH, and onethread to transmit acknowledgement packets as shown in Figure 5 below.
Another source of context switches is system calls. Every time a systemcall is made, there are two context switches. First when the operatingsystem is invokes, second, when the operating system passes control tothe highest priority run-able task. Minimizing the number of tasks willnot help if every time a message is placed in a queue, the operatingsystem is invoked. This can be avoided by using lockless queues.
A lockless queue is a queue that is written to be thread-safewithout using operating system synchronization functions. This ispossible when there is only one thread writing to the queue and onlyone thread reading from the queue, as shown in Figure 6, below .
The lockless queue consists of an array of object pointers, a headpointer and a tail pointer. The key to the lockless queue is that thewriting thread is only allowed to modify the head pointer and thereading thread is only allowed to modify the tail pointer.
Here below is the constructor. The queue itself is an array ofpointers to unsigned char i.e. octets or bytes. The head and tail areindexes into this array. I want the queue size to be a power of 2 sothat I can do some optimization when the array indexes need to rollover from the end of the array back to zero.
Next is the push method which pushes a new message buffer into thequeue. We make a local copy of the _head index to work with. Note thatwhile push reads the tail index, it only modifies the head index. Inthe worst case another thread is doing a pop operation while push isbeing executed. If push reads a stale version of tail the queue mightappear full when it has one entry left, so the buffer would bediscarded, as shown below:
Note the head index is masked with the _mask value that wascalculated in the constructor from the size. This does the requiredroll over to zero at the end of the array with about the same overheadas a compare to queue size that would be required otherwise. The popmethod is shown next. The pop method reads the head index but onlymodifies tail.
The worst case in the pop method is if a stale _head value is read.The queue would appear empty when there is one message in the queue.The thread calling the pop method would simply miss an opportunity toprocess the message. When the queue is almost empty this should notpose a performance problem.
Note that this function requires that the steps be done in the orderthey are coded. It may be necessary to turn off optimization for thelockless queue class though I haven.t experienced any problems withthis implementation using optimization with several compilers ondifferent processors.
There are more complex lockless data structures that can be appliedto inter-process communication or dynamic memory allocation. These datastructures are implemented using special atomic processor instructionsso they must be partially coded in assembly language.
Avoid Dynamically Allocating andFreeing Memory
C++ is notorious for unexpectedly creating copies of objects but thisreputation is not deserved.
Most of the time you want to pass pointers to objects or referencesto objects but if you are not careful about this C++ will maketemporary copies. This can happen when you pass an object as aparameter to a function or return an object from a function. You canprevent accidentally passing objects around when you want to passpointers or references by making the copy constructor and the =operator private, as shown below.
If you define a class this way and try to pass a copy of an objectinstead of a pointer, the compiler will complain about it.
Alternative new() and delete() Some types of objects such as messagebuffers are created and deleted over-and-over in networking software.The default implementation for new() and delete() is to use malloc()and free() from the standard C library for managing memory. In caseswhere you are using the same types of objects over-and-over anoptimized memory manager can be used for these specific objects. new()and delete() are simply operators that you can over-ride like any otheroperator in C++.
There are several alternatives for obtaining and managing memory foryour more dynamic objects. A pool of objects can be defined staticallyif the system.s resource needs are well understood. malloc() can beused to create a large block of memory for storing a number of objectsat a time so that malloc() and free() aren.t called as often. Anotheralternative is to cache objects once the have been allocated usingmalloc().
The objects are never really freed but instead are saved in a poolfor reuse. When the pool becomes empty, additional objects are createdin the usual way using malloc(). A variation on this approach is thecache a specific number of objects, freeing addition objects if thepool is full and allocating new ones when the pool becomes empty.
Using object oriented programming in C++ can improve the developer'sefficiency in implementing communications protocols. We have seen oneexample of how using the State pattern simplifies developing the codefor the PPP Link Control Protocol. With careful use, C++ does not addintolerable overhead to executing communications protocol software.Several strategies were presented for developing efficientcommunications protocol software at a system level demonstrating thatC++ can be used to develop high performance networking software.
|This article is excerpted from a paper ofthe same name presented atthe Embedded Systems Conference Boston 2006. Used with permission ofthe Embedded Systems Conference. For more information, please visit www.embedded.com/esc/boston/|
Harvey Sugar is a senior softwareengineer at Mantaro Networks, Inc. He hasover 25 years of experience in systems and software engineering,developing real-time embedded systems primarily for telecommunicationsand data network products.
1) Simpson, W. ThePoint-to-Point Protocol(PPP) RFC-1661. Daydreamer Computer SystemsConsulting Services, July 1994.
2) Gamma, E. Helm, R.Johnson, R. and Vlissides, J. Design Patterns; Elements of ReusableObject-Oriented Software. Addison-Wesley Publishing Company, Reading,MA 1994.
3) Lippman, S. Inside theC++ Object Model. Addison-Wesley Publishing Company, Reading, MA 1996
4) Meyers, S. Effective C++;55 Specific Ways to Improve Your Programs and Designs, Third Edition. .Addison-Wesley Publishing Company, Reading, MA 2005.
5) Meyers, S. More EffectiveC++; 35New Ways to Improve Your Programs and Designs, Third Edition. .Addison-Wesley Publishing Company, Reading, MA 1996.
6) Jacobson, V. andFelderman, B. Speeding Up Networking. Linux.conf.au 2006, Dunedin, NZ
7) Dokumentov, A. Lock-freeInterprocess Communication. Dr. Dobbs Portal, June 15, 2006.