A core design pattern for building scalable distributed real-time systems
Editor’s Note: Rajive Joshi of RTI describes how to use the OMG Data Distribution Service (DDS) standard as a real-time systems data bus and apply it to data-centric M2M and IoT distributed real-time designs.
Data-centric design is a powerful methodology for building interoperable, scalable, modular distributed systems and for integrating independently developed sub-systems. It decouples the application logic from the management of stateful data; application components both change and respond to changes in explicitly defined data. The data models are discoverable and observable, and a “data bus” takes on the responsibility of distributing data changes amongst the components.
The OMG Data Distribution Service (DDS) standard for real-time systems provides such a data bus. In this article we will show how to apply data-centric design to build both synchronous and asynchronous invocable services using the DDS standard. The approach is suitable for use in distributed real-time systems. It works well across disconnected and intermittent links and is unique in offering fully discoverable and observable distributed service interactions.
An invocable service is an abstraction for a collection of functionality expressed as a set of operations. Each operation represents a request-response interaction between two parties: one sends a request to the other, which sends a response back. The party that sends the requests is typically referred to as the service’s client.
Traditional implementations [1-7] of invocable services are tightly coupled; the interaction is dependent on the locations of the server and client. Even if a naming service is used, the service is bound to a location after being discovered. Past approaches [1-7] also assume a reliable unicast network transport such as TCP, and rely on a point-to-point connection-oriented session being established between the clients and the servers. These choices are reasonable in a tightly managed data-center environment where the network links are stable and plentiful network and CPU resources are available.
However, real-time systems on the operational or tactical edge do not have the luxury of making such assumptions. Network links may be intermittent and may get disconnected during the normal course of operation; platforms are often resource constrained; and the interactions are time-critical. Such systems span a wide spectrum of applications, including unmanned vehicles, process automation, SCADA, medical devices, switching equipment, transportation, smart grid, and so on. Thus, traditional implementations of invocable services are unsuitable in such environments.
The Object Management Group’s Data Distribution Service (DDS) [8-10] is the first open international middleware standard directly addressing publish-subscribe communications for real-time and embedded systems, suitable for such environments. The DDS programming model is inherently asynchronous, data-centric, and peer-to-peer. It can connect disparate application components across a wide geography and a variety of networks. The resulting system architecture is loosely coupled around real-time data; new components can be easily added or removed without disrupting existing components. However, DDS does not provide a notion of invocable services.
In this article we introduce a data-centric [11-14] design pattern for building invocable services using DDS, while preserving the logical service-centric remote invocation model and also maintaining the underlying DDS programming model.
The design pattern introduced in this article is suitable for use in real-time systems on the operational or tactical edge, where:
- The request-response interactions are of interest to parties not actively participating in the interactions
- The interactions are time critical, and real-time quality of service (e.g. deadlines) is desired
- Network links may get disconnected or are intermittent
- Applications may be resource-constrained and can only implement a subset of the full service interface
- A uniform way to cancel long-running requests is desired, after a client has lost interest
- Interactions need to survive component and/or link failures
- Service instances may be added/removed in response to workload/demand
The design pattern can be contrasted with CORBA synchronous method invocation [1-3], which requires reliable connections (TCP semantics) and a broker to mediate between the clients and the servers. Also, the interactions are not directly observable by participants outside the interaction. The proposed pattern implements remote invocation over DDS, and thus can work over connectionless links (UDP semantics) without the need for a centralized broker. Since the pattern models the interactions as regular DDS Topics, they can be discovered and observed by any participant in the system. Thus, an entire interaction sequence can be reconstructed, with full traceability of the dynamic and real-time service behavior. In the past, such observability has been difficult to achieve using just the constructs provided by middleware. Furthermore, DDS implementations offer higher performance.
Moreover, implementing data-centric invocable services on top of DDS provides several additional benefits:
- Location independence: Clients and service instances can be instantiated anywhere in any order. The state is decoupled from the applications, and is available wherever the applications are instantiated; thus application migration and mobility are supported naturally.
- Elasticity: As the client workload increases or decreases, additional service instances can be transparently added or removed to handle the change in demand, through the use of application-specific logic. Since a service interface can be split across multiple hosts, a single “service instance” does not have to execute the entire set of operations. This can result in much finer-grained elastic resource scaling for just the most frequently used operations.
- Scalability: As more service instances are added, proportionally more clients can be served. Since the client and service roles are loosely coupled around the data, each dimension can be scaled independently. Thus, any number of clients and/or server instances can be added independently of one another (horizontal scaling).
- Network independence: Applications are able to communicate reliably across a variety of network links, including those that may be disconnected or intermittent. Also, multicast, when available, can be used to distribute requests across a pool of server instances.
- Explicit request cancellation: Since requests are explicitly modeled, cancellation is a normal state of the request lifecycle. Thus, an application can cancel a request after a client has lost interest, simply by disposing the request data-object.
- Application Quality of Service (QoS) contract enforcement: DDS supports a rich set of application QoS policies that describe contracts on the delivery of data. Those can be applied to client-service interactions. For example, an application can place an expiration on requests (or responses) to purge stale data, or set a deadline on a service response time to support real-time operations. Developers can also prioritize certain service operations over others.
- Efficient network utilization: DDS provides a high-performance (high throughput, low latency, low CPU load) data bus that relies on an efficient binary data representation on the wire, instead of textual encoding in XML, while still retaining the transparency of interactions.
- Auditability: Since the client-server interactions are explicitly modeled, all service interactions can be monitored by subscribing to the appropriate DDS Topics, logged, recorded, and played back.
- Fault tolerance: Since the data is decoupled from the applications, the implementation may be decentralized to eliminate single points of failure. A pool of redundant server instances may provide automatic failover, and achieve the desired level of service availability.
Next, we describe how to implement data-centric invocable services on top of DDS, suitable for use in real-time systems.
Design pattern overview
Figure 1 shows a Service Interface and the associated Client and Service roles. In general, there may be many instances of the same service. A Client uses a service by invoking operations on the Service Interface. In general, a Client may use one or more services, and multiple Clients may use a given service.
The Client and Service labels in Figure 1 represent logical roles that may be implemented by one or more physical instances for load balancing, mobility, and redundancy purposes.
Applying the principles of data-centric design, we model the service interaction as two pieces of changing state: a Service Request, which expresses the desire of the Client to obtain a certain piece of information, and a Service Response, which includes the desired information, most likely correlated with the request.
The Service Request contains all of the inputs to an operation, while the Service Response contains all of the outputs, as shown in Figure 1. Each request and corresponding response has a distinct identity and lifecycle.
When implementing the pattern with DDS, the request and the response each correspond to a topic. A service invocation results in a Client instance publishing a request on the ServiceRequestTopic and waiting for the responses.
A Service instance subscribes to the request topic, expressing interest in only the requests that it will process (using a content-filtered topic), and publishes the responses on the ServiceResponseTopic. The Client instance likewise subscribes to the response topic, expressing interest only in the responses to the requests it issued. The interaction diagram is shown in Figure 2.
The request and response topics decouple the state of the “service” (i.e. the client-service interactions) from the application logic of the clients and the servers, and model it as a first-class citizen. The state is location independent, and not bound to a specific client or service instance. The state is meaningful on its own, and can be discovered, persisted, recorded, replayed, mined, audited, and manipulated independently of the client and server application instances.
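The interaction in Figure 2 can be approximated without any middleware. The following is a minimal sketch, not DDS code: `ToyBus`, `subscribe`, and `publish` are hypothetical stand-ins for a DDS domain, a content-filtered subscription, and a write on a topic, and the dictionary field names are assumptions made for illustration. It shows a service instance filtering on the operation it implements, a client filtering on its own identity, and a third party observing both topics.

```python
from collections import defaultdict


class ToyBus:
    """Hypothetical in-process stand-in for a DDS data bus (not a DDS API)."""

    def __init__(self):
        # topic name -> list of (content filter predicate, callback)
        self._subs = defaultdict(list)

    def subscribe(self, topic, content_filter, callback):
        self._subs[topic].append((content_filter, callback))

    def publish(self, topic, sample):
        # Deliver the sample to every subscriber whose filter accepts it.
        for accepts, callback in list(self._subs[topic]):
            if accepts(sample):
                callback(sample)


bus = ToyBus()
client_inbox, audit_log = [], []

# Third-party observer: sees every request and every response.
bus.subscribe("ServiceRequestTopic", lambda s: True, audit_log.append)
bus.subscribe("ServiceResponseTopic", lambda s: True, audit_log.append)


# Service instance: processes only the "add" operation.
def serve_add(req):
    bus.publish("ServiceResponseTopic", {
        "client": req["client"],
        "invocation": req["invocation"],
        "status": "DONE_SUCCESS",
        "outputs": [sum(req["inputs"])],
    })


bus.subscribe("ServiceRequestTopic", lambda r: r["operation"] == "add", serve_add)

# Client instance: receives only the responses to its own requests.
bus.subscribe("ServiceResponseTopic",
              lambda r: r["client"] == "client-A", client_inbox.append)

bus.publish("ServiceRequestTopic",
            {"client": "client-A", "operation": "add", "invocation": 1, "inputs": [2, 3]})
bus.publish("ServiceRequestTopic",
            {"client": "client-B", "operation": "add", "invocation": 1, "inputs": [10, 20]})
```

After the two requests, `client_inbox` holds only the response addressed to `client-A`, while `audit_log` holds all four samples, which is the observability property the pattern relies on.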
Design pattern how-to
In this section we will develop an abstract data model and map it to an implementation using DDS constructs.
Data model. Figure 3 shows the data model behind the request and response topics. The ServiceRequestType and the ServiceResponseType define the structure of the request and response topics respectively. A request or response data-object is identified by InteractionType, which specifies three things: the logical name of the service, the logical name of the client, and the operation instance. The EntityIdType uses a uniform resource identifier (URI) scheme to identify the logical service and client. The OperationIdType specifies the interface name, operation name, and an invocation identifier to uniquely identify a specific operation instance.
The request and response types include a sequence of ParameterIdType objects. The ParameterIdType is a polymorphic type to carry the payload associated with input and output parameters for an operation. The response type also includes a status field defined by the ResponseStatusKindEnum that can be in one of four states: PROCESSING, CANCELLED, DONE_SUCCESS or DONE_FAILURE. Output parameters on a response are valid if and only if the status indicates a successful completion.
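The type names in the data model can be written down directly. The sketch below uses Python dataclasses rather than DDS IDL; the type names come from the text, but the field names (`uri`, `interface_name`, `inputs`, `outputs`, and so on), the `valid_outputs` helper, and the example URIs are assumptions, since Figure 3 is not reproduced here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List


class ResponseStatusKindEnum(Enum):
    PROCESSING = 1
    CANCELLED = 2
    DONE_SUCCESS = 3
    DONE_FAILURE = 4


@dataclass(frozen=True)
class EntityIdType:
    uri: str  # URI naming a logical service or client role


@dataclass(frozen=True)
class OperationIdType:
    interface_name: str
    operation_name: str
    invocation_id: int


@dataclass(frozen=True)
class InteractionType:
    """Identity of one request/response pair (frozen, so usable as a key)."""
    service: EntityIdType
    client: EntityIdType
    operation: OperationIdType


@dataclass
class ParameterIdType:
    name: str
    value: Any  # stand-in for the polymorphic parameter payload


@dataclass
class ServiceRequestType:
    interaction: InteractionType
    inputs: List[ParameterIdType] = field(default_factory=list)


@dataclass
class ServiceResponseType:
    interaction: InteractionType
    status: ResponseStatusKindEnum = ResponseStatusKindEnum.PROCESSING
    outputs: List[ParameterIdType] = field(default_factory=list)

    def valid_outputs(self):
        """Outputs are meaningful only after successful completion."""
        if self.status is ResponseStatusKindEnum.DONE_SUCCESS:
            return self.outputs
        return []


interaction = InteractionType(
    service=EntityIdType("urn:example:service:datastore"),
    client=EntityIdType("urn:example:client:console"),
    operation=OperationIdType("DataStore", "get", 1),
)
request = ServiceRequestType(interaction, [ParameterIdType("key", "mode")])
response = ServiceResponseType(interaction)
pending = response.valid_outputs()          # still PROCESSING
response.outputs.append(ParameterIdType("value", "auto"))
response.status = ResponseStatusKindEnum.DONE_SUCCESS
final = response.valid_outputs()            # now DONE_SUCCESS
```

Because the request and the response carry the same `InteractionType` value, any observer can correlate them without knowing which client or service instance produced them.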
Component model. Figure 4 shows the component model for client and service role instances. The GenericClient and GenericServer classes work on the data model (Figure 3) using the respective state machines (Figures 5 and 6) to implement the client-server interaction (Figure 2). The ClientApp uses a strongly typed facade that implements the service interface and delegates the actual work of distributing the requests to the GenericClient class. A strongly typed facade around the GenericService class delegates the actual work of the service implementation to the ServiceApp class. The GenericClient and GenericService classes use DDS to efficiently distribute requests and responses with real-time QoS.
To use this design pattern, a developer simply writes the application-specific ClientApp and ServiceApp code and applies the boilerplate facades to connect them to the generic client and server classes. The creation of the facades can, and should, be automated using a code generator.
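The division of labor between the facades and the generic classes might look like the following sketch. It is an illustration under stated assumptions, not the author's prototype: the `put`/`get` interface of the data store, the method names `register`, `handle`, and `invoke`, and the direct function call used as the transport are all hypothetical; in the pattern, DDS topics would sit where `transport` is.

```python
class GenericService:
    """Untyped server side: dispatches a request to a handler by operation name."""

    def __init__(self):
        self._handlers = {}

    def register(self, operation, handler):
        self._handlers[operation] = handler

    def handle(self, request):
        handler = self._handlers.get(request["operation"])
        if handler is None:
            return {"status": "DONE_FAILURE", "outputs": []}
        return {"status": "DONE_SUCCESS", "outputs": [handler(*request["inputs"])]}


class GenericClient:
    """Untyped client side: builds a request and hands it to a transport callable."""

    def __init__(self, transport):
        self._transport = transport  # assumption: DDS would replace this callable
        self._next_id = 0

    def invoke(self, operation, *inputs):
        self._next_id += 1
        request = {"operation": operation, "invocation": self._next_id,
                   "inputs": list(inputs)}
        response = self._transport(request)
        if response["status"] != "DONE_SUCCESS":
            raise RuntimeError(f"{operation} failed with {response['status']}")
        return response["outputs"][0]


class DataStoreServiceApp:
    """Application-specific logic (hypothetical key-value store)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


class DataStoreServiceFacade:
    """Typed server facade: wires the app's methods into the generic service."""

    def __init__(self, generic_service, app):
        generic_service.register("put", app.put)
        generic_service.register("get", app.get)


class DataStoreClientFacade:
    """Typed client facade: exposes the service interface to the ClientApp."""

    def __init__(self, generic_client):
        self._client = generic_client

    def put(self, key: str, value: str) -> bool:
        return self._client.invoke("put", key, value)

    def get(self, key: str):
        return self._client.invoke("get", key)


service = GenericService()
DataStoreServiceFacade(service, DataStoreServiceApp())
store = DataStoreClientFacade(GenericClient(transport=service.handle))
stored = store.put("mode", "auto")
fetched = store.get("mode")
```

Both facades contain nothing but name-to-call mappings, which is why they lend themselves to code generation from the service interface.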
Behavioral model. Figure 2 (above) shows the data-centric interaction diagram. The client and server instances operate on the shared state defined by the data model, so they do not need to know each other’s location. Therefore, service (server) instances can be added anywhere, independently of the clients, for redundancy and/or load balancing. Redundant servers can synchronize with the state of the primary server simply by subscribing to the ServiceResponseTopic.
Client instances can be mobile, across disconnected and intermittent links, and may migrate from one host to another.
As a result of this flexibility, horizontal scaling is automatically built in. Since the interaction state is available on the DDS data bus, it can be managed independently outside of the client and server instances. Thus, the interactions are immune to service instance failure by design. These are important requirements for operational systems and emerging cloud applications.
GenericClient state machine. Figure 5 shows the GenericClient state machine implemented using DDS. Since each request is modeled as an explicit data object, it can be cancelled by disposing the request data object (a DDS operation), either explicitly by the user or as a result of a user-specified timeout.
Requests that are completed or cancelled are unregistered to reclaim resources on the DDS data bus.
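The client-side lifecycle described above can be sketched as a small state machine. Figure 5 is not reproduced here, so the state names (`WAITING`, `COMPLETED`, `CANCELLED`), the method names, and the `log` list that records the DDS-like operations (`write`, `dispose`, `unregister`) are assumptions for illustration; time is passed in explicitly to keep the example deterministic.

```python
from enum import Enum, auto


class ClientRequestState(Enum):
    WAITING = auto()    # request written, terminal response still pending
    COMPLETED = auto()  # terminal response received
    CANCELLED = auto()  # request disposed by the user or by a timeout


class ClientRequest:
    def __init__(self, invocation_id, issued_at, timeout, log):
        self.invocation_id = invocation_id
        self.expires_at = issued_at + timeout
        self.state = ClientRequestState.WAITING
        self.final_status = None
        self.outputs = None
        self._log = log  # records the operations that would be issued on the bus
        self._log.append(("write", invocation_id))

    def on_response(self, status, outputs=None):
        if self.state is not ClientRequestState.WAITING:
            return  # late response for a finished request: ignore it
        if status == "PROCESSING":
            return  # progress report only; keep waiting
        self.final_status = status
        if status == "DONE_SUCCESS":
            self.outputs = outputs
        self._finish(ClientRequestState.COMPLETED)

    def cancel(self):
        # Cancelling means disposing the request data object.
        if self.state is ClientRequestState.WAITING:
            self._log.append(("dispose", self.invocation_id))
            self._finish(ClientRequestState.CANCELLED)

    def tick(self, now):
        # User-specified timeout: cancel once the request has expired.
        if now >= self.expires_at:
            self.cancel()

    def _finish(self, state):
        self.state = state
        # Completed or cancelled requests are unregistered to reclaim resources.
        self._log.append(("unregister", self.invocation_id))


log = []
ok = ClientRequest(1, issued_at=0.0, timeout=5.0, log=log)
ok.on_response("PROCESSING")
ok.on_response("DONE_SUCCESS", outputs=[42])

slow = ClientRequest(2, issued_at=0.0, timeout=5.0, log=log)
slow.tick(now=6.0)                              # timeout fires, request disposed
slow.on_response("DONE_SUCCESS", outputs=[7])   # arrives too late and is ignored
```

The first request ends in `COMPLETED` with its outputs; the second ends in `CANCELLED`, and the log shows the dispose followed by the unregister.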
GenericServer state machine. Figure 6 shows the GenericServer state machine implemented using DDS. If a request data object being processed is disposed, processing is aborted and the response is disposed. Responses that are no longer needed are unregistered to reclaim resources on the data bus.
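A matching server-side sketch follows. Again, Figure 6 is not reproduced here, so the class name `ServerInteraction`, its method names, the splitting of work into steps so that it can be aborted between steps, and the `log` list are assumptions rather than the author's implementation. The text does not say when a response stops being needed, so that decision is left to the caller.

```python
class ServerInteraction:
    """Tracks one request being processed by a service instance."""

    def __init__(self, invocation_id, work_steps, log):
        self.invocation_id = invocation_id
        self._steps = iter(work_steps)  # callables, each returning a partial result
        self._log = log                 # records operations that would hit the bus
        self.status = "PROCESSING"
        self.results = []
        self._log.append(("write_response", invocation_id, "PROCESSING"))

    def step(self):
        """Run one unit of work; return True while more work may remain."""
        if self.status != "PROCESSING":
            return False
        try:
            work = next(self._steps)
        except StopIteration:
            self._publish("DONE_SUCCESS")
            return False
        try:
            self.results.append(work())
        except Exception:
            self._publish("DONE_FAILURE")
            return False
        return True

    def on_request_disposed(self):
        # The client cancelled: abort processing and dispose the response.
        if self.status == "PROCESSING":
            self.status = "CANCELLED"
            self._log.append(("dispose_response", self.invocation_id))

    def on_response_not_needed(self):
        # Reclaim resources held for this response on the data bus.
        self._log.append(("unregister_response", self.invocation_id))

    def _publish(self, status):
        self.status = status
        self._log.append(("write_response", self.invocation_id, status))


log = []
done = ServerInteraction(1, [lambda: 1, lambda: 2], log)
while done.step():
    pass

aborted = ServerInteraction(2, [lambda: 1, lambda: 2], log)
aborted.step()                  # first unit of work completes
aborted.on_request_disposed()   # client disposes the request
aborted.step()                  # no further work is done
aborted.on_response_not_needed()
```

The first interaction runs to `DONE_SUCCESS`; the second stops after one step, ends in `CANCELLED`, and its response is disposed and then unregistered.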
Applications of the data-centric model
The design pattern introduced in this article is a fundamental building block for implementing data-centric invocable services using DDS for real-time systems. A few important applications are highlighted below.
- Splitting the service role across multiple instances: The EntityIdType identifies the logical Service role that implements the service interface. The actual implementation of the service interface may be split across a set of server instances, where a server instance may implement only a subset (as little as one) of the operations. A server instance should subscribe only to the operations on the ServiceRequestTopic that match the operations it implements, and to the operation responses on the ServiceResponseTopic to keep track of the state of redundant servers, if any.
- Service load balancing: An operation may be implemented by multiple service instances. In that case, the incoming load could be spread across the service instances by assigning each request to a “bucket”, and configuring a service instance to subscribe only to requests in its bucket. The client load may be split across service instances based on a hash of the request contents or some other criterion that spreads the requests evenly across the buckets. A content filter, a PARTITION QoS Policy, or a separate topic could be used to assign requests to buckets.
- Service operation redundancy: An operation can be implemented redundantly by running multiple copies of a service instance (per bucket). For deterministic operations, the ServiceResponseTopic can be configured with an exclusive OWNERSHIP QoS Policy (a DDS feature), so that only one response will be delivered to the originating client role. The redundant copies may be configured to be active simultaneously, or in an active/passive configuration where backup instances become active upon detecting failure of the primary service instance.
- Service state in the cloud: The service state required to resume operations after a crash should be decoupled from the server application logic. The server state could be modeled as additional DDS Topic(s), which are durable to ensure availability at startup and for synchronizing with redundant backups. In the example presented, DataStoreServiceApp could be implemented via DDS Topics.
- Client non-blocking operations: In certain applications, it may be natural for a client to issue a request and continue working rather than waiting for the response. The response could be processed asynchronously as it arrives or when the client is ready to handle it. In that case, rather than using a blocking WaitSet, the client application would use a Listener in the GenericClient.
- Client mobility: On the client side, the EntityIdType identifies the logical client role that uses the service interface. The actual implementation of the client role may occur across a set of client instances. One or more instances may issue requests, and all would get notified of the responses (non-blocking mode). Thus, a client can issue a request from one location and consume the response from another.
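Two of the variations above, bucket-based load balancing and redundancy under exclusive ownership, can be illustrated with a few lines. This is a sketch under assumptions: the request key format, the CRC32 hash, the bucket count, and the `select_owner` helper are hypothetical choices, and a real deployment would express the filter and the ownership strength through DDS QoS and content filters rather than Python functions. CRC32 is used instead of Python's built-in `hash` because it gives the same value in every process.

```python
import zlib


def bucket_of(request_key: str, num_buckets: int) -> int:
    """Map a request to a bucket with a hash that is stable across processes."""
    return zlib.crc32(request_key.encode("utf-8")) % num_buckets


def make_bucket_filter(my_bucket: int, num_buckets: int):
    """Predicate a service instance could use as its content filter."""
    return lambda request_key: bucket_of(request_key, num_buckets) == my_bucket


def select_owner(responders):
    """Mimic exclusive ownership: the live responder with the highest strength wins."""
    alive = [r for r in responders if r["alive"]]
    return max(alive, key=lambda r: r["strength"])["name"] if alive else None


NUM_BUCKETS = 3
filters = [make_bucket_filter(b, NUM_BUCKETS) for b in range(NUM_BUCKETS)]
request_keys = [f"client-{c}/invocation-{i}" for c in "AB" for i in range(50)]

# For each request, count how many service instances would accept it.
acceptances = [[f(k) for f in filters].count(True) for k in request_keys]

replicas = [
    {"name": "primary", "strength": 10, "alive": True},
    {"name": "backup", "strength": 5, "alive": True},
]
owner_before = select_owner(replicas)
replicas[0]["alive"] = False  # the primary service instance fails
owner_after = select_owner(replicas)
```

Every request is accepted by exactly one bucket, so no request is dropped or processed twice across buckets, and the backup replica takes over the response stream once the primary is no longer alive.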
We’ve presented the foundation for implementing data-centric invocable services using the DDS standard for real-time systems. The service requests and responses are modeled explicitly as first-class citizens described by a data model. The interaction diagram, generic component model, and state machines are built around the data model. Several advanced variations are realized on top of this foundation through the use of DDS capabilities. The generic design pattern has been demonstrated in a prototype and applied in several operational systems. Example code for a prototype is available.
The data-centric invocable service design pattern is an essential building block for the next generation of distributed system-of-systems. The resulting systems are modular, loosely coupled, location independent, and scalable, while meeting real-time performance requirements. They can be elastically expanded or contracted to respond to demand. The interactions can also be monitored, audited, mined, and run on a variety of networks, including disconnected and intermittent links.
Dr. Rajive Joshi is a Principal Solution Architect at RTI and serves as a technology consultant to customers building high performance distributed systems. His technical expertise spans distributed real-time systems, embedded systems, robotics, evolutionary computing, and sensor fusion. He has 18+ years of experience in software architecture, design, and implementation of distributed real-time systems and middleware. He is a co-inventor of seven patents, and has co-authored 30+ publications. He holds a Doctorate and Masters in Computer and Systems Engineering from Rensselaer Polytechnic Institute, and a Bachelor's in Electrical Engineering from the Indian Institute of Technology at Kanpur. He is a member of the IEEE, ACM, and AIAA. He can be reached at Rajive.Joshi@rti.com.
References
[1] Victor Fay-Wolfe, Lisa C. DiPippo, Gregory Copper, Russell Johnston, Peter Kortmann, and Bhavani Thuraisingham. 2000. "Real-Time CORBA." IEEE Trans. Parallel Distrib. Syst. 11, 10. October 2000.
[2] Lee Man Kei and Xiaohua Jia. 2000. "An Efficient RPC Scheme in Mobile CORBA Environment." In Proceedings of the 2000 International Workshop on Parallel Processing (ICPP '00). IEEE Computer Society. 2000.
[3] Common Object Request Broker Architecture.
[4] S. Maffeis. 1996. "A fault-tolerant CORBA name server." In Proceedings of the 15th Symposium on Reliable Distributed Systems (SRDS '96). IEEE Computer Society. 1996.
[5] Campadello, S., Koskimies, O., Raatikainen, K., Helin, H. "Wireless Java RMI." Enterprise Distributed Object Computing Conference, 2000.
[6] Chung-Kai Chen, Cheng-Wei Chen, Chien-Tan Ko, Jenq-Kuen Lee, and Jyh-Cheng Chen. 2008. "Mobile Java RMI support over heterogeneous wireless networks: A case study." J. Parallel Distrib. Comput. 68, 11. November 2008.
[7] Java Remote Method Invocation.
[8] Rajive Joshi and Gerardo Pardo-Castellote. "OMG’s Data Distribution Service Standard." Dr. Dobbs, November 20, 2006.
[9] OMG Data Distribution Portal.
[10] Christian Esposito, Stefano Russo, Dario Di Crescenzo. "Performance assessment of OMG compliant data distribution middleware." In 2008 IEEE International Symposium on Parallel and Distributed Processing. 2008.
[11] Abdul-Halim Jallad and Tanya Vladimirova. "Data-Centricity in Wireless Sensor Networks." Guide to Wireless Sensor Networks, Computer Communications and Networks. Springer Verlag. 2009.
[12] Ivan Marsic. "An architecture for heterogeneous groupware applications." In Proceedings of the 23rd International Conference on Software Engineering (ICSE '01). IEEE Computer Society. 2001.
[13] T. Koponen, M. Chawla, B.-G. Chun, A. Ermolinskiy, K. H. Kim, S. Shenker, I. Stoica. "A Data-Oriented (And Beyond) Network Architecture." ACM Sigcomm. 2007.
[14] Rajive Joshi. "Data-Centric Architecture: A Model for the Era of Big Data." Dr. Dobbs. March 26, 2011.
[15] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. "Above the Clouds: A Berkeley View of Cloud Computing." UCB/EECS Technical Report. 2009.