Building an effective real-time distributed publish-subscribe framework: Part 1

Rajive Joshi, Real-Time Innovations, Inc.

August 09, 2006

Data-centric design is emerging as a key tenet for building advanced, data-critical distributed systems that link diverse control-oriented embedded devices and sensors to data-processing systems within large enterprises.

For example, in manufacturing, production equipment is quickly becoming all-electronic and networked; as a result, the data from sensors and controllers on the factory floor is being linked to the data collection and Enterprise Resource Planning (ERP) systems in the enterprise, creating a more nimble and responsive manufacturing organization. Another example is an auto service shop that sends data from a vehicle's sensors to the manufacturer's central data collection centers for analysis, diagnostics, and feedback.

Yet another example is the military's "Global Information Grid," which links the diverse sensors and local decision-making nodes deployed in the field with command and control centers to empower the soldier on the battlefield.

Two emerging middleware API standards for accomplishing this task are the Data Distribution Service (DDS) and the Java Message Service (JMS). Both are easy to use and offer the benefits of a publish-subscribe communication model, resulting in loosely coupled, scalable distributed applications. However, their differences have a significant impact on a data-centric design.

DDS and JMS are based on fundamentally different paradigms with respect to data modeling, dataflow routing, discovery, and data typing; yet they offer a similar, easy-to-use experience to the application programmer. They differ significantly in their support for data filtering and transformation, connectivity monitoring, redundancy and replication, and delivery effort. Each offers some distinct capabilities, and the two also share some equivalent capabilities.

When evaluating these alternatives in a distributed control environment, it is important to understand the practical considerations and differences in using the two standards with respect to middleware architecture, platform support, interoperability, transports, security, administration, performance, scalability, support specific to real-time applications, and support specific to enterprise applications.

DDS and JMS APIs may be used together in an application. They can leverage each other via JMS-DDS bridging, JMS/DDS bindings, or by using DDS for JMS discovery. We discuss these approaches and their suitability for different data-centric integration scenarios.

What is data-centric design?
As a result of the growing popularity of cheap and widespread embedded data collection “edge” devices, the easy availability of high-performance messaging and database technology, and the increasing adoption of SOA and Web Services, data-centric design is emerging as a key tenet for building advanced data-critical embedded systems.

As computation and storage costs continue to drop faster than network costs, the trend is to keep data and computation local. This means that choosing the right method for distributing data between nodes, as and when it is needed, is becoming critical in many distributed embedded systems.

Data-centric design is key to systems that exhibit some or all of the following five characteristics: (a) participants are distributed; (b) interactions between participants are data-centric rather than object-centric, and can often be viewed as “dataflows” that may carry information about identifiable data-objects; (c) data is critical because of large volumes, predictable delivery requirements, or the dynamic nature of the entities; (d) computation is time-sensitive and may depend critically on the predictable delivery of data; and (e) storage is local. Examples of data-centric systems are found in traffic control, command and control, networking equipment, industrial automation, robotics, simulation, medical, supply chain, and financial processing.

Several middleware technologies and standards have been applied to the construction of distributed systems, including DDS, JMS, Enterprise JavaBeans (EJB), the High Level Architecture (HLA), CORBA, and the CORBA Notification Service. These technologies fit the requirements of data-centric distributed systems to varying degrees. Specific requirements demanded by data-centric distributed systems include: (1) the ability to specify structured data models; (2) the ability to dynamically specify and (re)configure the data flows; (3) the ability to describe delivery requirements per data flow; (4) the ability to specify and control middleware resources such as queues and buffering; (5) resiliency to individual node or participant failures; and (6) performance and scalability with respect to the number of nodes, participants, and data flows.

The Publish-Subscribe Communication Model. Distributed data-centric application architectures often map naturally to a publish-subscribe (P-S) communication model. A P-S communication model (Figure 1, below) uses asynchronous message passing between concurrently operating subsystems, connecting anonymous information producers with information consumers. The overall distributed system is composed of processes, each running in a separate address space, possibly on different computers. We will call each of these processes a “participant application.” A participant may be a producer or consumer of data, or both.

Figure 1. Publish-subscribe middleware decouples information producers from consumers.

Data producers declare the topics on which they intend to publish data; data consumers subscribe to the topics of interest. When a data producer publishes some data on a topic, all the consumers subscribing to that topic receive it. The data producers and consumers remain anonymous, resulting in a loose coupling of sub-systems, which is well suited for data-centric distributed applications.
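To make the delivery rule concrete, here is a minimal in-memory sketch in Java. The class TopicBusSketch and its subscribe/publish methods are hypothetical illustrations, not part of the DDS or JMS APIs, and everything runs in a single process, whereas real middleware carries the data between address spaces. The point is that the publisher names only a topic, never its receivers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory sketch of topic-based publish-subscribe.
// Not the DDS or JMS API: it only shows that every subscriber of a topic
// receives every message published on that topic.
class TopicBusSketch {
    // topic name -> callbacks registered by anonymous subscribers
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> callback) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(callback);
    }

    // The publisher names only the topic, never the receivers.
    void publish(String topic, String data) {
        for (Consumer<String> callback : subscribers.getOrDefault(topic, List.of())) {
            callback.accept(data);
        }
    }

    static List<String> demo() {
        TopicBusSketch bus = new TopicBusSketch();
        List<String> log = new ArrayList<>();
        bus.subscribe("temperature", d -> log.add("display got " + d));
        bus.subscribe("temperature", d -> log.add("logger got " + d));
        bus.subscribe("pressure", d -> log.add("alarm got " + d));
        bus.publish("temperature", "21.5"); // reaches both temperature subscribers
        bus.publish("humidity", "40");      // no subscribers: nobody receives it
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Adding a third "temperature" subscriber requires no change to the publishing code, which is the loose coupling the P-S model provides.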

The P-S communication model enables a robust, service-based application architecture that decouples participants from one another, provides location transparency, and offers the flexibility to add or remove participants dynamically.

Both DDS and JMS support a P-S communication model and often serve as the integration glue or the “data bus” interconnecting the participants producing or consuming data.

The Data Distribution Service (DDS) is a formal standard from the Object Management Group (OMG), popular in embedded systems, especially in industrial automation, aerospace, and defense applications. DDS specifies an API designed to enable real-time data distribution. It uses a publish-subscribe communication model and supports both messaging and data-object-centric data models.

Java Message Service. JMS is a de facto industry standard popular in enterprise systems for messaging applications. JMS specifies a Java API for wrapping message-oriented middleware (MOM) APIs, so that portable (Java) application code may be written. In that respect, it is similar to other Java APIs such as JDBC for abstracting database access, or JNDI for abstracting naming and directory services. JMS uses a publish-subscribe communication model and a messaging or eventing data model.

Thus, both DDS and JMS provide standardized APIs that preserve application portability across middleware vendors, and both use a publish-subscribe (P-S) communication model. Both APIs are intuitive and easy to use, and their popularity mitigates the risk of using them for new data-centric designs.

DDS versus JMS. The two standards differ in their ability to cater to the key data-centric design requirements outlined earlier, with respect to (1) data modeling and manipulation, including lifecycle management, data filtering, and transformation; (2) dataflow routing and discovery, including point-to-point connectivity; (3) delivery quality of service (QoS) per data flow, including delivery effort levels, timing control, ordering control, time-to-live, and message priority; (4) resource specification and management, including resource limits and history; (5) resiliency to failures, including redundancy, failover, and status notifications; and (6) performance and scalability.

DDS is a newer standard, based on paradigms fundamentally different from those of JMS with regard to data modeling, dataflow routing, discovery, and data typing; these differences give application designers powerful new architectural possibilities. Despite these differences, the experience of writing to the DDS APIs is similar to that of writing to the JMS APIs. Both also provide support for persistent delivery and for a time-to-live on a data item.

Distinctive DDS capabilities include data modeling and lifecycle management, automatic dataflow routing, spontaneous discovery, content-based filtering and transformation, per-dataflow connectivity monitoring, simple redundancy and replication, delivery ordering, and real-time-specific features such as best-effort delivery, predictable delivery, resource management, and status notifications. Compared with JMS, DDS offers enhanced capabilities with respect to data filtering and transformation, connectivity monitoring, redundancy and replication, and delivery effort, and new capabilities with respect to data-object lifecycle management, predictable delivery, delivery ordering, transport priority, resource management, and status notifications.

JMS offers some capabilities not offered by DDS. Distinctive JMS capabilities include point-to-point delivery to exactly one of many consumers, message priority, and enterprise-specific features such as full transactional support and application-level acknowledgements. Unlike DDS, JMS requires administration of the JMS provider (server) and JNDI registries.

Unlike JMS, which is a Java language standard, standard DDS APIs are available in many languages. The API design choices made by DDS can support potentially higher performance (lower latency and higher throughput) and better scalability than JMS. DDS has some capabilities optimized for real-time applications, not found in JMS. JMS has some capabilities optimized for enterprise applications, not found in DDS.

DDS is amenable to a decentralized peer-to-peer architecture, which can be more robust and efficient than the centralized, server-based architecture commonly used for JMS.

Neither DDS nor JMS provides an interoperability protocol, although one is currently being standardized for DDS. Neither specifies a transport model, although some capabilities in DDS are better suited to unreliable transports such as UDP, while JMS generally benefits from the availability of a reliable transport such as TCP. Both DDS and JMS defer security to the application and only provide support for communicating security credentials.

DDS and JMS merit careful consideration for data-centric design. Using one or both can considerably simplify a data-centric design, and help maintain the focus on application issues, rather than becoming bogged down by communication and data delivery concerns.

The Basics of DDS
DDS targets real-time systems; the API and Quality of Service (QoS) are chosen to balance predictable behavior and implementation efficiency/performance. The DDS specification describes two levels of interfaces:

* A lower-level Data-Centric Publish-Subscribe (DCPS) layer that is targeted towards the efficient delivery of the proper information to the proper recipients.
* An optional higher-level Data-Local Reconstruction Layer (DLRL), which allows for a simpler integration into the application layer.

The DCPS model builds on the idea of a “global data space” of data-objects that any entity can access. Applications that need data from this space declare that they want to subscribe to the data, and applications that want to modify data in the space declare that they want to publish the data. A data-object in the space is uniquely identified by its keys and topic, and each topic must have a specific type; there may be several topics of a given type. A global data space is identified by its domain id; a publication and a subscription must belong to the same domain in order to communicate.
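The identity rules above can be modeled as a map keyed by (domain id, topic, key). The sketch below is a hypothetical, single-process illustration, not the DCPS API: a write with an existing topic and key updates that data-object, a new key creates a new one, and a read in a different domain sees nothing. The names GlobalDataSpaceSketch and InstanceId are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of the DCPS "global data space" (not the DDS API).
// A data-object is identified by (topic, key); each domain id names a
// separate space, so different domains never exchange data.
class GlobalDataSpaceSketch {
    static final class InstanceId {
        final int domainId;
        final String topic;
        final String key;

        InstanceId(int domainId, String topic, String key) {
            this.domainId = domainId;
            this.topic = topic;
            this.key = key;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof InstanceId)) return false;
            InstanceId other = (InstanceId) o;
            return domainId == other.domainId && topic.equals(other.topic) && key.equals(other.key);
        }

        @Override public int hashCode() { return Objects.hash(domainId, topic, key); }
    }

    // Latest value of each data-object in every domain.
    private final Map<InstanceId, String> instances = new HashMap<>();

    void write(int domainId, String topic, String key, String value) {
        instances.put(new InstanceId(domainId, topic, key), value); // same id: value is replaced
    }

    String read(int domainId, String topic, String key) {
        return instances.get(new InstanceId(domainId, topic, key)); // null if unknown here
    }

    public static void main(String[] args) {
        GlobalDataSpaceSketch space = new GlobalDataSpaceSketch();
        space.write(0, "TrackPosition", "track-7", "x=1.0 y=2.0");
        space.write(0, "TrackPosition", "track-7", "x=1.5 y=2.2"); // update of the same data-object
        space.write(0, "TrackPosition", "track-9", "x=8.0 y=0.5"); // new key: a second data-object
        System.out.println(space.read(0, "TrackPosition", "track-7")); // x=1.5 y=2.2
        System.out.println(space.read(1, "TrackPosition", "track-7")); // null: different domain
    }
}
```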

Figure 2, below, illustrates the overall data-centric publish-subscribe model, which consists of the following entities: DomainParticipant, DataWriter, DataReader, Publisher, Subscriber, and Topic.

Figure 2. UML diagram of the DDS data-centric publish-subscribe interfaces

All these classes extend Entity, representing their ability to be configured through QoS policies, be enabled, be notified of events via listener objects, and support conditions that can be waited upon by the application. Each specialization of the Entity base class has a corresponding specialized listener and a set of QoSPolicy values that are suitable to it.

Publisher represents the objects responsible for data issuance. A Publisher may publish data of different data types. A DataWriter is a typed facade to a publisher; participants use DataWriter(s) to communicate the value of and changes to data of a given type. Once new data values have been communicated to the publisher, it is the Publisher’s responsibility to determine when it is appropriate to issue the corresponding message and to actually perform the issuance (the Publisher will do this according to its QoS, or the QoS attached to the corresponding DataWriter, and/or its internal state).

A Subscriber receives published data and makes it available to the participant. A Subscriber may receive and dispatch data of different specified types. To access the received data, the participant must use a typed DataReader attached to the subscriber.

The association of a DataWriter object (representing a publication) with DataReader objects (representing the subscriptions) is done by means of the Topic. A Topic associates a name (unique in the system), a data type, and QoS related to the data itself. The type definition provides enough information for the service to manipulate the data (for example, to serialize it into a network format for transmission). The definition can be done by means of a textual language (e.g. something like “float x; float y;”) or by means of an operational “plugin” that provides the necessary methods.
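The following hypothetical Java sketch shows the Topic acting as the link between a typed DataWriter and its DataReaders. The classes borrow the DDS entity names for readability but are not the DDS API: a Topic here is just a (name, Class) pair, and a reader receives a sample only if its Topic has the same name and type as the writer's. The sketch simply skips a reader whose Topic type differs; it does not model how a DDS implementation reports such a conflict.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the DDS API: a Topic is a (name, type) pair, and a
// DataWriter delivers a sample only to DataReaders whose Topic matches its own.
class TopicMatchingSketch {
    static final class Topic<T> {
        final String name;
        final Class<T> type;
        Topic(String name, Class<T> type) { this.name = name; this.type = type; }
        boolean matches(Topic<?> other) {
            return name.equals(other.name) && type.equals(other.type);
        }
    }

    static final class DataReader<T> {
        final Topic<T> topic;
        final List<T> received = new ArrayList<>();
        DataReader(Topic<T> topic) { this.topic = topic; }
    }

    // Typed facade: the compiler only lets callers write values of the Topic's type.
    static final class DataWriter<T> {
        final Topic<T> topic;
        final List<DataReader<?>> knownReaders; // stands in for discovery

        DataWriter(Topic<T> topic, List<DataReader<?>> knownReaders) {
            this.topic = topic;
            this.knownReaders = knownReaders;
        }

        void write(T sample) {
            for (DataReader<?> reader : knownReaders) {
                if (reader.topic.matches(topic)) deliver(reader, sample);
            }
        }

        // Only called when the Topic types are equal, so Class.cast cannot fail.
        private static <S> void deliver(DataReader<S> reader, Object sample) {
            reader.received.add(reader.topic.type.cast(sample));
        }
    }

    static List<Integer> demo() {
        List<DataReader<?>> readers = new ArrayList<>();
        DataReader<Double> celsius = new DataReader<>(new Topic<>("Temperature", Double.class));
        DataReader<String> mistyped = new DataReader<>(new Topic<>("Temperature", String.class));
        DataReader<Double> other = new DataReader<>(new Topic<>("Pressure", Double.class));
        readers.add(celsius);
        readers.add(mistyped);
        readers.add(other);

        DataWriter<Double> writer = new DataWriter<>(new Topic<>("Temperature", Double.class), readers);
        writer.write(21.5);
        // Samples received per reader: only the reader matching name and type got one.
        return List.of(celsius.received.size(), mistyped.received.size(), other.received.size());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [1, 0, 0]
    }
}
```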

The DDS middleware handles the actual distribution of data on behalf of a user application. The distribution of the data is controlled by user settable Quality of Service (QoS).

The Basics of JMS
JMS targets enterprise messaging; the API is designed to abstract the programming of a wide variety of message-oriented middleware (MOM) products in a vendor-neutral, portable manner, using the Java programming language.

Figure 3, below, illustrates the structure of the JMS API. A Destination refers to a named physical resource managed by the underlying MOM. It is administered and configured via vendor-provided tools, and is typically accessed by a user application via the Java Naming and Directory Interface (JNDI) APIs (external to JMS). A MessageProducer sends messages to a destination, and a MessageConsumer receives messages from a destination. The destination can be thought of as a mini message broker, or a channel independent of the producers and consumers.

Figure 3. UML diagram of JMS messaging interfaces

JMS supports two different “messaging domains” (unrelated to the DDS domain concept): point-to-point (PtP) and publish-subscribe (Pub/Sub). The two messaging domains are provided to support the wide variety of MOM vendors; a JMS provider is required to support only one of them, although many support both. They provide two different sets of derived classes that extend the common abstract APIs, as shown in Figure 4, below.

Figure 4. The PtP and Pub/Sub JMS domains extend common abstract interfaces, and follow the same programming idioms.

The two JMS messaging domains are similar in every respect except the following:

1) In the PtP messaging domain, only one consumer receives a given message; the policy for choosing that consumer is not specified by JMS and is left to the vendor. Messages are delivered in the order they are produced (as if put into a shared serial queue). Also, an application can peek ahead using a QueueBrowser.

2) In the PtP messaging domain, the consumers are durable (see below) and therefore don’t have to be running concurrently with the producers to receive messages. The same behavior can be achieved in the JMS Pub/Sub messaging domain by using durable subscriptions.
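The first difference can be illustrated with a small, hypothetical Java model (not the javax.jms API). A queue gives each message to exactly one receiver; the round-robin choice below is an assumption, since JMS leaves that policy to the vendor. A topic gives each message to every subscriber.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not the javax.jms API) contrasting the two JMS messaging
// domains: a queue hands each message to exactly one receiver, a topic hands
// each message to every subscriber.
class MessagingDomainsSketch {
    static List<List<String>> deliverQueue(List<String> messages, int receivers) {
        List<List<String>> inbox = emptyInboxes(receivers);
        int next = 0;
        for (String m : messages) {         // messages leave the queue in send order
            inbox.get(next).add(m);
            next = (next + 1) % receivers;   // assumed vendor policy: round robin
        }
        return inbox;
    }

    static List<List<String>> deliverTopic(List<String> messages, int subscribers) {
        List<List<String>> inbox = emptyInboxes(subscribers);
        for (String m : messages) {
            for (List<String> box : inbox) box.add(m); // every subscriber gets a copy
        }
        return inbox;
    }

    private static List<List<String>> emptyInboxes(int n) {
        List<List<String>> inbox = new ArrayList<>();
        for (int i = 0; i < n; i++) inbox.add(new ArrayList<>());
        return inbox;
    }

    public static void main(String[] args) {
        List<String> msgs = List.of("m1", "m2", "m3");
        System.out.println("PtP:     " + deliverQueue(msgs, 2)); // [[m1, m3], [m2]]
        System.out.println("Pub/Sub: " + deliverTopic(msgs, 2)); // [[m1, m2, m3], [m1, m2, m3]]
    }
}
```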

A ConnectionFactory refers to a vendor-provided factory for Connection objects; it is also configured and administered using vendor-provided tools and is typically obtained via the JNDI APIs. An optional username and password may be supplied when creating a Connection.

A Connection is a heavy-weight object representing the link between the application and the middleware. Its attributes include a clientID. It provides methods to start() and stop() communication and to close() a connection. An ExceptionListener may be registered with it, to trap lost connections. A Connection is used to create Session objects.

A Session represents a single-threaded context for producing and/or consuming data. It provides methods to create Messages, MessageProducers, and MessageConsumers. Its attributes include whether it is transacted and its acknowledgement mode.

In a transacted session, messages are not actually sent (MessageProducer), and received messages are not acknowledged (MessageConsumer), until a commit() operation. A rollback() operation undoes the pending messages to be sent (MessageProducer) or acknowledged (MessageConsumer). The acknowledgement mode determines whether received messages are acknowledged automatically, in which case duplicates may or may not be received, or whether they must be acknowledged explicitly by the application by calling Message.acknowledge().
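On the producer side, a transacted session behaves like a staging buffer. The sketch below is a hypothetical in-memory model, not the javax.jms API; in JMS itself a transacted session is requested through Connection.createSession(boolean transacted, int acknowledgeMode) with transacted set to true, and completed with Session.commit() or Session.rollback(). Here send() only stages a message, commit() releases all staged messages together, and rollback() discards them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the producer side of a transacted session (not the
// javax.jms API): send() stages, commit() releases all staged messages together,
// rollback() discards them.
class TransactedSendSketch {
    private final List<String> staged = new ArrayList<>();
    final List<String> delivered = new ArrayList<>(); // stands in for the destination

    void send(String message) { staged.add(message); }

    void commit() {
        delivered.addAll(staged); // all staged messages become visible at once
        staged.clear();
    }

    void rollback() { staged.clear(); } // nothing staged is ever sent

    public static void main(String[] args) {
        TransactedSendSketch session = new TransactedSendSketch();
        session.send("debit account A");
        session.send("credit account B");
        System.out.println(session.delivered); // [] (nothing visible before commit)
        session.commit();
        session.send("audit entry");
        session.rollback();
        System.out.println(session.delivered); // [debit account A, credit account B]
    }
}
```

The debit and credit either both reach the destination or neither does, which is why transactions matter for enterprise workflows.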

A Message is a first-class object in JMS; it represents an event and can carry an optional payload. A message comprises headers, optional user-defined properties, and an optional user data payload.

The JMS provider automatically assigns most message headers, including destination, delivery mode, message id, timestamp, expiration, redelivery flag, and priority. The user can assign some headers, including reply to, correlation id, and type.

In addition, the user can associate arbitrary properties consisting of (name, value) pairs. These properties can be used in ‘selectors’, which are expressions specified on a MessageConsumer to sub-select and consume only the matching messages.
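Selectors filter on message properties, not on the payload. Real JMS selectors are strings written in a subset of SQL-92 conditional-expression syntax, such as "region = 'EU' AND severity > 2", passed to Session.createConsumer(Destination, String). The hypothetical sketch below substitutes a java.util.function.Predicate over the property map to show the effect without a JMS provider; the class, property names, and messages are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of property-based message selection (not the javax.jms API).
// A Predicate over the message's (name, value) properties stands in for a JMS
// selector string; the payload is never inspected.
class SelectorSketch {
    static final class Msg {
        final Map<String, Object> properties;
        final String body;
        Msg(Map<String, Object> properties, String body) {
            this.properties = properties;
            this.body = body;
        }
    }

    // Roughly what the selector "region = 'EU' AND severity > 2" expresses.
    static final Predicate<Map<String, Object>> EU_SEVERE = p ->
        "EU".equals(p.get("region"))
            && p.get("severity") instanceof Integer
            && (Integer) p.get("severity") > 2;

    // Returns the bodies of the messages this consumer would be handed.
    static List<String> consume(List<Msg> arriving, Predicate<Map<String, Object>> selector) {
        List<String> accepted = new ArrayList<>();
        for (Msg m : arriving) {
            if (selector.test(m.properties)) accepted.add(m.body);
        }
        return accepted;
    }

    static List<String> demo() {
        List<Msg> arriving = List.of(
            new Msg(Map.of("region", "EU", "severity", 3), "pump overheating"),
            new Msg(Map.of("region", "US", "severity", 5), "valve stuck"),
            new Msg(Map.of("region", "EU", "severity", 1), "filter due"));
        return consume(arriving, EU_SEVERE);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [pump overheating]
    }
}
```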

JMS defines five message subclasses to conveniently specify the data payload. The subclasses for unstructured payloads are TextMessage, BytesMessage, and ObjectMessage; those for structured payloads are StreamMessage and MapMessage.

A MessageProducer is used to produce messages. A default destination may be specified when the producer is created; it can also be specified when sending messages. In addition, the delivery mode, priority, and expiration can be specified for the outgoing message headers. A persistent delivery mode means that a message will be delivered once-and-only-once; the message is stored in permanent storage before the send() method returns. A non-persistent delivery mode means that the message will be delivered at most once; a message may be dropped if the JMS provider fails.
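Expiration is set indirectly: a producer supplies a time-to-live (for example with MessageProducer.setTimeToLive(long)), and the provider derives the message's expiration time by adding it to the send time, with a time-to-live of zero meaning the message never expires. The helper below is a hypothetical sketch of that arithmetic, not a javax.jms class; treating a message as expired exactly at its expiration instant is an assumption of the sketch.

```java
// Hypothetical helper, not a javax.jms class: computes an expiration time using
// the JMS rule expiration = send time + time-to-live (0 meaning "never expires").
class ExpirationSketch {
    // A time-to-live of 0 is recorded as expiration 0, i.e. no expiry.
    static long expirationFor(long sendTimeMillis, long timeToLiveMillis) {
        return timeToLiveMillis == 0 ? 0 : sendTimeMillis + timeToLiveMillis;
    }

    // Assumption of this sketch: a message counts as expired from its expiration instant on.
    static boolean isExpired(long expirationMillis, long nowMillis) {
        return expirationMillis != 0 && nowMillis >= expirationMillis;
    }

    public static void main(String[] args) {
        long sent = 1_000_000L;
        long expiration = expirationFor(sent, 5_000L);
        System.out.println(expiration);                                         // 1005000
        System.out.println(isExpired(expiration, sent + 4_999L));               // false
        System.out.println(isExpired(expiration, sent + 5_000L));               // true
        System.out.println(isExpired(expirationFor(sent, 0L), Long.MAX_VALUE)); // false
    }
}
```

For example, a message sent at t = 1,000,000 ms with a 5,000 ms time-to-live carries an expiration of 1,005,000 ms.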

A MessageConsumer is used to consume messages from a destination. A selector can be specified when creating a consumer; the consumer will then deliver only the messages whose properties match the selector expression. Messages can be delivered asynchronously by registering a MessageListener, whose onMessage() method is called when a message arrives. Alternatively, messages can be received synchronously by calling the receive*() methods; the user chooses whether to return immediately (receiveNoWait()), wait up to a finite timeout (receive(long)), or block indefinitely (receive()).
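The two reception styles can be sketched with a java.util.concurrent.BlockingQueue. This is a hypothetical model, not the javax.jms API, and its method names (pollNow, pollFor, waitForever, setListener) are invented so they are not mistaken for JMS's receiveNoWait(), receive(long), receive(), and MessageListener.onMessage(). Note that in JMS itself, receive(0) blocks indefinitely rather than returning immediately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical consumer (not the javax.jms API) contrasting synchronous pulls
// with a chosen timeout against asynchronous delivery to a listener.
class ReceiveStylesSketch {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private volatile Consumer<String> listener; // when set, arrivals are pushed to it

    void setListener(Consumer<String> newListener) { listener = newListener; }

    // Called by the (hypothetical) provider whenever a message arrives.
    void arrive(String message) {
        Consumer<String> current = listener;
        if (current != null) current.accept(message); else pending.add(message);
    }

    String pollNow() { return pending.poll(); } // returns at once; null if nothing is waiting

    String pollFor(long millis) throws InterruptedException { // waits at most millis; null on timeout
        return pending.poll(millis, TimeUnit.MILLISECONDS);
    }

    String waitForever() throws InterruptedException { return pending.take(); } // blocks until a message exists

    static List<String> demo() {
        ReceiveStylesSketch consumer = new ReceiveStylesSketch();
        List<String> seen = new ArrayList<>();
        try {
            seen.add(String.valueOf(consumer.pollNow()));   // "null": nothing has arrived
            consumer.arrive("a");
            seen.add(consumer.pollFor(100));                // "a" is already queued
            seen.add(String.valueOf(consumer.pollFor(10))); // "null" after a 10 ms wait
            consumer.arrive("b");
            seen.add(consumer.waitForever());               // "b" is queued, so no blocking
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        consumer.setListener(m -> seen.add("listener:" + m));
        consumer.arrive("c");                               // pushed to the listener, never queued
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [null, a, null, b, listener:c]
    }
}
```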

A consumer can be durable; for the Pub/Sub messaging domain this is specified by calling Session.createDurableSubscriber() and specifying a subscription name; in the PtP messaging domain, a QueueReceiver is always durable. A durable consumer receives all messages sent to a destination, including ones that are sent when the consumer is inactive. The JMS provider retains a record of the durable consumer(s) and ensures that all messages from the destination’s producers are retained until the durable consumer acknowledges them or they have expired.
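A hypothetical, single-process sketch of durable-subscription retention follows (not the javax.jms API). The provider keeps a backlog per subscription name and keeps filling it while the subscriber is offline; a non-durable subscriber sees only messages published while it is connected. For brevity, the sketch treats draining the backlog as acknowledgement and does not model expiration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not the javax.jms API) of durable-subscription retention:
// a backlog per subscription name is filled even while the subscriber is away.
class DurableSubscriptionSketch {
    private final Map<String, List<String>> durableBacklog = new HashMap<>();
    private final List<List<String>> activeNonDurable = new ArrayList<>();

    void registerDurable(String subscriptionName) {
        durableBacklog.putIfAbsent(subscriptionName, new ArrayList<>());
    }

    List<String> connectNonDurable() {
        List<String> inbox = new ArrayList<>();
        activeNonDurable.add(inbox);
        return inbox;
    }

    void publish(String message) {
        for (List<String> backlog : durableBacklog.values()) backlog.add(message); // retained
        for (List<String> inbox : activeNonDurable) inbox.add(message);            // live only
    }

    // Reconnecting under the same name drains (and, here, acknowledges) the backlog.
    List<String> reconnectDurable(String subscriptionName) {
        List<String> backlog = durableBacklog.get(subscriptionName);
        List<String> drained = new ArrayList<>(backlog);
        backlog.clear();
        return drained;
    }

    public static void main(String[] args) {
        DurableSubscriptionSketch topic = new DurableSubscriptionSketch();
        topic.registerDurable("billing");                   // registers, then goes offline
        topic.publish("order-1");
        List<String> lateInbox = topic.connectNonDurable(); // non-durable joins late
        topic.publish("order-2");
        System.out.println(topic.reconnectDurable("billing")); // [order-1, order-2]
        System.out.println(lateInbox);                          // [order-2]
    }
}
```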

A Session can also create unique temporary destinations (a TemporaryQueue or a TemporaryTopic), which are like administered destinations except that they are valid only for the duration of the connection, and only consumers associated with that connection can consume their messages. However, anyone can produce on a temporary destination; its existence is typically conveyed to other producers using the Message.setJMSReplyTo() method.
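The usual use of a temporary destination is request/reply: the requester creates a private reply destination (in JMS, for example, with Session.createTemporaryQueue()), names it in the request's reply-to header, and the service produces its answer there. The sketch below models that flow with plain ArrayDeque objects; it is hypothetical, not the javax.jms API, and the message contents are invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical request/reply sketch (not the javax.jms API). Destinations are
// plain FIFOs; a request carries a reference to the requester's private reply
// destination, and the service produces its answer there.
class ReplyToSketch {
    static final class Message {
        final String body;
        final Deque<Message> replyTo; // null when no reply is expected
        Message(String body, Deque<Message> replyTo) {
            this.body = body;
            this.replyTo = replyTo;
        }
    }

    // Service: consume every pending request and answer on its reply-to destination.
    static void serveAll(Deque<Message> requestQueue) {
        Message request;
        while ((request = requestQueue.pollFirst()) != null) {
            if (request.replyTo != null) {
                request.replyTo.addLast(new Message("ack:" + request.body, null));
            }
        }
    }

    static String demo() {
        Deque<Message> requestQueue = new ArrayDeque<>();     // shared, administered destination
        Deque<Message> temporaryReplies = new ArrayDeque<>(); // known only to this requester
        requestQueue.addLast(new Message("order-17", temporaryReplies));
        serveAll(requestQueue);
        return temporaryReplies.pollFirst().body;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // ack:order-17
    }
}
```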

The DCPS layer of DDS resembles JMS in a number of ways, as shown in Figure 5, below. However, there is no DLRL counterpart in JMS.

Figure 5. Mapping of key JMS and DDS concepts and terminology.

Conclusion
DDS and JMS APIs are similar in many respects, and correspondences can be observed between the two APIs. For example, a DDS DomainParticipantFactory corresponds to a JMS ConnectionFactory; a DDS DomainParticipant corresponds to a JMS Connection; a DDS Publisher or Subscriber corresponds to a JMS Session; a DDS Topic corresponds to a JMS Destination; a DDS data-object update corresponds to a JMS Message; a DDS DataWriter corresponds to a JMS MessageProducer; and a DDS DataReader corresponds to a JMS MessageConsumer.

These similarities make it easy to switch back and forth between the two APIs, and to apply experience gained with one API to the other.

Next in Part 2: The differences between DDS and JMS.

Rajive Joshi, Ph.D., is principal engineer at Real-Time Innovations, Inc.
