Achieving distributed device situational awareness through cloud-based data management

Sumeet Shendrikar, Real-Time Innovations

May 15, 2012

Persisting Real-Time Data
Whenever persistent data management is added to a real-time distributed system, the primary concern is maintaining the critical performance characteristics of operational technology (OT): the physical, equipment-oriented technology implemented and supported by engineering organizations, typical of many device and embedded systems designs.

In any large enterprise, OT is usually developed independently of the Information Technology (IT) groups that handle data management, including the data generated by embedded devices on the production line or within deployed systems, with little real-time interaction between the two.

In such distributed systems, the performance of persistent storage lags behind that of volatile storage, although there are signs that the two may be converging (e.g., solid-state disks).

Real-time data management consists of several simultaneous activities:

1. Storage (write)

2. Querying, Correlation & Retrieval

3. Distribution

What distinguishes OT (real-time) data management from traditional IT domains is that all these activities happen simultaneously: data is produced, stored, correlated, retrieved, and redistributed with real-time requirements.

In real-time systems, data is produced at various rates and distributed with different priorities. Therefore, it is desirable that the data management system can also prioritize data and scale to handle arbitrary storage loads.

A good example of data produced in a typical real-time distributed system is information from sensors. Sensor data is generally produced at consistent and well-known rates. While usually published with low priority, the same data can quickly become the highest priority -- the urgency of the data is dynamic.

Consider for example a temperature sensor in a car engine. Most of the time the temperature is within the normal operating range and the information can be considered low priority. But when the temperature reaches a specific threshold, it is important to alert the system immediately.
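The dynamic-urgency pattern above can be sketched in a few lines. The threshold value and priority levels here are hypothetical assumptions for illustration, not values from any particular middleware:

```python
# Sketch: dynamic distribution priority for a sensor reading.
# The threshold and priority constants are illustrative assumptions.

LOW, HIGH = 0, 10
OVERHEAT_THRESHOLD_C = 110.0  # hypothetical engine temperature limit

def classify_reading(temperature_c: float) -> int:
    """Return the distribution priority for a temperature sample."""
    if temperature_c >= OVERHEAT_THRESHOLD_C:
        return HIGH   # alert: deliver ahead of routine telemetry
    return LOW        # routine: normal monitoring traffic
```

The same sample type moves between priorities at publication time, which is exactly what makes the urgency dynamic rather than fixed per data source.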

Storage performance falls into two distinct categories -- complete and partial. Complete storage is achieved if the data management system can store data at the peak throughput rate of your distributed system. For partial storage the system designer is left with two choices:

1. Slow down the data producers

2. Selectively discard data

Note that buffering alone is not sufficient, as any buffer is finite. Buffering simply postpones the inevitable and is undesirable in real-time systems.
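The second choice, selective discard, can be sketched as a bounded buffer that keeps the highest-priority samples and drops the rest. This is a simplification under stated assumptions (a real system might also weigh sample age or source rate):

```python
import heapq

class DiscardingBuffer:
    """Bounded buffer that keeps the highest-priority samples.

    When full, an incoming sample evicts the lowest-priority entry,
    or is itself dropped if its priority is lowest of all.
    """
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._heap = []  # min-heap ordered by (priority, sequence)
        self._seq = 0    # tie-breaker keeps heap entries comparable

    def offer(self, priority: int, sample) -> bool:
        entry = (priority, self._seq, sample)
        self._seq += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if self._heap[0][0] < priority:
            heapq.heapreplace(self._heap, entry)  # evict lowest priority
            return True
        return False  # discarded: buffer holds only higher-priority data
```

Unlike unbounded buffering, the memory footprint is fixed and the discard policy is explicit, so the designer decides up front what is lost under overload.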

Due to their distributed implementations, NoSQL database write performance is affected by the replication strategy as well as the underlying hardware. It is important for the system designer to understand the database implementation and pick one that is suitable for the application. As an example, one of the main strengths of Apache's Cassandra is its good write performance, which stems from a very efficient replication strategy [Perham, 2010].

Archiving Service
An archiving service provides the best data storage by subscribing to real-time data with the appropriate Quality of Service (QoS). In a basic implementation, the archiving service uses the NoSQL database API to issue a write to any node in the cloud. From there, the NoSQL database implementation persists and replicates the data. Based on the consistency configuration, the database will notify the archiving service when the desired consistency has been achieved.

A more advanced archiving service implementation can load balance writes to different segments of the cloud to achieve optimal write throughput. The archiving service can detect when it cannot provide complete data storage, and scale cloud resources accordingly.
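A simple way to picture this load balancing (purely illustrative; a production service would track live load metrics and node health) is to direct each write to the least-loaded segment:

```python
def pick_segment(pending_writes: dict) -> str:
    """Choose the cloud segment with the fewest in-flight writes.

    `pending_writes` maps a segment name to its current write backlog;
    the segment names used by callers are hypothetical placeholders.
    """
    return min(pending_writes, key=pending_writes.get)
```

When the minimum backlog across all segments still grows over time, the service knows complete storage is no longer achievable and can trigger the scaling of cloud resources mentioned above.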

The ability to subscribe without disrupting the real-time system is a fundamental characteristic of the archiving service. OT systems are extremely time sensitive; any delay in the delivery of data can result in system failure. Though subscribing to data may seem trivial and non-intrusive, traditional corporate IT systems will often sacrifice latency to ensure receipt of all data.

This balancing act is a common challenge when integrating operational technologies with storage and other common IT systems. To ensure non-intrusive subscriptions, data distribution must enable passive observations without slowing producers or any other data transmission.
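One way to sketch such a non-intrusive subscription (an assumed mechanism for illustration, not a description of any particular middleware) is a subscriber whose bounded queue never blocks the producer; a slow archiver loses old samples rather than delaying the real-time path:

```python
from collections import deque

class PassiveSubscriber:
    """Observer that can never back-pressure the data producer.

    The bounded deque silently drops the oldest sample when the
    archiver falls behind, so publish() always returns immediately.
    """
    def __init__(self, depth: int = 4):
        self.queue = deque(maxlen=depth)

    def publish(self, sample) -> None:
        self.queue.append(sample)  # O(1), never blocks, may evict oldest

    def drain(self):
        """Archiver side: consume whatever samples survived."""
        while self.queue:
            yield self.queue.popleft()
```

The design choice is the inverse of the reliable IT pattern described above: the producer's latency is protected unconditionally, and completeness of the archive is what degrades under load.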
