Build wireless M2M and IoT sensor networks: Data dissemination - Embedded.com

Build wireless M2M and IoT sensor networks: Data dissemination

In the second in a four part series on implementation of wireless sensor networks the authors of “Ad hoc wireless networks,” discuss the various protocols required for reliable and fast data dissemination.

Data dissemination is the process by which queries or data are routed in the sensor network. The data collected by sensor nodes has to be communicated to the BS or to any other node interested in the data. The node that generates data is called a source and the information to be reported is called an event.A node which is interested in an event and seeks information about it is called a sink.

Traffic models have been developed for sensor networks such as the data collection and data dissemination (diffusion) models. In the data collection model, the source sends the data it collects to a collection entity such as the BS. This could be periodic or on demand. The data is processed in the central collection entity.

Data diffusion, on the other hand, consists of a two-step process of interest propagation and data propagation. An interest is a descriptor for a particular kind of data or event that a node is interested in, such as temperature, intrusion, or presence of bio-agents.

For every event that a sink is interested in, it broadcasts its interest to its neighbors and periodically refreshes its interest. The interest is propagated across the network, and every node maintains an interest cache of all events to be reported.

This is similar to a multicast tree formation, rooted at the sink. When an event is detected, it is reported to the interested nodes after referring to the interest cache. Intermediate nodes maintain a data cache and can aggregate the data or modify the rate of reporting data. The paths used for data propagation are modified by preferring the shortest paths and deselecting the weaker or longer paths. The basic idea of diffusion is made efficient and intelligent by different algorithms for interest and data routing.

Flooding
In flooding, each node which receives a packet broadcasts it if the maximum hop-count of the packet is not reached and the node itself is not the destination of the packet. This technique does not require complex topology maintenance or route discovery algorithms. But flooding has the following disadvantages [5]:

Implosion: This is the situation when duplicate messages are sent to the same node. This occurs when a node receives copies of the same message from many of its neighbors.
Overlap: The same event may be sensed by more than one node due to overlapping regions of coverage. This results in their neighbors receiving duplicate reports of the same event.
Resource blindness: The flooding protocol does not consider the available energy at the nodes and results in many redundant transmissions. Hence, it reduces the network lifetime.

Gossiping
Gossiping is a modified version of flooding, where the nodes do not broadcast a packet, but send it to a randomly selected neighbor. This avoids the problem of implosion, but it takes a long time for a message to propagate throughout the network. Though gossiping has considerably lower overhead than flooding, it does not guarantee that all nodes of the network will receive the message. It relies on the random neighbor selection to eventually propagate the message throughout the network.

Rumor Routing
Rumor routing is an agent-based path creation algorithm [6]. Agents, or “ants,” are long-lived entities created at random by nodes. These are basically packets which are circulated in the network to establish shortest paths to events that they encounter.

They can also perform path optimizations at nodes that they visit. When an agent finds a node whose path to an event is longer than its own, it updates the node’s routing table. Figure 12.4 below illustrates the working of the rumor routing algorithm.


Click on image to enlarge.

Figure 12.4. Rumor routing.

In Figure 12.4 (a), the agent has initially recorded a path of distance 2 to event E1. Node A’s table shows that it is at a distance 3 from event E1 and a distance 2 from E2. When the agent visits node A,it updates its own path state information to include the path to event E2. The updating is with one hop greater distance than what it found in A,to account for the hop between any neighbor of A that the agent will visit next, and A.It also optimizes the path to E1recorded at node A to the shorter path through node B.The updated status of the agent and node table is shown in Figure 12.4 (b).

When a query is generated at a sink, it is sent on a random walk with the hope that it will find a path (preestablished by an agent) leading to the required event. This is based on the high probability of two straight lines intersecting on a planar graph, assuming the network topology is like a planar graph, and the paths established can be approximated by straight lines owing to high density of the nodes.

If a query does not find an event path, the sink times out and uses flooding as a last resort to propagate the query. For instance, as in Figure 12.4 (c), suppose a query for event E1 is generated by node P. Through a random walk, it reaches A, where it finds the previously established path to E1. Hence, the query is directed to E1 through node B, as indicated by A’s table.

Sequential assignment routing
One solution to this problem is the use of a set of algorithms which performs organization and mobility management in sensor networks is proposed. This sequential assignment routing (SAR) algorithm creates multiple trees, where the root of each tree is one-hop neighbor of the sink.

Each tree grows outward from the sink and avoids nodes with low throughput or high delay. At the end of the procedure, most nodes belong to multiple trees. An instance of tree formation is illustrated in Figure 12.5 below .

Figure 12.5. Sequential assignment routing.

The trees rooted at A and B, and two of the one-hop neighbors of the sink, are shown. Node C belongs to both trees, and has path lengths of 3 and 5, respectively, to the sink, using the two trees. Each sensor node records two parameters about each path through it: the available energy resources on the path and an additive QoS metric such as delay. This allows a node to choose one path from among many to relay its message to the sink.

The SAR algorithm chooses path with high estimated energy resources, and provisions can be made to accommodate packets of different priorities. A weighted QoS metric is used to handle prioritized packets, which is computed as a product of priority level and delay. The routing ensures that the same weighted QoS metric is maintained.

Thus, higher priority packets take lower delay paths, and lower priority packets have to use the paths of greater delay. For example, if node generates packet of priority 3, it follows the longer path along tree (B ),and packet of priority 5 (higher priority) will follow the shorter path along tree A ,so thatthe priority × delay QoS metric is maintained. SAR minimizes the average weighted QoS metric over the lifetime of the network. The sink periodically triggers metric update to reflect the changes in available energy resource after some transmissions.

Directed Diffusion
The directed diffusion protocol is useful in scenarios where the sensor nodes themselves generate requests/queries for data sensed by other nodes, instead of all queries arising only from a BS. Hence, the sink for the query could be a BS or a sensor node. The directed diffusion routing protocol [8] improves on data diffusion using interest gradients.

Each sensor node names its data with one or more attributes, and other nodes express their interest depending on these attributes. Attribute-value pairs can be usedto describe an interest in intrusion data as follows, where an interest is nothing but a set of descriptors for the data in which the querying node is interested.

type = vehicle               /* detect vehicle location */
interval = 1 s               /* report every 1 second */
rect = [0, 0, 600, 800]      /* query addressed to sensors within this
                             rectangle*/
Timestamp = 02:30:00         /*when the interest originated*/
expiresAt= 03:00:00         */till when the sink retains this data */             

The sink has to periodically refresh its interest if it still requires the data to be reported to it. Data is propagated along the reverse path of the interest propagation. Each path is associated with a gradient that is formed at the time of interest propagation.

While positive gradients encourage the data flow along the path, negative gradients inhibit the distribution of data along a particular path. The strength of the interest is different toward different neighbors, resulting in source-to-sink paths with different gradients.

The gradient corresponding to an interest is derived from the interval/data-rate field specified in the interest. For example, if there are two paths formed with gradients 0.8 and 0.4, the source may send twice as much data along the higher gradient path compared to the lower gradient one. For the interest mentioned earlier, a sensor may send data of the following kind:

type = vehicle        /* type of intrusion seen */
instance = car        /* particular instance of the type */
location = [200,250]  /* location of node */
confidence = 0.80     /* confidence of match */
timestamp = 02:45:20  /* time of detection */

The diffusion model allows nodes to cache or locally transform (aggregate) data. This increases the scalability of communication and reduces the number of message transmissions required.

The concept of reinforcement is used to update a node’s interest along a particular path. For example, suppose the sink wants more frequent updates from the sensors which have detected an event.

It reinforces the path by sending an interest with a higher data-rate requirement, in effect increasing the gradient of that path. On the other hand, if the sink needs only fewer updates, it applies negative reinforcement by sending an interest of lower required data-rate.

The directed diffusion model uses data naming by attributes and local data transformations to reflect the data-centric nature of sensor network operations. The local operations of data aggregation are application-specific. Gradients model the network-wide results of local interactions by regulating the flow of data along different paths, depending on the expressed interest.

Sensor Protocols for Information via Negotiation
A family of protocols called sensor protocols for information via negotiation (SPIN) uses negotiation and resource adaptation to address the deficiencies of flooding. Negotiation reduces overlap and implosion, and a threshold-based resource-aware operation is used to prolong network lifetime. Meta-data, or data describing data, is transmitted instead of raw data.

This requires fewer bytes and can be in an application-specific format. SPIN has three types of messages: ADV, REQ, and DATA. A sensor node broadcasts an ADV containing meta-data describing the actual data. If a neighbor is interested in the data, it sends a REQ for the data.

Then the sensor node sends the actual DATA to the neighbor. The neighbor again sends ADVs to its neighbors and this process continues to disseminate the data throughout the network. This simple version of SPIN is shown in Figure 12.6 below .

Figure 12.6. SPIN protocol.

SPIN is based on data-centric routing, where the nodes advertise the available data through an ADV and wait for requests from interested nodes. SPIN-2 expands on SPIN, using an energy or resource threshold to reduce participation. A node may participate in the ADV-REQ-DATA handshake only if it has suffcient resources above a threshold.

Cost-Field Approach .
The cost-field approach [9] considers the problem of setting up paths to a sink. It is atwo-phase process, the first phase being to set up the cost field, based on metrics such as delay, at all sensor nodes, and the second being data dissemination using the costs. At each node, the cost is defined as the minimum cost from that node to the sink, which occurs along the optimal path. Explicit path information does not need to be maintained.

Phase 1 sets up a cost field starting from the sink node. A sink broadcasts an ADV packet with its own cost as 0. When a node N hears an ADV message from node M ,it setsits ownpath cost to min(LN ,LM + CNM), where LN is the total path cost from node N to sink, LM represents the cost of node M to sink, and CNM is the cost from node N to M .

If LN was updated, the new cost is broadcast through another ADV. This is a flooding-based implementation of the Dijkstra’s algorithm. In order to reduce the high communication costs associated with flooding, a back-off-based approach is used.

The main reason for overhead is that a node broadcasts its updated cost immediately, whether it was the optimal cost or not. Instead, the back-off modification makes a node defer its ADV instead of immediately broadcasting it.. The working of the cost-field approach with back-off is illustrated in Figure 12.7 below.


Clickon image to enlarge.

Figure 12.7. Cost-field approach. .

The numbers on the links indicate link costs. In Figure 12.7 (a), node M broadcasts an ADV, which is received by nodes N and P .They tentatively fix their costs to LM +2 and LM +5, respectively, and set their back-off timers to 20 and 50, respectively. Figure 12.7 (b) shows the costs after 20 time units, when node N ’s back-off timer expires. Node N finalizes its cost to LM +2 and broadcasts an ADV, which is heard by node P .Since LN +1

Phase 2 is the data dissemination process. Once the cost field is established, a source sends its message to sink S with cost CS. The message also contains a cost-so-far field, initially set to 0. Each intermediate node forwards the packet if the cost recorded in the packet plus its own cost equals the original source-to-sink cost. This ensures that the original optimal path is used whenever a packet is routed. While forwarding, the intermediate nodes also update the cost-so-far field.

Geographic Hash Table .
Geographic hash table (GHT) is a system based on data-centric storage [10], inspired by Internet-scale distributed hash table (DHT) systems such as Chord [11] and Tapestry [12]. GHT hashes keys into geographic coordinates and stores a (key, value) pair at the sensor node nearest to the hash value.

The calculated hash value is mapped onto a unique node consistently, so that queries for the data can be routed to the correct node. Stored data is replicated to ensure redundancy in case of node failures, and a consistency protocol is used to maintain the replicated data.

The data is distributed among nodes such that it is scalable and the storage load is balanced. The routing protocol used is greedy perimeter stateless routing (GPSR) [13], which again uses geographical information to route the data and queries.

GHT is more effective in large sensor networks, where a large number of events are detected but not all are queried. In this case, the data observed is stored in a distributed manner across all nodes, instead of being routed to a central external storage. Queries are routed to the nearest node which contains a copy of the relevant data. This makes the storage and traffic distribution uniform.

Small Minimum Energy Communication Network .
Small minimum energy communication network (SMECN) is a protocol proposed in [14] to construct a sub-network from a given communication network. If the entire sensor network is represented by a graph G, the subgraph G’ is constructed such that the energy usage of the network is minimized.

The number of edges in G’ is less than that of G, but all nodes of G are retained in G.The connectivity between any two nodes is not disrupted by the subgraph. G’ is constructed such that the energy required to transmit data from a node to all its neighbors is lower in G’ than in G.

SMECN also follows the minimum energy (ME) property in its subgraph construction, that is, there exists an ME path in subgraph G’ between any two nodes that are connected in G.The power required to transmit data between two nodes u and v is modeled as

where t is a constant, n is the path loss exponent indicating the loss of power with distance from the transmitter, and d (u,v)is the distance between u and v.Let the power needed to receive the data be c. Since the transmission power increases exponentially with distance, it would be more economical to transmit data by smaller hops. Suppose the path between u (i.e., u0)and v (i.e., uk) is represented by r = (u0,u1,…uk), such that each (ui,ui+1) is an edge in the subgraph G, then the total power consumed for the transmission is

The path r is the ME path if C(r) less than or equal to C(r)for all paths r’ between u and v in the graph G. The subgraph G’ is said to have the ME property if there exists a path r in G’ which is an ME path in G, for all node pairs (u,v). SMECN uses only the ME paths from G’ for data transmission, so that the overall energy consumed is minimized.

To read Part 1 , go to: Issues and challenges
Next in Part 3: Sensor data gathering and location detection
Next in Part 4: High quality and low power.

C. Siva Ram Murthy received a B.Tech degree in electronics and communications engineering from the National Institute of Technology, an M.Tech degree in computer engineering from the Indian Institute of Technology, and a Ph.D in computer science from the Indian Institute of Science. He has to his credit over 100 research papers in international journals and conference publications as well as author of several text books on parallel computing, real time systems and networks, and optical networks. His research interest include parallel and distributed computing, real time systems lightwave networks, and, of course, wireless sensor networks.

B.S. Manoj is the recipient of an M.Tech degree in Electronics and Communications engineering from the Institution of Engineers in India. His research interest include ad hoc wireless networks, next generation wireless architectures and wireless sensor networks.

This series of articles is from the book “Ad Hoc Wireless Networks,” by C. Siva Ram Murthy and B.S. Manoj, Copyright 2011, used by permission of PearsonEducation, Inc.. Written permission from Pearson Education, Inc. is required for all other uses.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.