Distributed expectation-based spatio-temporal cluster detection for pocket switched networks

Introduction - Pocket Switched Networks

  • Pocket Switched Networks (PSNs) are created using short range, ad hoc wireless links between personal mobile devices.

Introduction - Pocket Switched Networks

  • Pocket Switched Networks (PSNs) are created using short range, ad hoc wireless links between personal mobile devices.
  • There is no other network infrastructure (No Internet!).

Introduction - Pocket Switched Networks

  • Pocket Switched Networks (PSNs) are created using short range, ad hoc wireless links between personal mobile devices.
  • There is no other network infrastructure (No Internet!).
  • Links rely on human mobility, and are unreliable. End-to-end paths change constantly, if they exist at all.

Introduction - Pocket Switched Networks

  • Pocket Switched Networks (PSNs) are created using short range, ad hoc wireless links between personal mobile devices.
  • There is no other network infrastructure (No Internet!).
  • Links rely on human mobility, and are unreliable. End-to-end paths change constantly, if they exist at all.
  • An example of Disruption-Tolerant Networking (DTN).

What's the point?

  • Over 1 billion smartphones worldwide - October 2012.

What's the point?

  • Over 1 billion smartphones worldwide - October 2012.
  • Bluetooth/Wi-Fi LP should make opportunistic message delivery between neighbours possible.
    • Devices can use the "store-wait-forward" paradigm in DTNs.

What's the point?

  • Over 1 billion smartphones worldwide - October 2012.
  • Bluetooth/Wi-Fi LP should make opportunistic message delivery between neighbours possible.
    • Devices can use the "store-wait-forward" paradigm in DTNs.
  • Possible new applications for smartphones.
    1. "Distributed social networking (DSN) software allows you to store and organize your own content and - optionally - share it with whoever you like." [Werdmuller].
    2. Free (but slow) status updates over multiple hops.

PSN use case 1 - DSN


PSN use case 1 - DSN


PSN use case 1 - DSN


PSN use case 2 - Targeted updates



Clusters

  • Clusters can be used for opportunistic data delivery when the probability of a device being able to deliver a packet is unknown or difficult to calculate.

Clusters

  • Clusters can be used for opportunistic data delivery when the probability of a device being able to deliver a packet is unknown or difficult to calculate.
  • Based on the assumption that "people belong to their cluster groups and occasionally visit other groups" [Hui].

Clusters

  • Clusters can be used for opportunistic data delivery when the probability of a device being able to deliver a packet is unknown or difficult to calculate.
  • Based on the assumption that "people belong to their cluster groups and occasionally visit other groups" [Hui].

Clusters

  • Clusters can be used for opportunistic data delivery when the probability of a device being able to deliver a packet is unknown or difficult to calculate.
  • Based on the assumption that "people belong to their cluster groups and occasionally visit other groups" [Hui].

Clusters

  • Clusters can be used for opportunistic data delivery when the probability of a device being able to deliver a packet is unknown or difficult to calculate.
  • Based on the assumption that "people belong to their cluster groups and occasionally visit other groups" [Hui].

How are clusters formed?

  • If social ties are known in advance, or can be gathered from online social networks, then we can label clusters [Hui, Bigwood].

How are clusters formed?

  • If social ties are known in advance, or can be gathered from online social networks, then we can label clusters [Hui, Bigwood].
  • Clusters can be detected automatically using distributed cluster detection methods:
    • Epidemic label propagation [Leung, Herbiet].
    • Simple, Modularity [Hui].
    • AD-Simple [Borgia].
    • Quality [Orlinski].

How are clusters formed?

  • If social ties are known in advance, or can be gathered from online social networks, then we can label clusters [Hui, Bigwood].
  • Clusters can be detected automatically using distributed cluster detection methods:
    • Epidemic label propagation [Leung, Herbiet].
    • Simple, Modularity [Hui].
    • AD-Simple [Borgia].
    • Quality [Orlinski].
  • We can experiment with these methods using Reality Mining data-sets which contain encounters between humans.

But there's a problem ...

In all of these methods, mean cluster size increases over time. In most cases cluster size increases monotonically.

Example cluster sizes over time using the Simple [Hui] cluster detection method.
Lambda governs cluster growth.

Cluster size

  • Cluster based message delivery methods copy messages to devices in the same cluster as the destination.

Cluster size

  • Cluster based message delivery methods copy messages to devices in the same cluster as the destination.
  • Thus larger clusters can lead to more message duplication.

Why do clusters grow?

Graphs with edges formed from encounters in the reality mining data-sets densify with a Densification Power Law [Leskovec] of around 1.25.

Number of edges growing with number of connected devices in the Cambridge data-set.

Some more data

  • Activity within the data-sets is "bursty".
  • Each new burst contains encounters between devices which have not met before, and may never meet again.

Some more data

  • Activity within the data-sets is "bursty".
  • Each new burst contains encounters between devices which have not met before, and may never meet again.
Monotonically increasing cluster sizes means that obsolete cluster membership persists.
Could clusters better reflect current network conditions instead of increasing in size monotonically over time?

Distributed expectation-based spatio-temporal clustering (DEBT)

Is split into two parts:

  1. Each device attempts to identify recent and current encounters with other devices which have lasted longer than expected.

Distributed expectation-based spatio-temporal clustering (DEBT)

Is split into two parts:

  1. Each device attempts to identify recent and current encounters with other devices which have lasted longer than expected.
  2. Spatio-temporal clusters are then created.

Calulating baselines

A rough estimation of the expected cumulative duration of encounters with a device during a discrete time frame.



  • Returns an estimated mean relative to some variation in a time series.

Calulating baselines

A rough estimation of the expected cumulative duration of encounters with a device during a discrete time frame.



  • Returns an estimated mean relative to some variation in a time series.
  • We don't have to store all the data of previous encounters, only mean values for the past w frames.

Calulating baselines

A rough estimation of the expected cumulative duration of encounters with a device during a discrete time frame.



  • Returns an estimated mean relative to some variation in a time series.
  • We don't have to store all the data of previous encounters, only mean values for the past w frames.
  • Time frames help us to measure connectivity despite the parking lot problem.

Building clusters

Test for including dj in di's local cluster



  • x is the cumulative encounter time between 2 devices during a time frame.

Building clusters

Test for including dj in di's local cluster



  • x is the cumulative encounter time between 2 devices during a time frame.
  • Cup was added to control the size of clusers if need be.

Building clusters

Test for including dj in di's local cluster



  • x is the cumulative encounter time between 2 devices during a time frame.
  • Cup was added to control the size of clusers if need be.
  • Removing devices from clusters is done at the end of time frames using the same test but in reverse.

Local cluster tables

  • Devices are added and removed from local cluster tables using the previous tests.

Local cluster tables

  • Devices are added and removed from local cluster tables using the previous tests.
  • Local cluster tables are exchanged when devices meet and when the cluster inclusion test has been passed.

Local cluster tables

  • Devices are added and removed from local cluster tables using the previous tests.
  • Local cluster tables are exchanged when devices meet and when the cluster inclusion test has been passed.
  • Local cluster tables of neighbours are stored in the branch column.
Local cluster table example for di.
NeighboursBranch
dj{di, dk}
dk{dj, dl}

Resulting clusters

Examble using the Haggle Cambridge data-set.

Spatial
vs.
Spatio-temporal

Data delivery

  • Three policies for transfering a messages upon encountering a device were tried.
  • To explain them we'll use the example of di trying to send a message to dl.

DEBT Epidemic (DEBTE)

  • If destination (e.g. dl) is anywhere in the local cluster table of encountered device (e.g. dj), transfer a copy of the message.

DEBT Epidemic (DEBTE)

  • If destination (e.g. dl) is anywhere in the local cluster table of encountered device (e.g. dj), transfer a copy of the message.
Local cluster table example for dj.
NeighboursBranch
di{dj, dk}
dk{di, dj, dl}

DEBT Epidemic (DEBTE)

  • If destination (e.g. dl) is anywhere in the local cluster table of encountered device (e.g. dj), transfer a copy of the message.
Local cluster table example for dj.
NeighboursBranch
di{dj, dk}
dk{di, dj, dl}

DEBT Cluster (DEBTC)

  • If destination (e.g. dl) is in a row of the local cluster of the encountered device (e.g. dj) which also contains the transfering device (e.g. di), do not transfer the message.

DEBT Cluster (DEBTC)

  • If destination (e.g. dl) is in a row of the local cluster of the encountered device (e.g. dj) which also contains the transfering device (e.g. di), do not transfer the message.
Local cluster table example for dj.
NeighboursBranch
di{dj, dk}
dk{di, dj, dl}

DEBT Cluster (DEBTC)

  • If destination (e.g. dl) is in a row of the local cluster of the encountered device (e.g. dj) which also contains the transfering device (e.g. di), do not transfer the message.
Local cluster table example for dj.
NeighboursBranch
di{dj, dk}
dk{di, dj, dl}

DEBT Tree (DEBTT)

  • If a path to destination exists which does not appear to contain a routing loop, transfer a copy of the message.

DEBT Tree (DEBTT)

  • If a path to destination exists which does not appear to contain a routing loop, transfer a copy of the message.
Local cluster table example for dj.
NeighboursBranch
didi→dk→dl
di→dk→dj
di→dj
dkdk→dl
dk→di
dk→dj

DEBT Tree (DEBTT)

  • If a path to destination exists which does not appear to contain a routing loop, transfer a copy of the message.
Local cluster table example for dj.
NeighboursBranch
didi→dk→dl
di→dk→dj
di→dj
dkdk→dl
dk→di
dk→dj
Data dissemination results.

Results

Data delivery

Efficiency - Cambridge

Efficiency is calculated daily by dividing the number of delivered packets by the number of relayed packets.

Cambridge data-set

Efficiency - Reality

Efficiency is calculated daily by dividing the number of delivered packets by the number of relayed packets.

Reality data-set

Combining all end results


MethodData DeliveryOverheads
BUBBLE0.114120.7
Quality0.171756.38
DEBTE0.135628.92
DEBTC0.10612.47
DEBTT0.121319.15
  • Overheads = (Number of messages relayed - number of messages delivered) / number of messages delivered.

Combining all end results


MethodData DeliveryOverheads
BUBBLE0.114120.7
Quality0.171756.38
DEBTE0.135628.92
DEBTC0.10612.47
DEBTT0.121319.15
  • Overheads = (Number of messages relayed - number of messages delivered) / number of messages delivered.
  • 6% more packets delivered with DEBTT than Bubble, with 7% drop in message overheads.

Conclusions

  1. DEBTT looks the most promising.
    • Given a little in processing overheads associated with tree traversal, but gained efficiency.

Conclusions

  1. DEBTT looks the most promising.
    • Given a little in processing overheads associated with tree traversal, but gained efficiency.
  2. DEBTT is not more complex than Bubble which uses centrality calculations as well as clustering.

Conclusions

  1. DEBTT looks the most promising.
    • Given a little in processing overheads associated with tree traversal, but gained efficiency.
  2. DEBTT is not more complex than Bubble which uses centrality calculations as well as clustering.
  3. Automatic frame size detection would be great.
    • We need frames or some other mechanism to solve the parking lot problem.
    • Human movement is mostly diurnal so it doesn't make sense to have frames longer than 24 hours.
    • If frames size is less than 1 hour, every encounter becomes significant.

Thank you