Deterministic Data Reduction in Sensor Networks
Written with Hüseyin Akcan.
Abstract:
A wide range of mining and analysis problems involve extracting
knowledge from count data. Such data typically
arises from transactional data sets; here we consider the case where it
arises from a highly distributed source such
as a sensor network. A general approach that scales well with the data
is sampling, and we have proposed several
deterministic streaming algorithms for efficiently reducing such data.
Those algorithms perform with significantly
better accuracy than random sampling for problems such as frequency
estimation, correlation detection, association
rules and iceberg cube computations. In this paper, we consider the
distributed version of the problem, and specifically
the case when the data originates from a sensor network. We engineer a
fully-distributed version of our algorithm
which builds a deterministic sample along some tree-like aggregation
structure. We demonstrate that this distributed
sample has about the same quality as would have been computed by running
our (non-distributed) deterministic
algorithm on the underlying data. We compare to other
(non-deterministic) sampling algorithms while focusing on
issues such as sample quality, network longevity, and energy and
communication costs.
Related publications
Copyright © 2005, Hervé Brönnimann, hbr@poly.edu