Deterministic Data Reduction in Sensor Networks

Written with Hüseyin Akcan.
Abstract: A wide range of mining and analysis problems involve extracting knowledge from count data. Such data typically arises from transactional data sets; here we consider the case where it arises from a highly distributed source such as a sensor network. A general approach that scales well with the data is sampling, and we have proposed several deterministic streaming algorithms for efficiently reducing such data. Those algorithms perform with significantly better accuracy than random sampling for problems such as frequency estimation, correlation detection, association rules and iceberg cube computations. In this paper, we consider the distributed version of the problem, and specifically the case when the data originates from a sensor network. We engineer a fully-distributed version of our algorithm which builds a deterministic sample along some tree-like aggregation structure. We demonstrate that this distributed sample has about the same quality as would have been computed by running our (non-distributed) deterministic algorithm on the underlying data. We compare to other (non-deterministic) sampling algorithms while focusing on issues such as sample quality, network longevity, and energy and communication costs.
Energyy usage in sensor network

Related publications



Copyright © 2005, Hervé Brönnimann, hbr@poly.edu