Communication-Efficient Computation on Distributed Noisy Datasets

Published: 2015, Last Modified: 10 Jan 2025, SPAA 2015, CC BY-SA 4.0
Abstract: This paper makes a first attempt to answer the following general question: given a set of machines connected by a point-to-point communication network, each holding a noisy dataset, how can we perform communication-efficient statistical estimation on the union of these datasets? Here "noisy" means that a real-world entity may appear in different forms in different datasets, but those variants should be treated as the same universe element when performing statistical estimation. We give a first set of communication-efficient solutions for statistical estimation on distributed noisy datasets, including algorithms for distinct elements, $L_0$-sampling, heavy hitters, frequency moments and empirical entropy.
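To make the noisy-data model concrete, the following is a small centralized Python illustration; it is not the paper's communication-efficient protocol. It assumes, hypothetically, that every record is a real number, that all variants of one entity lie within distance `alpha` of each other, and that records of different entities are more than `alpha` apart, so variants can be grouped unambiguously. The function name `entity_frequencies` and the parameter `alpha` are illustrative choices, not names from the paper. Counting raw distinct values overcounts entities, while grouping variants first recovers the intended statistics.

```python
from typing import List, Sequence


def entity_frequencies(datasets: Sequence[Sequence[float]], alpha: float) -> List[int]:
    """Return the frequency of each entity in the union of noisy datasets.

    Hypothetical model (an assumption for illustration only): variants of the
    same entity are within distance alpha of each other, and records of
    different entities are more than alpha apart.
    """
    # Centralized baseline: pool every machine's records in one place.
    points = sorted(p for data in datasets for p in data)
    freqs: List[int] = []
    group_start = 0.0
    for p in points:
        # A record farther than alpha from the first record of the current
        # group cannot be a variant of that entity, so it starts a new one.
        if not freqs or p - group_start > alpha:
            freqs.append(1)
            group_start = p
        else:
            freqs[-1] += 1
    return freqs


if __name__ == "__main__":
    # Three machines; e.g. 1.0, 1.05 and 0.98 are variants of one entity.
    machines = [[1.0, 1.05, 5.0], [0.98, 5.02, 9.0], [9.01, 12.0]]
    freqs = entity_frequencies(machines, alpha=0.1)
    print("raw distinct values:", len({p for m in machines for p in m}))  # 8
    print("noise-aware distinct entities:", len(freqs))  # 4
    print("entity frequencies:", freqs)  # [3, 2, 2, 1]
```

From the frequency vector one can read off the other listed statistics, e.g. the second frequency moment here is 3² + 2² + 2² + 1² = 18. Because this baseline ships every record to a single site, its communication grows linearly with the data size; the algorithms summarized above aim to estimate such quantities with far less communication.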