Abstract: We study skyline queries in the distributed computational model, in which there are s remote sites and a central coordinator; each site holds a portion of the data, and the coordinator wants to compute the skyline of the union of the s datasets. Computation proceeds in rounds, and the goal is to minimize both the total communication cost and the round cost. We first give an algorithm with a small communication cost but a potentially large round cost, and we show information-theoretically that this communication cost is optimal even if an unbounded number of communication rounds is allowed. We next give algorithms with smooth communication-round tradeoffs. We also show a strong lower bound on the communication cost when only one round of communication is allowed. Finally, we demonstrate the superiority of our algorithms over existing ones through an extensive set of experiments on both synthetic and real-world datasets.
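To make the setting concrete, the following is a minimal sketch of the naive one-round baseline in this model: every site sends its local skyline to the coordinator, which then computes the skyline of the received points. It is not one of the algorithms from the paper. The function names (`dominates`, `skyline`, `one_round_baseline`), the convention that smaller values are better in every coordinate, and the choice to measure communication as the number of points sent are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q: p is no worse in every coordinate and
    strictly better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Quadratic-time skyline: keep the points that no other point dominates."""
    pts = list(set(points))  # drop exact duplicates
    return sorted(p for p in pts if not any(dominates(q, p) for q in pts))


def one_round_baseline(sites: Sequence[Sequence[Point]]) -> Tuple[List[Point], int]:
    """Each site sends its local skyline; the coordinator merges them.

    Returns the global skyline and the number of points communicated.
    A point dominated within its own site is also dominated in the union,
    so discarding it locally cannot change the final answer.
    """
    messages = [skyline(site) for site in sites]   # one message per site
    sent = sum(len(m) for m in messages)           # communication, in points
    merged = [p for m in messages for p in m]
    return skyline(merged), sent


# Hypothetical data: s = 3 sites holding 2-dimensional points.
sites = [
    [(1, 5), (2, 2), (4, 4)],
    [(3, 1), (5, 5), (1, 6)],
    [(2, 3), (6, 0)],
]
result, sent = one_round_baseline(sites)
print(result)  # [(1, 5), (2, 2), (3, 1), (6, 0)]
print(sent)    # 6
```

In this toy instance the sites send six points in total although the global skyline has only four, which illustrates why the communication cost of a one-round protocol is worth studying separately from that of multi-round protocols.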