Probabilistic k-Median Clustering in Data Streams

Published: 2015, Last Modified: 15 May 2024. Theory Comput. Syst. 2015. License: CC BY-SA 4.0
Abstract: Our work introduces and constructs probabilistic coresets. A probabilistic coreset may contain probabilistic points, and the number of these points should be polylogarithmic in the input size. However, the overall storage size also depends on the representation size of each point's probability distribution. Our first observation is therefore that the size of a probabilistic coreset should be bounded both in the number of points and in the representation size of those points. We propose the first $(k, \varepsilon)$-coreset constructions for the probabilistic k-median problem in the metric and the Euclidean case. The coresets have size $\mathrm{poly}(\varepsilon^{-1}, k, \log(W/(p_{\min} \cdot \delta)))$, where $W$ is the expected total weight of the weighted probabilistic input points when all weights are scaled to be at least one, $p_{\min}$ is the smallest probability with which a point is realized at a given location, and $\delta$ is the error probability of the construction. Our coreset for the Euclidean problem can be maintained in data streams.
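To make the objective and the coreset guarantee concrete, here is a minimal Python sketch. It assumes a discrete model in which each weighted probabilistic point is a list of (location, probability) pairs whose probabilities sum to at most one (the remainder being the chance that the point is not realized), and in which every realized location is charged its Euclidean distance to the nearest center; the paper's exact cost definition may differ. The names `expected_cost` and `is_eps_approx` are hypothetical. A $(k, \varepsilon)$-coreset must keep this cost within a factor $1 \pm \varepsilon$ for every set of k centers, whereas the check below tests only one fixed center set.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
# Assumed representation of a probabilistic point: (weight, [(location, probability), ...]).
# Probabilities may sum to less than 1; the missing mass means "not realized".
ProbPoint = Tuple[float, List[Tuple[Point, float]]]


def expected_cost(points: Sequence[ProbPoint], centers: Sequence[Point]) -> float:
    """Expected weighted k-median cost under the assumed model: each
    realized location pays its Euclidean distance to the nearest center."""
    total = 0.0
    for weight, distribution in points:
        for location, prob in distribution:
            nearest = min(math.dist(location, c) for c in centers)
            total += weight * prob * nearest
    return total


def is_eps_approx(
    coreset: Sequence[ProbPoint],
    points: Sequence[ProbPoint],
    centers: Sequence[Point],
    eps: float,
) -> bool:
    """Check the (k, eps)-coreset inequality for ONE fixed center set.
    A genuine coreset must satisfy it for every choice of k centers."""
    full = expected_cost(points, centers)
    return abs(expected_cost(coreset, centers) - full) <= eps * full


points = [
    (1.0, [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]),  # always realized, at one of two spots
    (2.0, [((10.0, 0.0), 0.25)]),                    # realized with probability 1/4
]
print(expected_cost(points, [(0.0, 0.0)]))              # 6.0
print(expected_cost(points, [(0.0, 0.0), (10.0, 0.0)]))  # 1.0
```

Because the expected cost in this model is linear in the probabilities and weights, the input itself is trivially a coreset; the point of a construction is to get the same guarantee with far fewer, and more compactly represented, probabilistic points.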