Bregman Clustering for Separable Instances

Published: 01 Jan 2010, Last Modified: 24 Jul 2025 · SWAT 2010 · CC BY-SA 4.0
Abstract: The Bregman k-median problem is defined as follows. Given a Bregman divergence \(D_\varphi\) and a finite set \(P \subseteq {\mathbb R}^d\) of size n, our goal is to find a set C of size k such that the sum of errors \(\mathrm{cost}(P,C) = \sum_{p \in P} \min_{c \in C} D_\varphi(p,c)\) is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text classification, and speech processing. We study a generalization of the k-means++ seeding of Arthur and Vassilvitskii (SODA '07). We prove for an almost arbitrary Bregman divergence that if the input set consists of k well-separated clusters, then with probability \(2^{-{\mathcal O}(k)}\) this seeding step alone finds an \({\mathcal O}(1)\)-approximate solution. Thereby, we generalize an earlier result of Ostrovsky et al. (FOCS '06) from the case of the Euclidean k-means problem to the Bregman k-median problem. Additionally, this result leads to a constant-factor approximation algorithm for the Bregman k-median problem using at most \(2^{{\mathcal O}(k)} n\) arithmetic operations, including evaluations of the Bregman divergence \(D_\varphi\).
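To make the seeding step described above concrete, the following is a minimal sketch, not the paper's exact algorithm: it implements k-means++-style sampling in which each new seed is drawn with probability proportional to its current divergence to the nearest chosen seed, with the Euclidean distance replaced by an arbitrary Bregman divergence. The squared Euclidean distance is used here only as a concrete example of a Bregman divergence; the function names (e.g., `bregman_plusplus_seeding`) are hypothetical and not taken from the paper.

```python
import numpy as np

def squared_euclidean(p, c):
    # Squared Euclidean distance: the Bregman divergence of phi(x) = ||x||^2.
    return float(np.sum((p - c) ** 2))

def bregman_plusplus_seeding(P, k, D=squared_euclidean, rng=None):
    """Pick k seeds from P: the first seed is chosen uniformly at random,
    each further seed with probability proportional to its divergence D
    to the nearest seed chosen so far (k-means++ seeding with the
    Euclidean distance replaced by a Bregman divergence)."""
    rng = np.random.default_rng(rng)
    n = len(P)
    seeds = [P[rng.integers(n)]]
    # dist[i] = divergence of point i to its closest seed chosen so far
    dist = np.array([D(p, seeds[0]) for p in P])
    for _ in range(k - 1):
        idx = rng.choice(n, p=dist / dist.sum())
        seeds.append(P[idx])
        dist = np.minimum(dist, np.array([D(p, P[idx]) for p in P]))
    return np.array(seeds)

def cost(P, C, D=squared_euclidean):
    # cost(P, C) = sum over p in P of min over c in C of D(p, c)
    return sum(min(D(p, c) for c in C) for p in P)

# Example: three well-separated blobs in the plane, matching the
# "well separated clusters" setting the abstract refers to.
rng = np.random.default_rng(0)
P = np.vstack([rng.normal(loc, 0.1, size=(50, 2))
               for loc in ((0, 0), (5, 5), (10, 0))])
C = bregman_plusplus_seeding(P, k=3, rng=0)
print("seeding cost:", cost(P, C))
```

On well-separated inputs such as the example above, the seeding alone typically places one seed per cluster, which is the behaviour the abstract quantifies (a constant-factor approximation with probability \(2^{-{\mathcal O}(k)}\)).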