Average Sensitivity of Euclidean k-ClusteringDownload PDF

Published: 31 Oct 2022, Last Modified: 16 Jan 2023NeurIPS 2022 AcceptReaders: Everyone
Keywords: k-means, clustering, average sensitivity, consistent algorithms, dynamic algorithms
Abstract: Given a set of $n$ points in $\mathbb{R}^d$, the goal of Euclidean $(k,\ell)$-clustering is to find $k$ centers that minimize the sum of the $\ell$-th powers of the Euclidean distance of each point to the closest center. In practical situations, the clustering result must be stable against points missing in the input data so that we can make trustworthy and consistent decisions. To address this issue, we consider the average sensitivity of Euclidean $(k,\ell)$-clustering, which measures the stability of the output in total variation distance against deleting a random point from the input data. We first show that a popular algorithm \textsc{$k$-means++} and its variant called \textsc{$D^\ell$-sampling} have low average sensitivity. Next, we show that any approximation algorithm for Euclidean $(k,\ell)$-clustering can be transformed to an algorithm with low average sensitivity while almost preserving the approximation guarantee. As byproducts of our results, we provide several algorithms for consistent $(k,\ell)$-clustering and dynamic $(k,\ell)$-clustering in the random-order model, where the input points are randomly permuted and given in an online manner. The goal of the consistent setting is to maintain a good solution while minimizing the number of changes to the solution during the process, and that of the dynamic setting is to maintain a good solution while minimizing the (amortized) update time.
TL;DR: Analyzed average sensitivity of k-means and its variants, and derived consistent and dynamic algorithms building on it.
Supplementary Material: pdf
9 Replies