Learning-Augmented $k$-means Clustering

Jon C. Ergun; Zhili Feng; Sandeep Silwal; David Woodruff; Samson Zhou

Published: 28 Jan 2022, Last Modified: 04 May 2025. Venue: ICLR 2022 Spotlight. Readers: Everyone

Keywords: clustering, learning-augmented algorithms

Abstract: $k$-means clustering is a well-studied problem due to its wide applicability. Unfortunately, there exist strong theoretical limits on the performance of any algorithm for the $k$-means problem on worst-case inputs. To overcome this barrier, we consider a scenario where "advice" is provided to help perform clustering. Specifically, we consider the $k$-means problem augmented with a predictor that, given any point, returns its cluster label in an approximately optimal clustering up to some, possibly adversarial, error. We present an algorithm whose performance improves along with the accuracy of the predictor, even though naïvely following the accurate predictor can still lead to a high clustering cost. Thus if the predictor is sufficiently accurate, we can retrieve a close to optimal clustering with nearly optimal runtime, breaking known computational barriers for algorithms that do not have access to such advice. We evaluate our algorithms on real datasets and show significant improvements in the quality of clustering.
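The remark that naïvely following a mostly accurate predictor can still give a high clustering cost can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy data, the single mislabeled point standing in for predictor error, and the helper names `kmeans_cost` and `centers_from_labels`. The coordinate-wise median is used only as a simple stand-in for a robust center estimate; it is not claimed to be the algorithm of the paper.

```python
from statistics import mean, median


def kmeans_cost(points, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centers)
        for pt in points
    )


def centers_from_labels(points, labels, k, estimator):
    """Apply `estimator` coordinate-wise within each predicted cluster."""
    centers = []
    for j in range(k):
        members = [pt for pt, lab in zip(points, labels) if lab == j]
        centers.append(tuple(estimator(coord) for coord in zip(*members)))
    return centers


# Toy data (assumed): two tight, well-separated clusters of five points each.
points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5),
    (100.0, 100.0), (101.0, 100.0), (100.0, 101.0), (101.0, 101.0), (100.5, 100.5),
]
true_labels = [0] * 5 + [1] * 5

# Hypothetical predictor output: one point of cluster 1 is mislabeled as cluster 0.
predicted = list(true_labels)
predicted[5] = 0

# Naive use of the advice: average each predicted cluster.
naive_centers = centers_from_labels(points, predicted, 2, mean)
# Stand-in robust use of the advice: coordinate-wise median per predicted cluster.
robust_centers = centers_from_labels(points, predicted, 2, median)
# Reference: centers computed from the error-free labels.
optimal_centers = centers_from_labels(points, true_labels, 2, mean)

naive_cost = kmeans_cost(points, naive_centers)
robust_cost = kmeans_cost(points, robust_centers)
optimal_cost = kmeans_cost(points, optimal_centers)

print(f"naive:   {naive_cost:.2f}")
print(f"robust:  {robust_cost:.2f}")
print(f"optimal: {optimal_cost:.2f}")
```

In this toy instance a single far-away mislabeled point drags the averaged center of predicted cluster 0 away from its true members, so the naive cost is orders of magnitude above the reference cost, while the median-based centers stay close to it.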

One-sentence Summary: We study the $k$-means problem augmented with a learning-based predictor that gives noisy information about true labels.

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/learning-augmented-k-means-clustering/code)
