Competitively Consistent Clustering

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We give the first algorithms for clustering dynamically changing datasets that are competitive both in terms of clustering quality and in terms of consistency (i.e., how much the solution changes over time).
Abstract: In *fully-dynamic consistent clustering*, we are given a finite metric space $(M,d)$ and a set $F\subseteq M$ of possible locations for opening centers. Data points arrive and depart, and the goal is to maintain an approximately optimal clustering solution at all times while minimizing the *recourse*, the total number of additions/deletions of centers over time. Specifically, we study fully dynamic versions of the classical $k$-center, facility location, and $k$-median problems. We design algorithms that, given a parameter $\beta\geq 1$, maintain an $O(\beta)$-approximate solution at all times, and whose total recourse is bounded by $O(\log |F| \log \Delta) \cdot OPT_{rec}^{\beta}$. Here $OPT_{rec}^{\beta}$ is the minimal recourse of an offline algorithm that maintains a $\beta$-approximate solution at all times, and $\Delta$ is the aspect ratio of the metric. We obtain our results via a reduction to the recently proposed *Positive Body Chasing* framework of [Bhattacharya, Buchbinder, Levin, Saranurak, FOCS 2023], which we show yields fractional solutions to our clustering problems online. Our contribution is to round these fractional solutions while preserving the approximation and recourse guarantees. We complement our positive results with logarithmic lower bounds showing that our bounds are nearly tight.
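To make the recourse objective concrete, here is a minimal sketch (not from the paper) of how one might measure the recourse of a maintained solution: given the sequence of center sets held after each update, the recourse is the total number of centers added or removed across consecutive steps. The function name `total_recourse` and the input format are hypothetical illustrations; whether the initial opening of centers is counted depends on the convention used.

```python
def total_recourse(centers_over_time):
    """Total number of center additions/deletions across consecutive steps.

    centers_over_time[t] is the set of open centers after update t
    (a hypothetical input format, for illustration only).
    """
    recourse = 0
    for prev, curr in zip(centers_over_time, centers_over_time[1:]):
        # symmetric difference = centers added + centers removed at this step
        recourse += len(prev ^ curr)
    return recourse


# Example: keeping center 'a' costs nothing; swapping 'a' for 'b' costs 2.
print(total_recourse([{"a"}, {"a"}, {"b"}]))  # -> 2
```

Under this accounting, the paper's guarantee says the algorithm's recourse is within an $O(\log |F| \log \Delta)$ factor of the smallest recourse achievable by any offline sequence of $\beta$-approximate solutions.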
Lay Summary: Clustering is a basic primitive of data science where the goal is to summarize a dataset by a small number of representative points. We give new algorithms for clustering that are robust to perturbations of the data over time: in other words, if the data changes only slightly with time, then our data summaries change only slightly with time. Furthermore, we prove that for any dataset and any sequence of updates, our clustering is (almost) as stable as any clustering of the same quality. We are the first to obtain quality/stability tradeoffs of this form.
Primary Area: Theory->Optimization
Keywords: Online Algorithms, Recourse, Clustering, k-center, Facility Location, k-median
Submission Number: 7528