Keywords: clustering, $k$-median, coresets, uniform sampling, $\ell_1$ metric
TL;DR: We formulate "stable coresets", an intermediate definition between strong and weak coresets that provides strong theoretical guarantees while remaining compatible with uniform sampling, and demonstrate our approach on the $1$-median problem.
Abstract: Uniform sampling is a highly efficient method for data summarization.
However, its effectiveness in producing coresets for clustering problems
is not yet well understood,
primarily because it generally does not yield a strong coreset,
which is the prevailing notion in the literature.
We formulate \emph{stable coresets}, a notion that
is intermediate between the standard notions of weak and strong coresets
and that combines the broad applicability of strong coresets
with the highly efficient uniform-sampling constructions available for weak coresets.
Our main result is that a uniform sample of size $O(\epsilon^{-2}\log d)$
yields, with high constant probability,
a stable coreset for $1$-median in $\mathbb{R}^d$ under the $\ell_1$ metric.
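A minimal Python sketch of the uniform-sampling construction behind this result, under stated assumptions: the sample size is $m = \lceil c\,\epsilon^{-2}\ln d\rceil$ with a hypothetical constant `c = 1` (the abstract gives only the $O(\cdot)$ bound), points are drawn with replacement and each receives weight $n/m$, the Gaussian test data is synthetic, and all function names are illustrative. It uses the standard fact that the $\ell_1$ 1-median cost separates across coordinates, so a coordinate-wise median is an optimal center. It only checks two properties one would expect of a good summary (the sample's optimal center is near-optimal on the full data, and the weighted sample cost estimates the full cost); it does not test the stable-coreset definition itself, which is not stated here.

```python
import math
import random
from statistics import median


def l1_cost(points, center, weights=None):
    """Weighted 1-median cost under l1: sum_i w_i * ||p_i - center||_1."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * sum(abs(a - b) for a, b in zip(p, center))
               for p, w in zip(points, weights))


def coordinatewise_median(points):
    """An optimal l1 1-median: the cost separates by coordinate, so taking
    a median independently in each coordinate is optimal."""
    d = len(points[0])
    return [median(p[j] for p in points) for j in range(d)]


def uniform_sample(points, eps, c=1.0, rng=random):
    """Draw m = ceil(c * ln(d) / eps^2) points uniformly with replacement and
    give each weight n/m. The constant c is a hypothetical placeholder."""
    n, d = len(points), len(points[0])
    m = max(1, math.ceil(c * math.log(max(d, 2)) / eps ** 2))
    sample = [points[rng.randrange(n)] for _ in range(m)]
    return sample, [n / m] * m


def demo(n=5000, d=50, eps=0.1, seed=0):
    """Synthetic check on Gaussian data (an assumption, not the paper's setup)."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    sample, weights = uniform_sample(data, eps, rng=rng)
    c_full = coordinatewise_median(data)
    # All weights are equal, so the weighted median is the plain median.
    c_sample = coordinatewise_median(sample)
    # (1) Optimality gap of the sample's center, measured on the full data.
    full = l1_cost(data, c_full)
    ratio = l1_cost(data, c_sample) / full
    # (2) Relative error of the weighted-sample cost estimate at c_full.
    rel_err = abs(l1_cost(sample, c_full, weights) - full) / full
    return len(sample), ratio, rel_err


if __name__ == "__main__":
    m, ratio, rel_err = demo()
    print(f"sample size {m}, cost ratio {ratio:.4f}, estimate error {rel_err:.4f}")
```

With $\epsilon = 0.1$ and $d = 50$ the sketch draws $m = 392$ points regardless of $n$; the cost ratio is at least $1$ by construction, since `c_full` is an exact optimum for the full data.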
We then leverage the powerful properties of stable coresets
to easily derive new coreset constructions, all through uniform sampling,
for $\ell_1$ and related metrics, such as the Kendall-tau and Jaccard distances.
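For Kendall-tau, one standard route to $\ell_1$ (assumed here, not necessarily the reduction used in this work) maps a ranking of $n$ items to a 0/1 vector with one coordinate per pair of items; the Kendall-tau distance, i.e. the number of pairs the two rankings order differently, then equals the $\ell_1$ distance between the vectors. The sketch below checks this on a small example; the function names are hypothetical. The embedding has dimension $d = \binom{n}{2}$, so $\log d = O(\log n)$, but an optimal $\ell_1$ center need not be the image of any ranking, so the embedding alone does not settle the Kendall-tau case.

```python
from itertools import combinations


def kendall_tau_distance(r1, r2):
    """Count item pairs that rankings r1 and r2 order differently."""
    pos1 = {x: i for i, x in enumerate(r1)}
    pos2 = {x: i for i, x in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] < pos1[b]) != (pos2[a] < pos2[b]))


def pairwise_embedding(ranking, items):
    """Map a ranking to a 0/1 vector with one coordinate per pair (a, b) of
    `items` (in a fixed order): 1 if a is ranked before b, else 0."""
    pos = {x: i for i, x in enumerate(ranking)}
    return [1 if pos[a] < pos[b] else 0 for a, b in combinations(items, 2)]


def l1(u, v):
    """l1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    items = ["a", "b", "c", "d"]
    r1, r2 = ["a", "b", "c", "d"], ["b", "a", "d", "c"]
    print(kendall_tau_distance(r1, r2))  # 2: pairs (a, b) and (c, d) flip
    print(l1(pairwise_embedding(r1, items), pairwise_embedding(r2, items)))  # 2
```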
We also show applications to fair clustering and to approximation algorithms
for $k$-median problems in these metric spaces.
Our experiments validate the benefits of stable coresets in practice,
in terms of both construction time and approximation quality.
Primary Area: learning theory
Submission Number: 18801