Coresets for Mixtures of (arbitrarily large) Gaussians

ICLR 2026 Conference Submission 24830 Authors

20 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Coresets, Mixture of Gaussians, Sketches
TL;DR: Coresets for any mixture of Gaussians
Abstract: An $\varepsilon$-coreset for $k$-Gaussian Mixture Models (k-GMMs) of an input set $P \subseteq \mathbb{R}^d$ of points is a small weighted set $C \subseteq P$ such that the negative log-likelihood $L(P, \theta)$ of every $k$-GMM $\theta$ is provably approximated by $L(C, \theta)$, up to a multiplicative factor of $1 \pm \varepsilon$, for a given $\varepsilon > 0$. Existing coresets \cite{NIPS11,JMLR18} approximate only ``semi-spherical'' k-GMMs, whose covariance matrices are similar to the identity matrix. This work provides the first algorithm that computes a coreset for arbitrarily large k-GMMs. This is achieved by forging new links to projective clustering and modern techniques in computational geometry. Experimental results on real-world datasets demonstrate the efficacy of our approach.
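The guarantee in the abstract can be made concrete with a minimal numerical sketch. The snippet below computes the negative log-likelihood $L(\cdot, \theta)$ of a 1-D $2$-GMM on a synthetic weighted point set, then compares it against a uniformly sampled, reweighted subset. Note the assumptions: the data, the mixture $\theta$, and the uniform-sampling subset are all illustrative; uniform sampling is *not* the paper's coreset construction and carries no $1 \pm \varepsilon$ guarantee — it only illustrates the quantity the coreset is required to preserve.

```python
import math
import random

def gmm_nll(weighted_points, mixture):
    """Negative log-likelihood of a 1-D GMM on a weighted point set.

    weighted_points: list of (x, w) pairs; mixture: list of
    (pi, mu, var) components. Returns -sum_p w_p * log sum_j pi_j N(x_p; mu_j, var_j).
    """
    total = 0.0
    for x, w in weighted_points:
        density = sum(
            pi * math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
            for pi, mu, var in mixture
        )
        total += -w * math.log(density)
    return total

random.seed(0)
# Synthetic input P: two 1-D clusters (an illustrative stand-in for real data).
P = [random.gauss(0.0, 1.0) for _ in range(500)] + \
    [random.gauss(10.0, 2.0) for _ in range(500)]
theta = [(0.5, 0.0, 1.0), (0.5, 10.0, 4.0)]  # k=2 GMM: (weight, mean, variance)

# Uniform sample with reweighting -- a naive baseline, not the paper's algorithm.
m = 200
C = [(x, len(P) / m) for x in random.sample(P, m)]

L_P = gmm_nll([(x, 1.0) for x in P], theta)  # full-data likelihood
L_C = gmm_nll(C, theta)                      # subset estimate
rel_err = abs(L_P - L_C) / L_P               # an eps-coreset bounds this by eps
```

An $\varepsilon$-coreset would bound `rel_err` by $\varepsilon$ *simultaneously for every* $\theta$, including mixtures with highly eccentric (non-spherical) covariances, which is exactly where uniform sampling and the prior semi-spherical coresets break down.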
Primary Area: learning theory
Submission Number: 24830