Keywords: Coresets, Mixture of Gaussians, Sketches
TL;DR: Coresets for any Mixture of Gaussians
Abstract: An $\varepsilon$-coreset for $k$-Gaussian Mixture Models ($k$-GMMs) of an input set
$P \subseteq \mathbb{R}^d$ of points is a small weighted set $C \subseteq P$ such that, for a given
$\varepsilon > 0$, the negative log-likelihood $L(P, \theta)$ of every $k$-GMM $\theta$ is provably
approximated by $L(C, \theta)$ up to a multiplicative factor of $1 \pm \varepsilon$.
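Equivalently, restating this guarantee in symbols (with $L(C, \theta)$ denoting the weighted negative log-likelihood over $C$):
$$(1 - \varepsilon)\, L(P, \theta) \;\le\; L(C, \theta) \;\le\; (1 + \varepsilon)\, L(P, \theta) \quad \text{for every } k\text{-GMM } \theta.$$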
Existing coresets \cite{NIPS11,JMLR18} approximate only ``semi-spherical'' $k$-GMMs, whose covariance
matrices are similar to the identity matrix. This work provides the first algorithm that computes a
coreset for arbitrary $k$-GMMs. This is achieved by forging new links to projective clustering and to
modern techniques in computational geometry. We also provide experimental results on real-world
datasets that demonstrate the efficacy of our approach.
Primary Area: learning theory
Submission Number: 24830