Abstract: We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let $\mathcal F$ be an arbitrary class of probability distributions, and let $\mathcal{F}^k$ denote the class of $k$-mixtures of elements of $\mathcal F$. Assuming the existence of a method for learning $\mathcal F$ with sample complexity $m_{\mathcal F}(\epsilon)$, we provide a method for learning $\mathcal F^k$ with sample complexity $O\left({k\log k \cdot m_{\mathcal F}(\epsilon)}/{\epsilon^{2}}\right)$. Our mixture learning algorithm has the property that, if the $\mathcal F$-learner is proper/agnostic, then the $\mathcal F^k$-learner is proper/agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $\widetilde{O}({kd}/{\epsilon^4})$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $\widetilde{O}({kd^2}/{\epsilon^4})$, which improves the previously known bounds of $\widetilde{O}({k^3d^2}/{\epsilon^4})$ and $\widetilde{O}(k^4d^4/\epsilon^2)$ in their dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $\widetilde{O}(d^{(d+5)/2}\epsilon^{-(d+9)/2}k)$ samples.
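A worked instantiation of the general reduction (a sketch, assuming the standard single-component sample complexities $m_{\mathcal F}(\epsilon)=\widetilde{O}(d/\epsilon^{2})$ for one axis-aligned Gaussian and $\widetilde{O}(d^{2}/\epsilon^{2})$ for one general Gaussian in $\mathbb{R}^d$, which are not stated in the abstract itself): plugging these into the mixture bound gives

$$m_{\mathcal F^k}(\epsilon) \;=\; O\!\left(\frac{k\log k \cdot m_{\mathcal F}(\epsilon)}{\epsilon^{2}}\right) \;=\; \widetilde{O}\!\left(\frac{k\,d}{\epsilon^{4}}\right) \quad \text{(axis-aligned case)},$$

and analogously $\widetilde{O}(kd^{2}/\epsilon^{4})$ in the general Gaussian case, with the $\log k$ factor absorbed into the $\widetilde{O}$ notation.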