Keywords: differential privacy, distribution learning, mixtures of Gaussians
TL;DR: We prove sample complexity bounds for learning mixtures of axis-aligned Gaussians under approximate differential privacy.
Abstract: We consider the problem of learning multivariate Gaussians under the constraint of approximate differential privacy. We prove that $\widetilde{O}(k^2 d \log^{3/2}(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to learn a mixture of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ to within total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-differential privacy. This is the first result for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. If the covariance matrices of each of the Gaussians is the identity matrix, we show that $\widetilde{O}(kd/\alpha^2 + kd \log(1/\delta) / \alpha \varepsilon)$ samples are sufficient.
To prove our results, we design a new technique for privately learning mixture distributions. A class of distributions $\mathcal{F}$ is said to be list-decodable if there is an algorithm that, given "heavily corrupted" samples from $f \in \mathcal{F}$, outputs a list of distributions one of which approximates $f$. We show that if $\mathcal{F}$ is privately list-decodable then we can learn mixtures of distributions in $\mathcal{F}$. Finally, we show axis-aligned Gaussian distributions are privately list-decodable, thereby proving mixtures of such distributions are privately learnable.
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: pdf
7 Replies
Loading