List-Decodable Sparse Mean Estimation via Difference-of-Pairs FilteringDownload PDF

Published: 31 Oct 2022, 18:00, Last Modified: 12 Oct 2022, 13:40NeurIPS 2022 AcceptReaders: Everyone
Keywords: list-decoding, sparse estimation, robust statistics, high-dimensional inference
TL;DR: We develop a novel and simple technique for list-decodable mean estimation and use it to obtain the first efficient algorithm for the problem in the sparse setting.
Abstract: We study the problem of list-decodable sparse mean estimation. Specifically, for a parameter $\alpha \in (0, 1/2)$, we are given $m$ points in $\mathbb{R}^n$, $\lfloor \alpha m \rfloor$ of which are i.i.d. samples from a distribution $D$ with unknown $k$-sparse mean $\mu$. No assumptions are made on the remaining points, which form the majority of the dataset. The goal is to return a small list of candidates containing a vector $\hat \mu$ such that $\|\hat \mu - \mu\|_2$ is small. Prior work had studied the problem of list-decodable mean estimation in the dense setting. In this work, we develop a novel, conceptually simpler technique for list-decodable mean estimation. As the main application of our approach, we provide the first sample and computationally efficient algorithm for list-decodable sparse mean estimation. In particular, for distributions with ``certifiably bounded'' $t$-th moments in $k$-sparse directions and sufficiently light tails, our algorithm achieves error of $(1/\alpha)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/\alpha$ and running time $\mathrm{poly}(mn^t)$. For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee $\Theta (\sqrt{\log(1/\alpha)})$ with quasi-polynomial complexity. We complement our upper bounds with nearly-matching statistical query and low-degree polynomial testing lower bounds.
Supplementary Material: pdf
11 Replies