Robust Mean Estimation Without Moments for Symmetric Distributions

Gleb Novikov; David Steurer; Stefan Tiegel

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 poster

Keywords: Robust Mean Estimation, Unbounded First Moment, Symmetric Distributions (Spherical, Elliptical, Product), Filtering Algorithm, Huber Loss

TL;DR: We give polynomial-time and in some cases nearly optimal algorithms for robustly estimating the location parameter of symmetric distributions.

Abstract: We study the problem of robustly estimating the mean or location parameter without moment assumptions. Known computationally efficient algorithms rely on strong distributional assumptions, such as sub-Gaussianity or (certifiably) bounded moments. Moreover, the guarantees that they achieve in the heavy-tailed setting are weaker than those for sub-Gaussian distributions with known covariance. In this work, we show that such a tradeoff between error guarantees and heavy tails is not necessary for symmetric distributions. We show that for a large class of symmetric distributions, the same error as in the Gaussian setting can be achieved efficiently. The distributions we study include products of arbitrary symmetric one-dimensional distributions, such as product Cauchy distributions, as well as elliptical distributions, a vast generalization of the Gaussian distribution. For product distributions and elliptical distributions with known scatter (covariance) matrix, we show that given an $\varepsilon$-corrupted sample, we can with probability at least $1-\delta$ estimate its location up to error $O(\varepsilon \sqrt{\log(1/\varepsilon)})$ using $\tfrac{d\log(d) + \log(1/\delta)}{\varepsilon^2 \log(1/\varepsilon)}$ samples. This result matches the best-known guarantees for the Gaussian distribution and known SQ lower bounds (up to the $\log(d)$ factor). For elliptical distributions with unknown scatter (covariance) matrix, we propose a sequence of efficient algorithms that approaches this optimal error. Specifically, for every $k \in \mathbb{N}$, we design an estimator using time and samples $\tilde{O}(d^k)$ achieving error $O(\varepsilon^{1-\frac{1}{2k}})$. This matches the error and running-time guarantees obtained when assuming certifiably bounded moments of order up to $k$. For unknown covariance, such error bounds of $o(\sqrt{\varepsilon})$ are not even known for (general) sub-Gaussian distributions.
Our algorithms are based on a generalization of the well-known filtering technique [DK22]. More specifically, we show how this machinery can be combined with Huber-loss-based techniques to work with projections of the noise that behave more nicely than the initial noise. Moreover, we show how sum-of-squares proofs can be used to obtain algorithmic guarantees even for distributions without a first moment. We believe that this approach may find other applications in future work.
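
To make the role of the Huber loss concrete, here is a minimal sketch. It is not the algorithm from the paper: it has no filtering step and no sum-of-squares component, and it makes no attempt to reach the dimension-independent error stated in the abstract. It only illustrates that a Huber-loss location estimate stays well defined for a product Cauchy distribution, which has no first moment, when a small fraction of the sample is corrupted. The function names, the threshold `delta=1.0`, the iteratively reweighted solver, and the corruption model (replacing points with one fixed far-away point) are all assumptions chosen for illustration.

```python
import math
import random


def huber_location(xs, delta=1.0, iters=100):
    """1-D Huber M-estimate of location via iteratively reweighted means."""
    mu = sorted(xs)[len(xs) // 2]  # start from the (upper) median
    for _ in range(iters):
        # Huber weights: 1 for small residuals, delta/|r| for large ones,
        # so each point's pull on the estimate is capped at delta.
        ws = [1.0 if abs(x - mu) <= delta else delta / abs(x - mu) for x in xs]
        mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
    return mu


def corrupted_product_cauchy(n, location, eps, rng, outlier_shift=50.0):
    """n samples of location + i.i.d. standard Cauchy noise per coordinate,
    with the first int(eps * n) samples replaced by one fixed far-away point
    (a simple, non-adaptive corruption model assumed for this sketch)."""
    samples = [
        [m + math.tan(math.pi * (rng.random() - 0.5)) for m in location]
        for _ in range(n)
    ]
    for i in range(int(eps * n)):
        samples[i] = [m + outlier_shift for m in location]
    return samples


def coordinatewise_huber(samples, delta=1.0):
    """Apply the 1-D Huber estimate to each coordinate independently."""
    d = len(samples[0])
    return [huber_location([s[j] for s in samples], delta) for j in range(d)]


rng = random.Random(0)
true_location = [1.0, -2.0, 0.5]
data = corrupted_product_cauchy(2000, true_location, eps=0.05, rng=rng)
estimate = coordinatewise_huber(data)
print("estimate:", [round(v, 3) for v in estimate])
```

Because the Huber weights cap each point's influence at `delta`, neither the heavy Cauchy tails nor the planted outliers can drag the estimate arbitrarily far, whereas the empirical mean has no such protection.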

Supplementary Material: pdf

Submission Number: 13072
