Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression
Keywords: robust statistics, high-dimensional inference, regression, nearly linear time algorithms
TL;DR: We provide algorithms for Gaussian mean estimation and linear regression under Huber contamination that achieve optimal error and sample complexity and have almost linear runtime
Abstract: We study the fundamental problems of Gaussian mean
estimation and linear regression with Gaussian covariates
in the presence of Huber contamination. Our main
contribution is the design of the first sample near-optimal
and almost linear-time algorithms with optimal error
guarantees for both these problems. Specifically, for
Gaussian robust mean estimation on $\mathbb R^d$ with
contamination parameter $\epsilon \in (0, \epsilon_0)$ for a small
absolute constant $\epsilon_0$, we give an
algorithm with sample complexity $n = \tilde{O}(d/\epsilon^2)$
and almost linear runtime that approximates the target
mean within $\ell_2$-error $O(\epsilon)$.
This improves on
prior work that achieved this error guarantee with
polynomially suboptimal sample and time complexity.
For robust linear
regression, we give the first algorithm with sample
complexity $n = \tilde{O}(d/\epsilon^2)$ and almost linear
runtime that approximates the target regressor within
$\ell_2$-error $O(\epsilon)$. This is the first polynomial
sample and time algorithm achieving the optimal error
guarantee, answering an open question in the literature.
At the technical level, we develop a methodology that
yields almost-linear time algorithms for multi-directional
filtering that may be of broader interest.
Submission Number: 8333
Loading