Keywords: learning theory, information computation tradeoff, robust statistics, statistical query algorithms
Abstract: We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d.\ samples
from a distribution $(x, y)$ on $\mathbb R^d \times \mathbb R$
with $x \sim \mathcal N(0,I_d)$ and $y = x^\top \beta + z$,
where $z$ is drawn from an unknown distribution
that is independent of $x$.
Moreover, $z$ satisfies $\mathbb P[z = 0] = \alpha>0$.
The goal is to accurately recover the regressor
$\beta$ to small $\ell_2$-error.
Ignoring computational considerations, this problem
is known to be solvable using $O(d/\alpha)$ samples.
On the other hand, the best known polynomial-time algorithms
require $\Omega(d/\alpha^2)$ samples. Here we provide formal
evidence that the quadratic dependence in $1/\alpha$ is
inherent for efficient algorithms. Specifically, we show
that any efficient Statistical Query algorithm
for this task requires VSTAT complexity
of at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
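To make the data model concrete, the sketch below draws samples $x \sim \mathcal N(0, I_d)$ and $y = x^\top \beta + z$, where $z$ is independent of $x$ and $\mathbb P[z = 0] = \alpha$. The problem leaves the law of $z$ on the event $z \neq 0$ unknown. Here it is *assumed*, purely for illustration, to be a Cauchy distribution with a hypothetical scale `noise_scale`. The helper name `sample_oblivious_regression` is also hypothetical. This sampler is not the paper's hard instance, and it makes no attempt at recovering $\beta$.

```python
import numpy as np


def sample_oblivious_regression(n, beta, alpha, noise_scale=100.0, rng=None):
    """Draw n i.i.d. samples (x, y) with x ~ N(0, I_d) and y = x^T beta + z.

    The noise z is independent of x and satisfies P[z = 0] = alpha.
    ASSUMPTION (hypothetical): when z != 0 it is Cauchy with scale noise_scale.
    The problem itself leaves this part of the distribution unknown.
    """
    rng = np.random.default_rng() if rng is None else rng
    beta = np.asarray(beta, dtype=float)
    d = beta.shape[0]

    x = rng.standard_normal((n, d))             # Gaussian covariates, rows ~ N(0, I_d)
    clean = rng.random(n) < alpha               # with probability alpha, z is exactly 0
    z = noise_scale * rng.standard_cauchy(n)    # hypothetical heavy-tailed contamination
    z[clean] = 0.0                              # zero noise on the clean samples
    y = x @ beta + z                            # noiseless linear model plus oblivious noise
    return x, y, clean


# Usage: roughly an alpha fraction of the samples is fit exactly by the true beta.
rng = np.random.default_rng(0)
d, alpha = 20, 0.1
beta = rng.standard_normal(d)
x, y, clean = sample_oblivious_regression(50_000, beta, alpha, rng=rng)
residual = y - x @ beta
print(np.mean(np.abs(residual) < 1e-9))  # should be close to alpha
```

Only the $\alpha$ fraction of samples with $z = 0$ satisfies $y = x^\top \beta$ exactly. The heavy-tailed choice for the remaining samples is meant only to suggest that the other $1 - \alpha$ fraction need not carry useful information about $\beta$.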
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 23352