Information-Computation Tradeoffs  for Noiseless Linear Regression with Oblivious Contamination

Ilias Diakonikolas; Chao Gao; Daniel Kane; John Lafferty; Ankit Pensia

Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination

Ilias Diakonikolas, Chao Gao, Daniel Kane, John Lafferty, Ankit Pensia

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 posterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: learning theory, information computation tradeoff, robust statistics, statistical query algorithms

Abstract: We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d.\ samples from a distribution $(x, y)$ on $\mathbb R^d \times \mathbb R$ with $x \sim \mathcal N(0,I_d)$ and $y = x^\top \beta + z$, where $z$ is drawn from an unknown distribution that is independent of $x$. Moreover, $z$ satisfies $\mathbb P[z = 0] = \alpha>0$. The goal is to accurately recover the regressor $\beta$ to small $\ell_2$-error. Ignoring computational considerations, this problem is known to be solvable using $O(d/\alpha)$ samples. On the other hand, the best known polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples. Here we provide formal evidence that the quadratic dependence in $1/\alpha$ is inherent for efficient algorithms. Specifically, we show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.

Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)

Submission Number: 23352

Loading