Statistical Query Lower Bounds for List-Decodable Linear Regression

21 May 2021, 20:50 (modified: 15 Jan 2022, 06:03)NeurIPS 2021 SpotlightReaders: Everyone
Keywords: learning theory, high-dimensional robust statistics, statistical query model, list-decodable learning, linear regression
TL;DR: We prove a superpolynomial statistical query lower bound for the problem of learning the regression vector of a Gaussian linear model when outliers constitute the majority of the dataset.
Abstract: We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples. Specifically, we are given a set $T$ of labeled examples $(x, y) \in \mathbb{R}^d \times \mathbb{R}$ and a parameter $0< \alpha <1/2$ such that an $\alpha$-fraction of the points in $T$ are i.i.d. samples from a linear regression model with Gaussian covariates, and the remaining $(1-\alpha)$-fraction of the points are drawn from an arbitrary noise distribution. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the target regression vector. Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem. Our SQ lower bound qualitatively matches the performance of previously developed algorithms, providing evidence that current upper bounds for this task are nearly best possible.
Supplementary Material: pdf
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
10 Replies