Reduced Label Complexity For Tight ℓ2 Regression

Alex Gittens, Malik Magdon-Ismail

Published: 05 May 2023, Last Modified: 06 Feb 2024 | arXiv | CC BY-NC-SA 4.0

Abstract: Given data ${\rm X}\in\mathbb{R}^{n\times d}$ and labels $y\in\mathbb{R}^{n}$, the goal is to find $w\in\mathbb{R}^d$ that minimizes $\|{\rm X}w - y\|_2^2$. We give a polynomial-time algorithm that, \emph{oblivious to $y$}, throws out $n/(d+\sqrt{n})$ data points and is a $(1+d/n)$-approximation to optimal in expectation. The motivation is tight approximation with reduced label complexity (the number of labels revealed). We reduce label complexity by $\Omega(\sqrt{n})$. Open question: Can label complexity be reduced by $\Omega(n)$ with a tight $(1+d/n)$-approximation?
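To make the setup concrete, the following is a minimal sketch of a label-oblivious experiment, not the algorithm from the paper. It assumes uniform random deletion as a stand-in for the paper's selection rule (which the abstract does not describe), and it assumes the count $n/(d+\sqrt{n})$ is rounded down. The function names `oblivious_keep_indices` and `in_sample_cost` are hypothetical. The sketch chooses which points to discard without looking at $y$, fits least squares using only the retained labels, and evaluates the fit on all $n$ points.

```python
import numpy as np


def oblivious_keep_indices(n, d, rng):
    """Pick rows to keep without looking at y.

    ASSUMPTION: uniform random deletion is a placeholder; the paper's
    actual selection rule is not given in the abstract.
    """
    # Number of points to discard, n / (d + sqrt(n)), rounded down (assumption).
    k = int(n / (d + np.sqrt(n)))
    drop = rng.choice(n, size=k, replace=False)
    return np.setdiff1d(np.arange(n), drop)


def in_sample_cost(X, y, keep):
    """Fit w on the kept rows only, then return ||Xw - y||_2^2 over all rows."""
    w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return float(np.sum((X @ w - y) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 400, 5
    X = rng.standard_normal((n, d))
    y = X @ np.ones(d) + 0.1 * rng.standard_normal(n)  # synthetic labels

    keep = oblivious_keep_indices(n, d, rng)
    full = in_sample_cost(X, y, np.arange(n))   # optimal cost, all labels used
    reduced = in_sample_cost(X, y, keep)        # cost when some labels are hidden
    print("labels used:", len(keep), "of", n)
    print("cost ratio reduced/full:", reduced / full)
    print("target ratio 1 + d/n:", 1 + d / n)
```

The ratio printed is always at least 1, because the full least-squares fit minimizes the cost over all $w$. Whether it stays near $1 + d/n$ depends on the selection rule; the uniform rule used here carries no such guarantee.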