Agnostically Learning Halfspaces

Adam Tauman Kalai, Adam R. Klivans, Yishay Mansour, Rocco A. Servedio

2008 (modified: 17 May 2023) · SIAM J. Comput. 2008 · Readers: Everyone

Abstract: We give a computationally efficient algorithm that learns (under distributional assumptions) a halfspace in the difficult agnostic framework of Kearns, Schapire, and Sellie [Mach. Learn., 17 (1994), pp. 115–141], where a learner is given access to a distribution on labelled examples but where the labelling may be arbitrary (similar to malicious noise). It constructs a hypothesis whose error rate on future examples is within an additive $\epsilon$ of the optimal halfspace, in time $\mathrm{poly}(n)$ for any constant $\epsilon>0$, for the uniform distribution over $\{-1,1\}^n$ or the unit sphere in $\mathbb{R}^n$, as well as any log-concave distribution in $\mathbb{R}^n$. It also agnostically learns Boolean disjunctions in time $2^{\tilde{O}(\sqrt{n})}$ with respect to any distribution. Our algorithm, which performs $L_1$ polynomial regression, is a natural noise-tolerant, arbitrary-distribution generalization of the well-known “low-degree” Fourier algorithm of Linial, Mansour, and Nisan. We observe that significant improvements on the running time of our algorithm would yield the fastest known algorithm for learning parity with noise, a challenging open problem in computational learning theory.
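To make the phrase "$L_1$ polynomial regression" concrete, the following is a minimal sketch of the idea under stated assumptions, not the paper's implementation. It fits a low-degree polynomial $p$ that approximately minimizes the empirical $L_1$ error $\frac{1}{m}\sum_i |p(x_i) - y_i|$ and then outputs the hypothesis $\mathrm{sign}(p(x) - t)$ for a threshold $t$ chosen on the sample. The function names, the use of subgradient descent in place of a linear-programming solver, the threshold search, and the toy data (3-bit majority with clean labels) are all assumptions made for illustration; the labels passed in may be arbitrary.

```python
import itertools
import math


def monomial_features(x, degree):
    """All multilinear monomials of x that use at most `degree` variables."""
    feats = []
    for k in range(degree + 1):
        for idx in itertools.combinations(range(len(x)), k):
            feats.append(math.prod(x[i] for i in idx))  # empty product is 1
    return feats


def predict(w, x, degree):
    """Evaluate the polynomial with coefficient vector w at the point x."""
    return sum(wj * fj for wj, fj in zip(w, monomial_features(x, degree)))


def l1_poly_regression(xs, ys, degree, steps=200, eta=0.5):
    """Approximately minimize mean |p(x) - y| over degree-`degree` polynomials.

    Assumption: subgradient descent stands in for an exact LP solver so that
    the sketch needs only the standard library.
    """
    phis = [monomial_features(x, degree) for x in xs]
    m = len(ys)
    w = [0.0] * len(phis[0])
    best_w, best_loss = list(w), float("inf")
    for t in range(1, steps + 1):
        preds = [sum(wj * fj for wj, fj in zip(w, phi)) for phi in phis]
        loss = sum(abs(p - y) for p, y in zip(preds, ys)) / m
        if loss < best_loss:
            best_w, best_loss = list(w), loss
        # Subgradient of the mean absolute error with respect to w.
        grad = [0.0] * len(w)
        for phi, p, y in zip(phis, preds, ys):
            s = (p > y) - (p < y)  # sign of the residual p - y
            for j, fj in enumerate(phi):
                grad[j] += s * fj / m
        step = eta / math.sqrt(t)  # diminishing step size
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return best_w


def fit_threshold(preds, ys):
    """Pick t minimizing the empirical error of sign(p(x) - t)."""
    vals = sorted(set(preds))
    candidates = [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]

    def err(t):
        return sum((1 if p > t else -1) != y for p, y in zip(preds, ys)) / len(ys)

    t = min(candidates, key=err)
    return t, err(t)


# Toy data (assumption): every point of {-1,1}^3, labelled by majority.
DEGREE = 2
xs = list(itertools.product([-1, 1], repeat=3))
ys = [1 if sum(x) > 0 else -1 for x in xs]

w = l1_poly_regression(xs, ys, DEGREE)
preds = [predict(w, x, DEGREE) for x in xs]
t, err = fit_threshold(preds, ys)
print("coefficients:", w)
print("threshold:", t, "empirical error:", err)
```

The regression step is convex in the coefficients, which is why it can be solved efficiently; an exact solver would pose it as a linear program with one slack variable per example. The degree controls the trade-off between running time and how closely a polynomial can approximate a halfspace in $L_1$ under the given distribution.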
