Efficient Active Learning of Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach
Abstract: We study the problem of computationally and label-efficient PAC active learning of $d$-dimensional halfspaces with Tsybakov noise (Tsybakov, 2004) under structured unlabeled data distributions. Inspired by Diakonikolas et al.\,(2020c), we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In light of this structural result, we design a nonconvex optimization-based algorithm with a label complexity of $\tilde{O}\left(d \left(\frac{1}{\epsilon}\right)^{\frac{8-6\alpha}{3\alpha-1}}\right)$, under the assumption that the Tsybakov noise parameter $\alpha \in \left(\frac{1}{3}, 1\right)$. This narrows the gap between the label complexities of the previously known efficient passive or active algorithms (Diakonikolas et al., 2020b; Zhang and Li, 2021) and the information-theoretic lower bound in this setting.
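The abstract's structural result says that it suffices to find an approximate first-order stationary point of a smooth nonconvex surrogate loss over the unit sphere. The sketch below illustrates that idea only in a toy setting: it is not the paper's algorithm, and the sigmoid surrogate loss, the noiseless Gaussian data, and the step-size/tolerance parameters are all illustrative assumptions. It runs Riemannian (projected) gradient descent on the sphere and stops once the tangent-space gradient norm falls below a tolerance, i.e., at an approximate first-order stationary point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: labels from a ground-truth halfspace w_star. Noiseless Gaussian
# data is an illustrative assumption; the paper handles Tsybakov noise and
# structured unlabeled distributions.
d, n = 5, 2000
w_star = np.zeros(d)
w_star[0] = 1.0
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_grad(w):
    """A smooth nonconvex surrogate loss, mean_i sigmoid(-y_i <w, x_i>),
    and its Euclidean gradient (illustrative choice, not the paper's loss)."""
    m = -y * (X @ w)
    s = sigmoid(m)
    # d/dw sigmoid(-y <w,x>) = s (1 - s) * (-y x)
    grad = ((s * (1.0 - s) * -y)[:, None] * X).mean(axis=0)
    return s.mean(), grad


def approx_stationary_point(w0, eta=1.0, eps=1e-3, max_iter=5000):
    """Gradient descent constrained to the unit sphere; stop when the
    gradient projected onto the tangent space has norm below eps, i.e.,
    at an approximate first-order stationary point."""
    w = w0 / np.linalg.norm(w0)
    for _ in range(max_iter):
        _, g = loss_grad(w)
        g_tan = g - (g @ w) * w          # tangent-space (Riemannian) gradient
        if np.linalg.norm(g_tan) < eps:
            break
        w = w - eta * g_tan
        w = w / np.linalg.norm(w)        # retract back onto the sphere
    return w


w_hat = approx_stationary_point(rng.standard_normal(d))
err = np.mean(np.sign(X @ w_hat) != y)   # empirical 0-1 error of the halfspace
```

In this rotationally symmetric toy instance the surrogate loss depends only on the angle to `w_star`, so the stationary point found also has small classification error, mirroring (in a much simpler setting) the paper's claim that approximate stationarity implies low excess error.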