Improved Bounds for Agnostic Active Learning of Single Index Models

Published: 27 Oct 2023, Last Modified: 22 Dec 2023RealML-2023EveryoneRevisionsBibTeX
Keywords: active learning, leverage scores, single index models, scientific computing
Abstract: We study active learning for single index models of the form $F({\mathbf{x}}) = f(\langle {\mathbf{w}}, {\mathbf{x}}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and ${\mathbf{x},\mathbf{w}} \in \mathbb{R}^d$. Such functions are important in scientific computing, where they are used to construct surrogate models for partial differential equations (PDEs) and to approximate high-dimensional Quantities of Interest. In these applications, collecting function samples requires solving a partial differential equation, so sample-efficient active learning methods translate to reduced computational cost. Our work provides two main results. First, when $f$ is known and Lipschitz, we show that $\tilde{O}(d)$ samples collected via \emph{statistical leverage score sampling} are sufficient to find an optimal single index model for a given target function, even in the challenging and practically important agnostic (adversarial noise) setting. This result is optimal up to logarithmic factors and improves quadratically on a recent $\tilde{O}(d^{2})$ bound of \citet{gajjar2023active}. Second, we show that $\tilde{O}(d^{3/2})$ samples suffice in the more difficult non-parametric setting when $f$ is \emph{unknown}, which is the also best result known in this general setting.
Submission Number: 49