Agnostically Learning Multi-Index Models with Queries

Ilias Diakonikolas; Daniel M. Kane; Vasilis Kontonis; Christos Tzamos; Nikos Zarifis

Published: 01 Jan 2024, Last Modified: 11 Jan 2025, Venue: FOCS 2024, License: CC BY-SA 4.0

Abstract: We study the power of query access for the fundamental task of agnostic learning under the Gaussian distribution. In the agnostic model, no assumptions are made on the labels of the examples, and the goal is to compute a hypothesis that is competitive with the best-fit function in a known class, i.e., one that achieves error $\mathrm{opt}+\epsilon$, where $\mathrm{opt}$ is the error of the best function in the class. We focus on a general family of Multi-Index Models (MIMs), which are $d$-variate functions that depend only on a few relevant directions, i.e., have the form $g(Wx)$ for an unknown link function $g$ and a $k \times d$ matrix $W$. Multi-index models cover a wide range of commonly studied function classes, including real-valued function classes such as constant-depth neural networks with ReLU activations, and Boolean concept classes such as intersections of halfspaces. Our main result shows that query access gives significant runtime improvements over random examples for agnostically learning both real-valued and Boolean-valued MIMs. Under standard regularity assumptions for the link function (namely, bounded variation or surface area), we give an agnostic query learner for MIMs with running time $O(k)^{\text{poly}(1/\epsilon)} \, \text{poly}(d)$. In contrast, algorithms that rely only on random labeled examples inherently require $d^{\text{poly}(1/\epsilon)}$ samples and runtime, even for the basic problem of agnostically learning a single ReLU or a halfspace. As special cases of our general approach, we obtain the following results:

• For the class of depth-$\ell$, width-$S$ ReLU networks on $\mathbb{R}^{d}$, our agnostic query learner runs in time $\text{poly}(d) \, 2^{\text{poly}(\ell S/\epsilon)}$. This bound qualitatively matches the runtime of an algorithm by [1] for the realizable PAC setting with random examples.
• For the class of arbitrary intersections of $k$ halfspaces on $\mathbb{R}^{d}$, our agnostic query learner runs in time $\text{poly}(d) \, 2^{\text{poly}(\log(k)/\epsilon)}$. Prior to our work, no improvement over the agnostic PAC model complexity (without queries) was known, even for the case of a single halfspace.

In both these settings, we provide evidence that the $2^{\text{poly}(1/\epsilon)}$ runtime dependence is required for proper query learners, even for agnostically learning a single ReLU or halfspace. Our algorithmic result establishes a strong computational separation between the agnostic PAC and the agnostic PAC+Query models under the Gaussian distribution for a range of natural function classes. Prior to our work, no such separation was known for any natural concept class, even for the case of a single halfspace, for which it was an open problem posed by Feldman [2]. Our results are enabled by a general dimension-reduction technique that leverages query access to estimate gradients of (a smoothed version of) the underlying label function.
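
To make the dimension-reduction idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes one particular smoothing, $T_\rho f(x) = \mathbb{E}_{z \sim N(0, I)}[f(\rho x + \sqrt{1-\rho^2}\, z)]$, whose gradient equals $\frac{\rho}{\sqrt{1-\rho^2}}\, \mathbb{E}_z[f(\rho x + \sqrt{1-\rho^2}\, z)\, z]$ and can therefore be estimated from function queries alone. When $f(x) = g(Wx)$, this gradient lies in the row space of $W$, so the top eigenvectors of the averaged gradient outer products give a candidate for the relevant subspace. The function names, the parameter values, and the eigendecomposition step are all illustrative assumptions, and the demo uses noiseless labels, so it does not exercise the agnostic setting.

```python
import numpy as np


def smoothed_gradient(query, x, rho=0.8, n_queries=2000, rng=None):
    """Monte Carlo estimate of the gradient of T_rho f at x, using only queries to f.

    `query` is a hypothetical oracle mapping an (n, d) array of points to n labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.sqrt(1.0 - rho ** 2)
    z = rng.standard_normal((n_queries, x.shape[0]))
    vals = query(rho * x + s * z)            # labels at perturbed points
    baseline = query((rho * x)[None, :])[0]  # constant shift; E[z] = 0 keeps it unbiased
    return (rho / s) * ((vals - baseline)[:, None] * z).mean(axis=0)


def estimate_subspace(query, d, k, n_points=200, rng=None, **grad_kwargs):
    """Return a (k, d) orthonormal basis spanning the top-k gradient directions."""
    rng = np.random.default_rng() if rng is None else rng
    M = np.zeros((d, d))
    for _ in range(n_points):
        x = rng.standard_normal(d)           # Gaussian marginal, as in the abstract
        g = smoothed_gradient(query, x, rng=rng, **grad_kwargs)
        M += np.outer(g, g) / n_points       # average gradient outer product
    _, eigvecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    return eigvecs[:, -k:].T


if __name__ == "__main__":
    d, k = 12, 2
    W = np.eye(k, d)                         # hidden directions (illustrative)
    # Sum of two ReLUs: a simple multi-index model g(Wx)
    relu_sum = lambda X: np.maximum(X @ W.T, 0.0).sum(axis=1)
    V = estimate_subspace(relu_sum, d, k, rng=np.random.default_rng(0))
    # Squared Frobenius overlap with the true subspace; k means perfect recovery
    print(np.linalg.norm(V @ W.T) ** 2)
```

Once such a $k$-dimensional subspace is in hand, any subsequent search over hypotheses can be restricted to it, which is one way to read the abstract's runtime bound: the dependence on $d$ is polynomial and the exponential dependence falls on $k$ and $1/\epsilon$ only.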
