Agnostic Learning of General ReLU Activation Using Gradient Descent

Pranjal Awasthi; Alex Tang; Aravindan Vijayaraghavan

Agnostic Learning of General ReLU Activation Using Gradient Descent

Pranjal Awasthi, Alex Tang, Aravindan Vijayaraghavan

Published: 01 Feb 2023, Last Modified: 02 Mar 2023ICLR 2023 posterReaders: Everyone

Keywords: agnostic learning, learning ReLU, global convergence, learning theory

Abstract: We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions. Unlike prior work that studies the setting of zero bias, we consider the more challenging scenario when the bias of the ReLU function is non-zero. Our main result establishes that starting from random initialization, in a polynomial number of iterations gradient descent outputs, with high probability, a ReLU function that achieves an error that is within a constant factor of the optimal i.e., it is guaranteed to achieve an error of $O(OPT)$, where $OPT$ is the error of the best ReLU function. This is a significant improvement over existing guarantees for gradient descent, which only guarantee error of $O(\sqrt{d \cdot OPT})$ even in the zero-bias case (Frei et al., 2020). We also provide finite sample guarantees, and obtain similar guarantees for a broader class of marginal distributions beyond Gaussians.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Theory (eg, control theory, learning theory, algorithmic game theory)

TL;DR: We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions that achieves loss of O(OPT).

Supplementary Material: zip

11 Replies

Loading