On The Implicit Bias of Weight Decay in Shallow Univariate ReLU Networks

Published: 01 Feb 2023, Last Modified: 13 Feb 2023
Submitted to ICLR 2023
Readers: Everyone
Keywords: theory, implicit bias, generalization, interpolation, theoretical, shallow ReLU networks, ReLU networks, analysis of weight decay
TL;DR: Minimal $\ell_2$-norm interpolation by univariate, scalar-valued one layer ReLU networks is completely characterized in terms of the convexity of the learned predictor, giving new sharp generalization bounds for learning 1d Lipschitz functions.
Abstract: We give a complete characterization of the implicit bias of infinitesimal weight decay in the modest setting of univariate one layer ReLU networks. Our main result is a surprisingly simple geometric description of all one layer ReLU networks that exactly fit a dataset $\mathcal{D} = \{(x_i, y_i)\}$ with the minimum value of the $\ell_2$-norm of the neuron weights. Specifically, we prove that such functions must be either concave or convex between any two consecutive data sites $x_i$ and $x_{i+1}$. Our description implies that interpolating ReLU networks with weak $\ell_2$-regularization achieve the best possible generalization for learning $1d$ Lipschitz functions, up to universal constants.
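For concreteness, the variational problem behind this characterization can be sketched as follows. This is a minimal formalization under assumptions not spelled out on this page: a one layer (single hidden layer) ReLU network with $m$ neurons, scalar output, and an unregularized output bias $c$, where only the neuron weights $a_k, w_k$ enter the penalized $\ell_2$-norm:

$$f_\theta(x) = \sum_{k=1}^{m} a_k\,[w_k x + b_k]_+ + c, \qquad \min_{\theta}\ \sum_{k=1}^{m}\bigl(a_k^2 + w_k^2\bigr) \quad \text{s.t.}\quad f_\theta(x_i) = y_i \ \text{ for all } (x_i, y_i) \in \mathcal{D}.$$

In this notation, the abstract's main result says that every minimizer of such a problem is either convex or concave on each interval $[x_i, x_{i+1}]$ between consecutive data sites.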
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning