Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?

Published: 01 Feb 2023, Last Modified: 24 Feb 2023, ICLR 2023 poster
Keywords: Neural network, nonparametric regression, minimax optimal
TL;DR: A Parallel NN with only weight decay achieves an estimation error close to the minimax rates for both the Besov and BV classes.
Abstract: We study the theory of neural networks (NNs) through the lens of classical nonparametric regression, with a focus on the NN's ability to adaptively estimate functions with heterogeneous smoothness, a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function space and the sample size. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the ℓp-sparsity (0 < p < 1) of the coefficient vector with respect to an end-to-end learned basis of functions, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such a Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
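The following is a minimal sketch of the kind of rescaling argument behind the weight-decay-to-ℓp equivalence described in the abstract; the notation ($v_j$, $W_j^{(\ell)}$, $g_j$, $c_j$) and the specific exponent $p = 2/(L+1)$ are illustrative assumptions based on the standard argument for positively homogeneous ReLU branches without biases, not the paper's exact statement. Suppose each parallel branch $j$ computes $v_j\, g_j(x)$, where $g_j$ is an $L$-layer ReLU subnetwork with weight matrices $W_j^{(1)}, \dots, W_j^{(L)}$. Because ReLU is positively 1-homogeneous, rescaling $W_j^{(\ell)} \to a_\ell W_j^{(\ell)}$ with $a_\ell > 0$ and $v_j \to v_j / \prod_\ell a_\ell$ leaves the branch's function unchanged, so the effective weight-decay penalty on branch $j$ is its minimum over such rescalings. Since the product of all squared norms is invariant under this rescaling, the AM-GM inequality gives
\[
\min_{\text{rescalings}} \Big( |v_j|^2 + \sum_{\ell=1}^{L} \|W_j^{(\ell)}\|_F^2 \Big)
= (L+1)\Big( |v_j|^2 \prod_{\ell=1}^{L} \|W_j^{(\ell)}\|_F^2 \Big)^{\frac{1}{L+1}}
= (L+1)\, |c_j|^{\frac{2}{L+1}},
\]
where $|c_j| := |v_j| \prod_{\ell=1}^{L} \|W_j^{(\ell)}\|_F$ plays the role of the coefficient of branch $j$'s learned basis function (with $g_j$ normalized to unit scale). Summing over branches, weight decay then acts like the ℓp quasi-norm penalty $\sum_j |c_j|^p$ with $p = 2/(L+1)$, which lies in $(0,1)$ once $L \ge 2$ and shrinks as the branches get deeper, matching the abstract's claim that depth drives the estimator closer to minimax optimality.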