Novel Kernel Models and Uniform Convergence Bounds for Neural Networks Beyond the Over-Parameterized Regime

26 Sept 2024 (modified: 25 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: uniform convergence, reproducing kernel Banach space, reproducing kernel Hilbert space, ReLU, ResNet
TL;DR: We construct two exact models for neural networks - one for the network as a whole, the other for the change during training - and use them to derive non-vacuous, well-behaved bounds on Rademacher complexity.
Abstract: This paper presents two models - a global model and a local model - of neural networks, applicable to networks of arbitrary width, depth and topology, assuming only finite-energy neural activations. The first model is exact (un-approximated) and global (applicable for arbitrary weights), casting the neural network in reproducing kernel Banach space (RKBS). This leads to a width-independent (under usual scaling) bound on the Rademacher complexity of neural networks in terms of the spectral norms of the weight matrices, which is also depth-independent under mild assumptions. For illustrative purposes we consider how this bound may be applied to untrained networks with LeCun, He and Glorot initialization, discuss their connection to the width and depth dependence of the complexity bound, and suggest a modified He initialization that gives a depth-independent complexity bound with high probability. The second model is exact and local, casting the change in neural network function resulting from a bounded change in weights and biases (i.e. a training step) in reproducing kernel Hilbert space (RKHS) with a well-defined local-intrinsic neural kernel (LiNK). The neural tangent kernel (NTK) is shown to be a first-order approximation of the LiNK, so the local model gives insight into how the NTK model may be generalized beyond the over-parameterized limit. Analogous to the global model, a bound on the Rademacher complexity of network adaptation is obtained from the local model, providing insight into the benefits of network adaptation algorithms such as LoRA. Throughout the paper, (a) dense feed-forward ReLU networks and (b) residual networks (ResNets) are used as illustrative examples and to provide insight into their operation and properties.
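Since the abstract does not reproduce the exact form of the bound, the following is only a minimal illustrative sketch (all names, layer widths and the measured quantity are assumptions, not the paper's construction): because the global bound is stated in terms of the spectral norms of the weight matrices, one can empirically check how the product of layer spectral norms of an untrained dense ReLU network scales with depth under the LeCun, He and Glorot initializations mentioned in the abstract.

```python
# Illustrative sketch only (not the paper's bound): measure the product of
# spectral norms of randomly initialized weight matrices, the quantity the
# abstract says the Rademacher complexity bound depends on.
import numpy as np

def init_weights(fan_in, fan_out, scheme, rng):
    """Sample a weight matrix under a standard initialization scheme."""
    if scheme == "lecun":      # variance 1 / fan_in
        std = np.sqrt(1.0 / fan_in)
    elif scheme == "he":       # variance 2 / fan_in (common for ReLU)
        std = np.sqrt(2.0 / fan_in)
    elif scheme == "glorot":   # variance 2 / (fan_in + fan_out)
        std = np.sqrt(2.0 / (fan_in + fan_out))
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return rng.normal(0.0, std, size=(fan_out, fan_in))

def spectral_norm_product(width, depth, scheme, seed=0):
    """Product of layer spectral norms for a square dense network."""
    rng = np.random.default_rng(seed)
    prod = 1.0
    for _ in range(depth):
        W = init_weights(width, width, scheme, rng)
        prod *= np.linalg.norm(W, ord=2)   # largest singular value
    return prod

if __name__ == "__main__":
    for scheme in ("lecun", "he", "glorot"):
        for depth in (4, 16, 64):
            p = spectral_norm_product(width=256, depth=depth, scheme=scheme)
            print(f"{scheme:7s} depth={depth:3d}  product of spectral norms ~ {p:.3g}")
```

Under standard He initialization the per-layer spectral norm exceeds 1 for wide square layers, so the product grows with depth; a quick run of this sketch makes the abstract's motivation for a modified, depth-independent initialization plausible, though the paper's actual analysis and bound are not reproduced here.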
Primary Area: learning theory
Submission Number: 6096
