This paper presents two models of neural networks, a global model and a local model, applicable to networks of arbitrary width, depth and topology, assuming only finite-energy neural activations. The first model is exact (un-approximated) and global (valid for arbitrary weights), casting the neural network in a reproducing kernel Banach space (RKBS). This leads to a width-independent (under the usual scalings) bound on the Rademacher complexity of neural networks in terms of the spectral norms of the weight matrices, which becomes depth-independent under mild assumptions. For illustrative purposes we consider how this bound applies to untrained networks with LeCun, He and Glorot initialization, discuss their connection to the width and depth dependence of the complexity bound, and suggest a modified He initialization that gives a depth-independent complexity bound with high probability. The second model is exact and local, casting the change in network function resulting from a bounded change in weights and biases (i.e. a training step) in a reproducing kernel Hilbert space (RKHS) with a well-defined local-intrinsic neural kernel (LiNK). The neural tangent kernel (NTK) is shown to be a first-order approximation of the LiNK, so the local model gives insight into how the NTK model may be generalized outside the over-parameterized limit. Analogously to the global model, a bound on the Rademacher complexity of network adaptation is obtained from the local model, providing insight into the benefits of network adaptation algorithms such as LoRA. Throughout the paper, (a) dense feed-forward ReLU networks and (b) residual networks (ResNet) are used as illustrative examples and to provide insight into their operation and properties.
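To make the "usual scalings" concrete, the following minimal sketch (not taken from the paper; the widths and random seed are arbitrary) estimates the spectral norm of square Gaussian weight matrices under the LeCun (variance 1/n), He (2/n) and Glorot (2/(n_in+n_out)) schemes. Under these scalings the spectral norm concentrates around a width-independent constant, which is the quantity the global Rademacher bound is stated in terms of.

```python
# Illustrative sketch (assumed setup, not the paper's experiments): spectral
# norms of random Gaussian weight matrices under standard initializations.
import jax
import jax.numpy as jnp

def spectral_norm(key, n, variance):
    """Largest singular value of an n x n matrix with i.i.d. N(0, variance) entries."""
    W = jnp.sqrt(variance) * jax.random.normal(key, (n, n))
    return jnp.linalg.norm(W, ord=2)

key = jax.random.PRNGKey(0)
for n in (128, 512, 2048):
    key, k1, k2, k3 = jax.random.split(key, 4)
    lecun  = spectral_norm(k1, n, 1.0 / n)        # LeCun:  Var = 1/fan_in
    he     = spectral_norm(k2, n, 2.0 / n)        # He:     Var = 2/fan_in
    glorot = spectral_norm(k3, n, 2.0 / (n + n))  # Glorot: Var = 2/(fan_in+fan_out)
    print(f"width {n:5d}:  LeCun ~ {float(lecun):.2f}   "
          f"He ~ {float(he):.2f}   Glorot ~ {float(glorot):.2f}")
# Standard random-matrix estimates: LeCun and Glorot ~ 2, He ~ 2*sqrt(2) ~ 2.83,
# essentially independent of the width n.
```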
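The sense in which the NTK is a first-order model of a training step can also be checked numerically: for a weight perturbation the exact change in the network function is compared with the tangent (Jacobian-vector product) prediction. The sketch below uses a toy dense ReLU network and step sizes chosen purely for illustration; it is not the paper's construction of the LiNK, only a demonstration that the tangent prediction is accurate for small steps and degrades as the step grows.

```python
# Illustrative sketch (toy network, assumed sizes): exact change in network
# output versus the first-order (NTK / tangent) prediction for a weight step.
import jax
import jax.numpy as jnp

def init_params(key, widths=(4, 64, 64, 1)):
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        key, k = jax.random.split(key)
        W = jnp.sqrt(2.0 / n_in) * jax.random.normal(k, (n_out, n_in))  # He init
        params.append((W, jnp.zeros(n_out)))
    return params

def f(params, x):
    """Dense feed-forward ReLU network with scalar output."""
    for W, b in params[:-1]:
        x = jax.nn.relu(W @ x + b)
    W, b = params[-1]
    return (W @ x + b)[0]

key = jax.random.PRNGKey(0)
params = init_params(key)
x = jnp.ones(4)

# A fixed random direction in weight space, standing in for one training step.
direction = jax.tree_util.tree_map(
    lambda p: jax.random.normal(jax.random.PRNGKey(1), p.shape), params)

for step in (1e-3, 1e-1, 1.0):
    delta = jax.tree_util.tree_map(lambda d: step * d, direction)
    new_params = jax.tree_util.tree_map(lambda p, d: p + d, params, delta)
    exact = f(new_params, x) - f(params, x)                      # exact change
    _, linear = jax.jvp(lambda p: f(p, x), (params,), (delta,))  # first-order change
    print(f"step {step:g}: exact {float(exact):+.4f}  first-order {float(linear):+.4f}")
```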