Skip Connections and Generalization: A PAC-Bayesian Perspective

ICLR 2026 Conference Submission 21734 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Neural Network Architecture, Deep Learning Theory, Generalization
Abstract: Skip connections are a hallmark of modern deep neural networks, yet their effect on generalization remains elusive. We present a PAC-Bayesian analysis showing that skip connections reduce cross-layer correlations in the weight distribution, which directly tightens generalization bounds. Our approach models weight matrices with matrix normal distributions, capturing row- and column-wise dependencies, and reveals that skip connections effectively suppress correlation terms dominating the KL divergence. This view aligns with the Laplace approximation perspective, where skip connections encourage flatter minima and more dispersed posteriors. Empirical results on multilayer perceptrons and convolutional networks with varying skip configurations support our theory: reductions in cross-layer correlation consistently coincide with improved test accuracy. Overall, our work provides a new theoretical and empirical explanation of why skip connections enhance generalization, highlighting correlation reduction as a key mechanism behind their success.
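For orientation, here is a minimal sketch of the quantities the abstract refers to, assuming a McAllester-style PAC-Bayesian bound and a matrix normal weight posterior; the exact bound, prior, and posterior parameterization used by the authors are not stated on this page and are assumptions here. With probability at least $1-\delta$ over an i.i.d. sample $S$ of size $n$, for every posterior $Q$ over weights and fixed prior $P$,
\[
  L_{\mathcal{D}}(Q) \;\le\; \hat{L}_{S}(Q) \;+\; \sqrt{\frac{\operatorname{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}} .
\]
A matrix normal distribution on a weight matrix $W \in \mathbb{R}^{m \times p}$,
\[
  W \sim \mathcal{MN}(M, U, V) \;\Longleftrightarrow\; \operatorname{vec}(W) \sim \mathcal{N}\!\bigl(\operatorname{vec}(M),\, V \otimes U\bigr),
\]
places row-wise correlations in $U$ and column-wise correlations in $V$, so the correlation structure of the posterior and prior covariances is exactly what enters the $\operatorname{KL}(Q \,\|\, P)$ term above; the abstract's claim is that skip connections shrink the cross-layer correlation contribution to that term, tightening the bound.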
Primary Area: learning theory
Submission Number: 21734