Slicing Mutual Information Generalization Bounds for Neural Networks

22 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: generalization bounds, input-output mutual information, rate-distortion bounds
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The ability of machine learning (ML) algorithms to generalize to unseen data has been studied through the lens of information theory, by bounding the generalization error with the input-output mutual information (MI), i.e., the MI between the training data and the learned hypothesis. These bounds have limited empirical use for modern ML applications (e.g., deep learning), since evaluating MI is difficult in high-dimensional settings. Motivated by recent reports of significant low-loss compressibility of neural networks, we study the generalization capacity of algorithms that slice the parameter space, i.e., train on a random lower-dimensional subspace. We derive information-theoretic bounds on the generalization error in this regime and discuss an intriguing connection to the $k$-Sliced Mutual Information, an alternative measure of statistical dependence that scales well with dimension. We also propose a rate-distortion framework that yields generalization bounds when the weights are merely close to the random subspace, and we propose a training procedure that exploits this flexibility. The computational and statistical benefits of our approach allow us to empirically estimate the input-output information of these neural networks and compute their information-theoretic generalization bounds, a task that was previously out of reach.
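The "slicing" the abstract refers to, training only in a random lower-dimensional subspace of the parameter space, can be illustrated with a minimal numpy sketch. This is not the paper's code; all names, dimensions, and the toy linear-regression task below are illustrative assumptions. The key idea is that the effective weights are $\theta_0 + A z$, where $A$ is a fixed random $D \times k$ projection and only the $k$ subspace coordinates $z$ are trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): full parameter
# dimension D, slice dimension k << D, and n training points.
D, k, n = 50, 5, 200

# Synthetic regression data.
X = rng.normal(size=(n, D))
w_true = rng.normal(size=D)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Random slice: a fixed D x k matrix with orthonormal columns.
A, _ = np.linalg.qr(rng.normal(size=(D, k)))
theta0 = np.zeros(D)  # initialization in the full parameter space

# Train only the k subspace coordinates z; the effective
# full-space weights are always theta0 + A @ z.
z = np.zeros(k)
lr = 1e-2
for _ in range(500):
    w = theta0 + A @ z
    grad_w = X.T @ (X @ w - y) / n  # gradient w.r.t. full weights
    z -= lr * (A.T @ grad_w)        # chain rule: project onto the slice

w = theta0 + A @ z
train_mse = np.mean((X @ w - y) ** 2)
print(f"train MSE after subspace training: {train_mse:.3f}")
```

Because the learned hypothesis is a deterministic function of the $k$-dimensional $z$ (given $A$ and $\theta_0$), the input-output MI that enters the bound involves a much lower-dimensional object than the full weight vector, which is what makes empirical estimation tractable.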
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5915