Optimizing Information-theoretical Generalization Bound via Anisotropic Noise of SGLD

Bohan Wang; Huishuai Zhang; Jieyu Zhang; Qi Meng; Wei Chen; Tie-Yan Liu

Published: 09 Nov 2021, Last Modified: 05 May 2023 | NeurIPS 2021 Poster | Readers: Everyone

Keywords: Information-theoretical Bound, Optimal Noise Covariance, SGLD

TL;DR: We optimize the information-theoretical bound of SGLD and obtain an optimal noise covariance similar to that of SGD.

Abstract: Recently, the information-theoretical framework has been shown to yield non-vacuous generalization bounds for large models trained by Stochastic Gradient Langevin Dynamics (SGLD) with isotropic noise. In this paper, we optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD. We prove that, under a constraint that guarantees low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized. This validates that the optimal noise is quite close to the empirical gradient covariance. Technically, we develop a new information-theoretical bound that enables such an optimization analysis. We then apply matrix analysis to derive the form of the optimal noise covariance. The presented constraint and results are validated by empirical observations.
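To make the abstract's central object concrete, the following is a minimal sketch of one SGLD step whose noise covariance is the matrix square root of a gradient covariance. It is an illustration under stated assumptions, not the authors' implementation: the linear least-squares model, the function names (`psd_power`, `per_sample_grads`, `anisotropic_sgld_step`), and the `lr` and `noise_scale` parameters are all hypothetical, and the empirical per-sample gradient covariance is used as a stand-in for the expected gradient covariance.

```python
import numpy as np


def psd_power(mat, power):
    """Return mat**power for a symmetric PSD matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative eigenvalues
    return (eigvecs * eigvals**power) @ eigvecs.T


def per_sample_grads(w, X, y):
    """Per-sample gradients of 0.5 * (x @ w - y)**2, shape (n, d).

    The least-squares loss is an assumption made only for this sketch.
    """
    residual = X @ w - y
    return residual[:, None] * X


def anisotropic_sgld_step(w, X, y, lr, noise_scale, rng):
    """One gradient step plus Gaussian noise with covariance sigma^{1/2}."""
    grads = per_sample_grads(w, X, y)
    # Empirical gradient covariance, used here in place of the expected one.
    sigma = np.cov(grads, rowvar=False)
    # Target noise covariance is C = sigma^{1/2}. To sample it we need a
    # factor L with L @ L.T == C; the symmetric choice is L = sigma^{1/4}.
    factor = psd_power(sigma, 0.25)
    noise = factor @ rng.standard_normal(w.shape[0])
    return w - lr * grads.mean(axis=0) + noise_scale * noise
```

A call might look like `w = anisotropic_sgld_step(w, X, y, lr=0.1, noise_scale=0.01, rng=np.random.default_rng(0))`. Setting `factor` to the identity matrix instead would give the isotropic-noise variant that the abstract contrasts against.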

Supplementary Material: pdf

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Code: zip
