Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

Gintare Karolina Dziugaite, Mahdi Haghifam, Jeffrey Negrea, Ashish Khisti, Daniel M. Roy

06 Sept 2019 (modified: 05 May 2023), NeurIPS 2019

Abstract: In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia et al. (2018) and recently extended by Bu et al. (2019). Our main contributions are significantly improved mutual information bounds for stochastic gradient Langevin dynamics (SGLD), based on data-dependent estimates. Our approach relies on the variational characterization of mutual information and on data-dependent priors that estimate the mini-batch gradient from the training sample. The approach is broadly applicable within the information-theoretic framework of Russo et al. (2015) and Xu et al. (2017). Our bound can be tied to a measure of flatness of the empirical risk surface. Empirical investigations show that the terms in our bounds are orders of magnitude smaller than those in other bounds that depend on the squared norms of gradients. Finally, we observe empirically that widening a network causes the terms in our bound to shrink further.
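To make the role of a data-dependent prior concrete, here is a minimal sketch, not the authors' implementation. It assumes each SGLD step has the form W_{t+1} = W_t - eta_t * g_t + N(0, sigma_t^2 I) and that the prior at each step keeps the same Gaussian noise but replaces the mini-batch gradient g_t with an estimate g_hat_t. Under that assumption the per-step KL divergence reduces to eta_t^2 * ||g_t - g_hat_t||^2 / (2 * sigma_t^2). Averaged over the data and the algorithm's randomness, sums of such terms are the kind of quantity that upper-bounds mutual information through its variational characterization; the sketch only evaluates the sum along a single trajectory. Setting g_hat_t = 0 leaves a squared-gradient-norm term, which shows why a good estimate can shrink the bound. The function names (`gaussian_kl_shared_cov`, `stepwise_kl_sum`, `xu_raginsky_bound`) and the toy numbers are hypothetical. `xu_raginsky_bound` is the original Xu et al. (2017) form for a subgaussian loss, not the refined data-dependent bound of this paper. The paper's exact noise scaling and estimator may differ; the linked code repository is the authoritative reference.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kl_shared_cov(mean_p: Vector, mean_q: Vector, sigma: float) -> float:
    """KL(N(mean_p, sigma^2 I) || N(mean_q, sigma^2 I)) for isotropic Gaussians.

    With a shared covariance, the log-determinant and trace terms cancel and
    only the squared distance between the means remains.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(mean_p, mean_q))
    return sq_dist / (2.0 * sigma ** 2)


def stepwise_kl_sum(
    weights: List[Vector],
    minibatch_grads: List[Vector],
    grad_estimates: List[Vector],
    step_sizes: List[float],
    noise_stds: List[float],
) -> float:
    """Sum of per-step KL terms along one trajectory (hypothetical helper).

    Assumed update: W_{t+1} = W_t - eta_t * g_t + N(0, sigma_t^2 I).
    The assumed prior keeps the same noise but swaps g_t for an estimate
    g_hat_t, so each term equals eta_t^2 * ||g_t - g_hat_t||^2 / (2 * sigma_t^2).
    """
    total = 0.0
    for w, g, g_hat, eta, sigma in zip(
        weights, minibatch_grads, grad_estimates, step_sizes, noise_stds
    ):
        mean_sgld = [wi - eta * gi for wi, gi in zip(w, g)]
        mean_prior = [wi - eta * hi for wi, hi in zip(w, g_hat)]
        total += gaussian_kl_shared_cov(mean_sgld, mean_prior, sigma)
    return total


def xu_raginsky_bound(mutual_info: float, subgaussian_param: float, n: int) -> float:
    """sqrt(2 * s^2 * I(S; W) / n) for an s-subgaussian loss (Xu et al., 2017)."""
    return math.sqrt(2.0 * subgaussian_param ** 2 * mutual_info / n)


# Toy two-step trajectory in two dimensions (made-up numbers).
weights = [[0.0, 0.0], [0.1, -0.2]]
grads = [[1.0, 2.0], [0.5, -0.5]]
estimates = [[0.9, 2.1], [0.5, -0.4]]  # hypothetical data-dependent estimates
zeros = [[0.0, 0.0], [0.0, 0.0]]       # prior that ignores the gradient
etas, sigmas = [0.1, 0.1], [0.1, 0.1]

kl_estimate = stepwise_kl_sum(weights, grads, estimates, etas, sigmas)
kl_norm = stepwise_kl_sum(weights, grads, zeros, etas, sigmas)
print(kl_estimate, kl_norm)
```

On these made-up numbers the estimate-based sum is about 0.015, while the zero-estimate (squared-norm) sum is about 2.75. This mirrors, in miniature, the abstract's comparison with bounds that depend on squared gradient norms.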

Code Link: https://github.com/jnegrea/neurips2019-5904-code

CMT Num: 5904