Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
Entropy-SG(L)D Optimizes the Prior of a (Valid) PAC-Bayes Bound
Nov 03, 2017 (modified: Nov 03, 2017)ICLR 2018 Conference Blind Submissionreaders: everyoneShow Bibtex
Abstract:We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm for classifiers, optimizes a PAC-Bayes bound on the risk of the classifier, or more accurately, the Gibbs posterior, i.e., a risk-sensitive perturbation of the classifier. Entropy-SGD works by optimizing the bound's prior, violating the hypothesis of the PAC-Bayes theorem that the prior is chosen independently of the data. Indeed, available implementations of Entropy-SGD rapidly obtain zero training error on random labels and the same holds of the Gibbs posterior. In order to obtain a valid generalization bound, we show that an epsilon-differentially private prior yields a valid PAC-Bayes bound, a straightforward consequence of results connecting generalization with differential privacy. Using stochastic gradient Langevin dynamics (SGLD) to approximate the well-known exponential release mechanism, we observe that generalization error on MNIST (measured on held out data) falls within the (empirically nonvacuous) bounds computed under the assumption that SGLD produces perfect samples. In particular, Entropy-SGLD can be configured to yield relatively tight generalization bounds and still fit real labels, although these same settings do not obtain state-of-the-art performance.
TL;DR:We show that Entropy-SGD optimizes the prior of a PAC-Bayes bound, violating the requirement that the prior be independent of data; we use differential privacy to resolve this and improve generalization.
Keywords:generalization error, neural networks, statistical learning theory, PAC-Bayes theory
Enter your feedback below and we'll get back to you as soon as possible.