Sample-then-optimize posterior sampling for Bayesian linear models

08 Sept 2020, OpenReview Archive Direct Upload
Abstract: In modern machine learning it is common to train models that have an extremely high intrinsic capacity. The results obtained are often initialization dependent, differ between optimizers, and in some cases are produced without any explicit regularization. This raises difficult questions about generalization [1]. A natural approach to questions of generalization is a Bayesian one. There is therefore a growing literature attempting to understand how Bayesian posterior inference could emerge from the complexity of modern practice [2, 3], even without having such a procedure as the stated goal. In this work we consider a simple special case where exact Bayesian posterior sampling emerges from sampling (cf. initialization) followed by gradient descent. Specifically, for a Bayesian linear model, if we parameterize it as a deterministic function of an isotropic normal prior, then sampling from the prior followed by first-order optimization of the squared loss yields a posterior sample. Although the assumptions are stronger than those of many real problems, the setting still exhibits the challenging properties of redundant model capacity and a lack of explicit regularizers, along with initialization and optimizer dependence. It is therefore an interesting controlled test case. Given its simplicity, the method itself may turn out to be of interest independently of our original goal. Whilst the material of Section 2 is classical [4], the material in Section 3 is, as far as we are aware, novel.
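
The sketch below is not the paper's implementation; it is a minimal illustration of the sample-then-optimize idea under assumptions chosen here for simplicity: a noiseless-observation Bayesian linear model with an isotropic N(0, I) prior over the weights, a hypothetical random feature matrix Phi, targets y, and arbitrary dimensions, step count, and sample count. Each column of Theta is drawn from the prior and then updated by plain gradient descent on the squared loss, and the empirical moments of the converged samples are compared with the analytic posterior obtained by conditioning the prior on the noiseless observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterised linear model: d weights, n < d noiseless observations, S samples.
n, d, S = 20, 50, 4000
Phi = rng.normal(size=(n, d))            # hypothetical feature matrix
y = Phi @ rng.normal(size=d)             # noiseless targets from a random weight vector

# Sample-then-optimize: draw theta_0 ~ N(0, I) (the prior), then run plain gradient
# descent on the squared loss 0.5 * ||Phi theta - y||^2.  All S chains are run in
# parallel as the columns of Theta.  Gradient descent only moves each column within
# the row space of Phi, so the null-space component keeps its prior distribution.
Theta = rng.normal(size=(d, S))                       # prior samples
lr = 1.0 / np.linalg.eigvalsh(Phi.T @ Phi).max()      # step size below the curvature bound
for _ in range(2_000):
    Theta -= lr * Phi.T @ (Phi @ Theta - y[:, None])  # gradient step on the squared loss

# Analytic posterior for this noiseless special case: condition N(0, I) on Phi theta = y.
P = Phi.T @ np.linalg.solve(Phi @ Phi.T, Phi)         # projection onto the row space of Phi
post_mean = Phi.T @ np.linalg.solve(Phi @ Phi.T, y)
post_cov = np.eye(d) - P

# Empirical moments of the optimised samples should approach the analytic posterior
# (the discrepancies are Monte Carlo error and shrink as S grows).
print("max |mean error|:", np.abs(Theta.mean(axis=1) - post_mean).max())
print("max |cov error| :", np.abs(np.cov(Theta) - post_cov).max())
```

Running all S chains as columns of a single matrix is purely a vectorisation convenience; each column evolves independently and is equivalent to one sample-then-optimize draw.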