Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Bayesian Inference, Amortization, Variational Inference, Transformers, Permutation Invariance
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a neural network-based approach that can handle exchangeable observations and amortize over datasets to convert the problem of Bayesian posterior inference into a single forward pass of a network.
Abstract: Bayesian inference is a natural approach to reasoning about uncertainty. Unfortunately, in practice it generally requires expensive iterative methods like MCMC to approximate posterior distributions. Not only are these methods computationally expensive, but they must also be re-run whenever new observations arrive, making them impractical or of limited use in many contexts. In this work, we amortize posterior parameter inference for probabilistic models by leveraging permutation-invariant, set-based network architectures that respect the inherent exchangeability of independent observations in a dataset. Such networks take a set of observations explicitly as input to predict the posterior with a single forward pass and allow the model to generalize to datasets of different cardinalities and orderings. Our experiments explore the effectiveness of this approach for both direct posterior estimation and model predictive performance. They show that our approach is comparable to dataset-specific procedures such as maximum likelihood estimation and MCMC on a range of probabilistic models. Our proposed approach uses a reverse KL-based training objective that does not require ground-truth parameter values during training, allowing the amortization networks to be trained more generally. We compare this approach to existing forward KL-based training methods and show substantially improved generalization performance. Finally, we compare various architectural elements, including different set-based architectures (DeepSets vs. Transformers) and distributional parameterizations (Gaussian vs. normalizing flows).
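To make the idea concrete, the following is a minimal, hypothetical sketch (not the authors' code) of the setup the abstract describes: a DeepSets-style permutation-invariant network maps a set of observations to the parameters of a Gaussian approximate posterior, and is trained with a Monte Carlo estimate of the reverse KL, which only needs the model's joint density p(theta, D) rather than ground-truth parameters. The toy model, layer sizes, and all names below are illustrative assumptions.

```python
# Hypothetical sketch: amortized posterior inference with a DeepSets encoder
# trained by reverse KL. Toy model: x_i ~ N(theta, 1) with prior theta ~ N(0, 1).
import torch
import torch.nn as nn

class DeepSetsPosterior(nn.Module):
    """Maps a set of observations {x_1, ..., x_n} to a Gaussian q(theta | D)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))  # mean and log-std of q

    def forward(self, x):                      # x: (n, 1) set of observations
        pooled = self.phi(x).mean(dim=0)       # permutation-invariant pooling
        mu, log_sigma = self.rho(pooled)
        return mu, log_sigma

def reverse_kl_loss(net, x, num_samples=16):
    """Monte Carlo estimate of KL(q(theta|D) || p(theta|D)), up to log p(D)."""
    mu, log_sigma = net(x)
    q = torch.distributions.Normal(mu, log_sigma.exp())
    theta = q.rsample((num_samples,))          # reparameterized samples
    log_q = q.log_prob(theta)
    log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(theta)
    # Likelihood: x_i ~ N(theta, 1), summed over the dataset for each sample.
    log_lik = torch.distributions.Normal(theta.unsqueeze(-1), 1.0) \
                   .log_prob(x.squeeze(-1)).sum(dim=-1)
    return (log_q - log_prior - log_lik).mean()

# Each training step draws a fresh synthetic dataset of random size, so the
# network amortizes over datasets (and cardinalities) rather than fitting one.
net = DeepSetsPosterior()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):
    n = torch.randint(5, 50, (1,)).item()
    theta_true = torch.randn(1)
    x = theta_true + torch.randn(n, 1)
    loss = reverse_kl_loss(net, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, a new dataset's approximate posterior is obtained with a single forward pass of the network; swapping the mean-pooled DeepSets encoder for a Transformer over the set, or the Gaussian head for a normalizing flow, corresponds to the architectural variants compared in the paper.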
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8217