Keywords: Representation learning, Probabilistic approaches, Posterior sampling
Abstract: In this paper, we aim to enhance self-supervised learning by leveraging Bayesian techniques to capture the full posterior distribution over representations instead of relying on maximum a posteriori (MAP) estimates. Our primary objective is to demonstrate how a rich posterior distribution can improve performance, calibration, and robustness in downstream tasks. We introduce a practical Bayesian self-supervised learning method using Cyclical Stochastic Gradient Hamiltonian Monte Carlo (cSGHMC). By placing a prior over the parameters of the self-supervised model and employing cSGHMC, we approximate the high-dimensional, multimodal posterior distribution over the embeddings. This exploration of the posterior distribution yields interpretable and diverse representations. By marginalising over these representations in downstream tasks, we achieve significant improvements in predictive performance, calibration, and out-of-distribution detection. We validate our method across various datasets, demonstrating the practical benefits of capturing the full posterior in Bayesian self-supervised learning.
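A minimal sketch (not taken from the paper) of the two ingredients the abstract describes: the cyclical step-size schedule and noisy momentum update used by cSGHMC, and marginalisation over posterior samples of the encoder when making downstream predictions. All function names, hyperparameters, and the PyTorch framing are illustrative assumptions, not the authors' implementation.

```python
import math
import torch

def csghmc_step_size(k: int, total_steps: int, num_cycles: int, alpha0: float) -> float:
    """Cyclical step-size schedule: within each cycle the step size decays
    from alpha0 towards ~0 following a cosine curve, so each cycle first
    explores (large steps) and then samples (small steps)."""
    cycle_len = math.ceil(total_steps / num_cycles)
    return alpha0 / 2 * (math.cos(math.pi * (k % cycle_len) / cycle_len) + 1)

@torch.no_grad()
def sghmc_update(params, momenta, grads, lr, friction=0.05, temperature=1.0):
    """One SGHMC update: momentum step with friction and injected Gaussian
    noise, followed by a parameter step. Setting temperature=0 recovers
    SGD with momentum for the exploration stage of a cycle."""
    for p, m, g in zip(params, momenta, grads):
        noise = torch.randn_like(p) * math.sqrt(2 * friction * lr * temperature)
        m.mul_(1 - friction).add_(-lr * g + noise)
        p.add_(m)

def marginal_predict(encoders, heads, x):
    """Marginalise a downstream prediction over posterior samples of the
    encoder: average the predictive distributions produced by each sampled
    (encoder, head) pair collected at the end of the sampling cycles."""
    probs = [head(enc(x)).softmax(dim=-1) for enc, head in zip(encoders, heads)]
    return torch.stack(probs).mean(dim=0)
```

In this sketch, a checkpoint of the encoder would be saved at the end of each cycle's low-step-size (sampling) stage, and `marginal_predict` averages the resulting ensemble of predictive distributions, which is where the calibration and OOD-detection gains described in the abstract would come from.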
Submission Number: 9