Reinforcement Learning for Adaptive MCMC
TL;DR: Adaptive MCMC can be rigorously formulated as a reinforcement learning (RL) task, unlocking the potential of modern RL for Bayesian posterior sampling.
Abstract: An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to date it has remained unclear how to exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this paper is to set out a general framework, called *Reinforcement Learning Metropolis-Hastings*, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis-Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures that conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that outperforms a popular gradient-free adaptive Metropolis-Hastings algorithm on $\approx$90% of tasks in the *PosteriorDB* benchmark.
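To make the "adaptive MCMC as policy optimisation" idea concrete, the sketch below is a toy illustration (an assumption on our part, not the paper's algorithm): a random-walk Metropolis-Hastings sampler whose proposal scale plays the role of a policy parameter and is adapted with a REINFORCE-style policy-gradient update, using squared jump distance as the reward. Function names such as `rl_adaptive_mh` and the choice of a standard normal target are purely illustrative.

```python
import numpy as np

# Illustrative sketch only: adaptive Metropolis-Hastings where the proposal
# scale is tuned by a REINFORCE-style policy-gradient update. This is a toy
# stand-in for the general idea, not the paper's method.

def log_target(x):
    # Standard normal target density (assumed for this toy example).
    return -0.5 * np.dot(x, x)

def rl_adaptive_mh(n_iters=5000, dim=2, lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    log_sigma = 0.0                       # "policy" parameter: log proposal scale
    samples = []
    for _ in range(n_iters):
        sigma = np.exp(log_sigma)
        eps = rng.standard_normal(dim)
        x_prop = x + sigma * eps          # Gaussian random-walk proposal
        log_alpha = log_target(x_prop) - log_target(x)
        accept = np.log(rng.uniform()) < log_alpha
        # Reward: squared jump distance actually realised by the chain.
        reward = float(np.sum((x_prop - x) ** 2)) if accept else 0.0

        # Score of the proposal N(x_prop | x, sigma^2 I) w.r.t. log_sigma:
        # d/d(log_sigma) log q = ||eps||^2 - dim
        score = np.sum(eps ** 2) - dim
        log_sigma += lr * reward * score  # REINFORCE-style gradient step
        # (A decaying learning rate would make the adaptation diminish over
        #  time, in the spirit of the ergodicity conditions discussed above.)

        if accept:
            x = x_prop
        samples.append(x.copy())
    return np.array(samples), np.exp(log_sigma)

if __name__ == "__main__":
    samples, final_scale = rl_adaptive_mh()
    print("adapted proposal scale:", final_scale)
```

The design choice here (expected squared jump distance as reward, a score-function gradient on the proposal scale) is one simple way to phrase proposal adaptation as policy optimisation; the paper itself works with deterministic policies and a policy-gradient formulation at a much greater level of generality.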
Submission Number: 230