Newtonian Monte Carlo: a second-order gradient method for speeding up MCMC

Nimar S. Arora, Nazanin Khosravani Tehrani, Kinjal Divesh Shah, Michael Tingley, Yucen Lily Li, Narjes Torabi, David Noursi, Sepehr Akhavan Masouleh, Eric Lippert, Erik Meijer

16 Oct 2019 (modified: 05 May 2023) · AABI 2019
Keywords: MCMC, HMC, NUTS, MALA, Bayesian Inference, Newtonian Methods
TL;DR: Exploit curvature to make MCMC methods converge faster than the state of the art.
Abstract: We present Newtonian Monte Carlo (NMC), a method to improve Markov Chain Monte Carlo (MCMC) convergence by analyzing the first- and second-order gradients of the target density to determine a suitable proposal density at each point. Existing first-order gradient-based methods suffer from the problem of determining an appropriate step size: too small a step size requires a large number of steps to converge, while too large a step size overshoots the high-density region. NMC is similar to the Newton-Raphson update in optimization, where the second-order gradient is used to automatically scale the step size in each dimension. However, our objective is not to find a maximum but instead to find a parameterized density that best matches the local curvature of the target density. This parameterized density is then used as a single-site Metropolis-Hastings proposal. As a further improvement over first-order methods, we show that random variables with constrained supports do not need to be transformed before taking a gradient step: NMC directly matches constrained random variables to a proposal density with the same support, thus keeping the curvature of the target density intact. We demonstrate the efficiency of NMC on a number of different domains. For statistical models where the prior is conjugate to the likelihood, our method recovers the posterior trivially in one step. We also show results on fairly large non-conjugate models, where NMC performs better than adaptive first-order methods such as NUTS and other inexact scalable inference methods such as Stochastic Variational Inference or bootstrapping.
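To make the proposal construction concrete, below is a minimal sketch of one NMC-style Metropolis-Hastings step for an unconstrained, locally log-concave target. It illustrates the curvature-matched Gaussian proposal described in the abstract, not the paper's actual implementation; the helper functions `log_density`, `grad`, and `hess` are assumed to be supplied by the user, and the names here are illustrative.

```python
# Sketch of a single NMC-style MH step (illustrative, not the paper's code).
import numpy as np
from scipy.stats import multivariate_normal

def nmc_step(x, log_density, grad, hess, rng):
    """One Metropolis-Hastings step with a curvature-matched Gaussian proposal.

    The proposal is N(x - H^{-1} g, -H^{-1}), i.e. a second-order (Laplace)
    approximation of the target around the current point x.
    """
    g, H = grad(x), hess(x)
    cov = -np.linalg.inv(H)        # assumes H is negative definite here
    mean = x + cov @ g             # Newton step: x - H^{-1} g
    prop = multivariate_normal(mean, cov, allow_singular=True)
    y = prop.rvs(random_state=rng)

    # Reverse proposal, built the same way around the candidate y.
    g_y, H_y = grad(y), hess(y)
    cov_y = -np.linalg.inv(H_y)
    rev = multivariate_normal(y + cov_y @ g_y, cov_y, allow_singular=True)

    log_alpha = (log_density(y) - log_density(x)
                 + rev.logpdf(x) - prop.logpdf(y))
    return y if np.log(rng.uniform()) < log_alpha else x

# Example usage on a 2-D standard normal target.
rng = np.random.default_rng(0)
log_p = lambda x: -0.5 * (x @ x)
grad = lambda x: -x
hess = lambda x: -np.eye(len(x))
x_next = nmc_step(np.array([3.0, -2.0]), log_p, grad, hess, rng)
```

Note that for a Gaussian target the Hessian is constant and the curvature-matched proposal coincides with the target itself, so the step is always accepted and the chain converges immediately; this is consistent with the abstract's claim that conjugate posteriors are recovered in one step. For constrained random variables, the paper instead matches a proposal family with the same support rather than applying a Gaussian to a transformed variable.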