Non-reversible Parallel Tempering for Uncertainty Approximation in Deep Learning

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submission · Readers: Everyone
Keywords: replica exchange, parallel tempering, non-reversibility, stochastic approximation, round trip rate, deep learning
Abstract: Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulating multi-modal distributions. The key to the success of PT is the adoption of efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits non-reversibility and reduces the communication cost from $O(P^2)$ to $O(P)$ given sufficiently many chains $P$. However, this advantage largely disappears in big data problems with a limited number of chains, due to the extremely few bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote non-reversibility and obtain an optimal communication cost of $O(P\log P)$. In addition, we analyze the bias incurred when stochastic gradient descent (SGD) with large, constant learning rates is adopted as the exploration kernel. This user-friendly nature enables us to conduct large-scale uncertainty approximation tasks without much tuning cost.
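To make the swap mechanism concrete, below is a minimal Python sketch of non-reversible parallel tempering with the deterministic even-odd (DEO) scheme, using a Langevin-style exploration step with a constant learning rate in the spirit of the SGD exploration kernels mentioned above. This is not the authors' implementation: the toy double-well energy, the temperature ladder, and the names `energy`, `grad_energy`, and `deo_parallel_tempering` are illustrative assumptions.

```python
import numpy as np

def energy(x):
    # Toy double-well energy U(x); a hypothetical stand-in for a deep-learning loss.
    return 0.5 * ((x ** 2 - 1.0) ** 2).sum()

def grad_energy(x):
    # Gradient of the toy energy above.
    return 2.0 * x * (x ** 2 - 1.0)

def deo_parallel_tempering(n_iters=5000, P=4, lr=1e-2, seed=0):
    """Non-reversible PT with the deterministic even-odd (DEO) swap scheme.

    Chains run at inverse temperatures beta_0 > ... > beta_{P-1}. At even
    (odd) rounds only even- (odd-) indexed adjacent pairs attempt a swap,
    which keeps accepted swaps moving in one direction along the ladder.
    """
    rng = np.random.default_rng(seed)
    betas = np.geomspace(1.0, 0.1, P)            # assumed temperature ladder
    xs = [rng.normal(size=2) for _ in range(P)]  # one 2-d state per chain

    for t in range(n_iters):
        # Exploration: a Langevin step with a constant learning rate per chain,
        # mimicking SGD-with-noise as the exploration kernel.
        for p in range(P):
            noise = rng.normal(size=xs[p].shape)
            xs[p] = (xs[p] - lr * betas[p] * grad_energy(xs[p])
                     + np.sqrt(2.0 * lr) * noise)

        # DEO communication: even pairs on even rounds, odd pairs on odd rounds.
        start = t % 2
        for p in range(start, P - 1, 2):
            log_acc = (betas[p] - betas[p + 1]) * (energy(xs[p]) - energy(xs[p + 1]))
            if np.log(rng.uniform()) < log_acc:
                xs[p], xs[p + 1] = xs[p + 1], xs[p]

    return xs
```

In contrast to the reversible scheme, which picks a random adjacent pair at every round, the alternating even/odd pattern above lets an index travel the whole ladder in roughly $O(P)$ rounds once swaps are frequently accepted.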
One-sentence Summary: A user-friendly parallel tempering algorithm that exploits non-reversibility to achieve an optimal round trip time in deep learning.
Supplementary Material: zip