Non-reversible Parallel Tempering for Uncertainty Approximation in Deep Learning

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submission · Readers: Everyone
Keywords: replica exchange, parallel tempering, non-reversibility, stochastic approximation, round trip rate, deep learning
Abstract: Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulating multi-modal distributions. The key to the success of PT is the adoption of efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits non-reversibility and reduces the communication cost from $O(P^2)$ to $O(P)$ given sufficiently many chains $P$. However, this advantage largely disappears in big data problems with a limited number of chains, due to the extremely few bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote non-reversibility and obtain an optimal communication cost of $O(P\log P)$. In addition, we analyze the bias incurred when stochastic gradient descent (SGD) with large, constant learning rates is adopted as the exploration kernel. This user-friendly nature enables us to conduct large-scale uncertainty approximation tasks without much tuning cost.
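To make the swap mechanism concrete, below is a minimal Python sketch of non-reversible parallel tempering with the deterministic even-odd (DEO) scheme, using a Langevin-style exploration step with a constant learning rate in the spirit of the SGD exploration kernels mentioned above. This is not the authors' implementation: the toy double-well energy, the temperature ladder, and the names `energy`, `grad_energy`, and `deo_parallel_tempering` are illustrative assumptions.

```python
import numpy as np

def energy(x):
    # Toy double-well energy U(x); a hypothetical stand-in for a deep-learning loss.
    return 0.5 * ((x ** 2 - 1.0) ** 2).sum()

def grad_energy(x):
    # Gradient of the toy energy above.
    return 2.0 * x * (x ** 2 - 1.0)

def deo_parallel_tempering(n_iters=5000, P=4, lr=1e-2, seed=0):
    """Non-reversible PT with the deterministic even-odd (DEO) swap scheme.

    Chains run at inverse temperatures beta_0 > ... > beta_{P-1}. At even
    (odd) rounds only even- (odd-) indexed adjacent pairs attempt a swap,
    which keeps accepted swaps moving in one direction along the ladder.
    """
    rng = np.random.default_rng(seed)
    betas = np.geomspace(1.0, 0.1, P)            # assumed temperature ladder
    xs = [rng.normal(size=2) for _ in range(P)]  # one 2-d state per chain

    for t in range(n_iters):
        # Exploration: a Langevin step with a constant learning rate per chain,
        # mimicking SGD-with-noise as the exploration kernel.
        for p in range(P):
            noise = rng.normal(size=xs[p].shape)
            xs[p] = (xs[p] - lr * betas[p] * grad_energy(xs[p])
                     + np.sqrt(2.0 * lr) * noise)

        # DEO communication: even pairs on even rounds, odd pairs on odd rounds.
        start = t % 2
        for p in range(start, P - 1, 2):
            log_acc = (betas[p] - betas[p + 1]) * (energy(xs[p]) - energy(xs[p + 1]))
            if np.log(rng.uniform()) < log_acc:
                xs[p], xs[p + 1] = xs[p + 1], xs[p]

    return xs
```

In contrast to the reversible scheme, which picks a random adjacent pair at every round, the alternating even/odd pattern above lets an index travel the whole ladder in roughly $O(P)$ rounds once swaps are frequently accepted.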
One-sentence Summary: A user-friendly parallel tempering algorithm that exploits non-reversibility to achieve an optimal round trip time in deep learning.
Supplementary Material: zip