Keywords: ebm, generative modelling, parallel tempering, sampling, clustered distribution
Abstract: We introduce a novel training protocol for energy-based models that accelerates the equilibration of the Markov chains used in maximum-likelihood training, enabling stable and accurate learning on highly clustered, multimodal datasets. The method extends Trajectory Parallel Tempering, inspired by parallel tempering and Hamiltonian exchange Monte Carlo, by dynamically exchanging model parameters with earlier, faster-mixing training stages to enhance exploration. A reservoir-based strategy reuses equilibrium samples from previous models, reducing memory costs and achieving training speeds comparable to Persistent Contrastive Divergence when combined with optimized gradient schedulers such as Nesterov Accelerated Gradient. Experiments on clustered datasets show consistently higher test log-likelihoods and markedly improved sample quality for Restricted Boltzmann Machines compared to standard methods.
Submission Number: 41
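The abstract only sketches the algorithm at a high level. Below is a minimal, hedged illustration (not the authors' implementation) of the kind of stage-exchange move it describes: Persistent Contrastive Divergence training of a Bernoulli RBM in which persistent chains driven by the current parameters exchange states, via a Metropolis swap in the style of Hamiltonian exchange Monte Carlo, with chains driven by an earlier, faster-mixing parameter snapshot. The snapshot schedule, the approximation of the sample reservoir by `aux_chains`, and all hyperparameters are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def free_energy(v, W, b, c):
    """RBM free energy F(v) = -b.v - sum_j softplus(c_j + (vW)_j)."""
    return -v @ b - np.logaddexp(0.0, v @ W + c).sum(axis=-1)

def gibbs_step(v, W, b, c):
    """One block-Gibbs sweep v -> h -> v' for a Bernoulli RBM."""
    p_h = sigmoid(v @ W + c)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b)
    return (rng.random(p_v.shape) < p_v).astype(float)

# Toy clustered data: two well-separated binary modes.
n_vis, n_hid, n_chains = 16, 8, 64
data = np.vstack([np.tile([1] * 8 + [0] * 8, (200, 1)),
                  np.tile([0] * 8 + [1] * 8, (200, 1))]).astype(float)

W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)

chains = (rng.random((n_chains, n_vis)) < 0.5).astype(float)  # chains under current params
aux_chains = chains.copy()   # chains driven by an earlier parameter snapshot
snapshot = None              # (W, b, c) from an earlier, faster-mixing stage
lr, snapshot_every = 0.05, 50  # illustrative values

for step in range(1, 501):
    batch = data[rng.choice(len(data), n_chains, replace=False)]

    # Persistent Gibbs updates for both replicas.
    chains = gibbs_step(chains, W, b, c)
    if snapshot is not None:
        Ws, bs, cs = snapshot
        aux_chains = gibbs_step(aux_chains, Ws, bs, cs)

        # Exchange move across training stages (parallel-tempering style):
        # accept a state swap with prob min(1, exp(F_cur(x_cur) + F_snap(x_aux)
        #                                          - F_cur(x_aux) - F_snap(x_cur))).
        log_alpha = (free_energy(chains, W, b, c) + free_energy(aux_chains, Ws, bs, cs)
                     - free_energy(aux_chains, W, b, c) - free_energy(chains, Ws, bs, cs))
        swap = np.log(rng.random(n_chains)) < log_alpha
        chains[swap], aux_chains[swap] = aux_chains[swap].copy(), chains[swap].copy()

    # Standard PCD gradient: positive phase on data, negative phase on persistent chains.
    ph_data, ph_model = sigmoid(batch @ W + c), sigmoid(chains @ W + c)
    W += lr * (batch.T @ ph_data - chains.T @ ph_model) / n_chains
    b += lr * (batch - chains).mean(axis=0)
    c += lr * (ph_data - ph_model).mean(axis=0)

    # Periodically refresh the earlier-stage snapshot; its chains stand in for the
    # reservoir of equilibrium samples from previous models mentioned in the abstract.
    if step % snapshot_every == 0:
        snapshot = (W.copy(), b.copy(), c.copy())
        aux_chains = chains.copy()
```

The swap acceptance is the usual Hamiltonian-exchange criterion applied to the visible-unit free energies of the two parameter sets, so the exchange leaves each replica's equilibrium distribution invariant while letting states from the faster-mixing earlier stage seed the current chains.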