Keywords: generative model, energy-based model, EBM, diffusion model, hybrid generative model, MCMC, Langevin
TL;DR: We introduce a hybridization of EBMs and diffusion models that can efficiently generate realistic and highly diverse unconditional samples.
Abstract: This work introduces the Hat Diffusion Energy-Based Model (HDEBM), a hybrid of EBMs and diffusion models that can perform high-quality unconditional generation for multimodal image distributions. Our method is motivated by the observation that a partial forward and reverse diffusion defines an MCMC process whose steady state is the data distribution when the diffusion is perfectly trained. The components of HDEBM are a generator network that proposes initial model samples, a truncated diffusion model that adds noise to generator samples and then removes it as an approximate MCMC step that pushes towards realistic images, and an energy network that further refines diffusion outputs with Langevin MCMC. All networks are incorporated into a single unnormalized density. MCMC with the energy network is crucial for driving multimodal generation, while the truncated diffusion can generate the fine details needed for high-quality images. Experiments show HDEBM is effective for unconditional generation with sampling costs significantly lower than those of diffusion models. We achieve an FID score of 21.82 on unconditional ImageNet at 128x128 resolution, which to our knowledge is state-of-the-art among unconditional models that do not use separate retrieval data.
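The three-stage sampling order described in the abstract (generator proposal, partial forward and reverse diffusion, Langevin refinement) can be sketched on a toy problem. This is only an illustrative sketch under strong assumptions, not the HDEBM implementation: the target is a standard Gaussian, and `toy_generator`, `toy_denoise`, `toy_grad_energy`, `sigma`, and the step sizes are all hypothetical stand-ins for the trained networks and their settings. In particular, a one-shot posterior-mean denoiser replaces the truncated reverse diffusion.

```python
import numpy as np

def langevin_refine(x, grad_energy, step_size=0.01, n_steps=50, rng=None):
    """Unadjusted Langevin dynamics: x <- x - (eps/2) * grad U(x) + sqrt(eps) * noise."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * noise
    return x

def partial_diffusion_step(x, denoise, sigma, rng):
    """Add Gaussian noise at level sigma (forward), then denoise (reverse)."""
    noisy = x + sigma * rng.standard_normal(x.shape)
    return denoise(noisy, sigma)

def sample(n, dim, generator, denoise, grad_energy, sigma=0.5, seed=0):
    """Generator proposal -> partial diffusion step -> Langevin refinement."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, dim))      # latent noise
    x = generator(z)                       # initial proposal
    x = partial_diffusion_step(x, denoise, sigma, rng)
    return langevin_refine(x, grad_energy, rng=rng)

# Hypothetical stand-ins for the trained networks (toy target: N(0, I)).
def toy_generator(z):
    return 2.0 * z                         # deliberately over-dispersed proposal

def toy_denoise(noisy, sigma):
    return noisy / (1.0 + sigma ** 2)      # posterior mean E[x | noisy] for N(0, I) data

def toy_grad_energy(x):
    return x                               # gradient of U(x) = 0.5 * ||x||^2

samples = sample(500, 2, toy_generator, toy_denoise, toy_grad_energy)
```

In this toy setting each stage pulls the over-dispersed proposal closer to the target; how the actual method couples the three networks into a single unnormalized density is not specified in the abstract and is not modeled here.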
Supplementary Material: pdf
Submission Number: 3914