Cooperative Multimodal Energy-based Model with MCMC Revision

16 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: Energy-based Model, Cooperative Learning, Multimodality
Abstract: This paper studies the problem of learning energy-based models (EBMs) for multimodal data. Learning EBMs via maximum likelihood estimation (MLE) typically requires Markov chain Monte Carlo (MCMC) sampling, such as Langevin dynamics; however, noise-initialized Langevin dynamics is often ineffective and mixes poorly. More critically, multimodal data contains complex inter-modal dependencies (i.e., relationships shared across modalities), which makes informative and coherent initializations across modalities particularly crucial for multimodal EBM sampling and learning. Multimodal VAEs, which consist of a shared generator model and a joint inference model, have made progress in capturing such inter-modal dependencies. However, both the shared generator and the joint inference model are modelled as unimodal Gaussian (or Laplace) distributions, which can limit their statistical expressivity for complex data distributions and generator posterior distributions. In this work, we investigate the joint learning of the multimodal EBM, the shared generator, and the joint inference model by interweaving their MLE updates with their respective MCMC revisions. With MCMC EBM revision, the shared generator learns to produce coherent multimodal initializations for EBM sampling. The joint inference model, guided by MCMC posterior sampling, provides informative latent initializations. Both models serve as complementary initializer models that facilitate effective EBM sampling and learning, leading to realistic and coherent multimodal EBM samples. Extensive experiments demonstrate superior performance in multimodal synthesis quality and coherence compared to various baselines. Analysis, ablation studies, and supplementary experiments further validate the effectiveness and scalability of the proposed multimodal framework.
Primary Area: generative models
Submission Number: 7939
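
The abstract's point that noise-initialized Langevin dynamics mixes poorly, while an informative initialization helps, can be illustrated with a toy sketch. This is not the paper's method or code: the quadratic energy, the names `energy_grad` and `langevin_revise`, the step size, and the "informed" initialization (a hypothetical stand-in for what a learned generator might produce) are all assumptions made for illustration.

```python
import numpy as np

def energy_grad(x, mu):
    """Gradient of the toy energy E(x) = 0.5 * ||x - mu||^2 (assumed, not a learned EBM)."""
    return x - mu

def langevin_revise(x0, mu, n_steps=20, step_size=0.1, seed=0):
    """Short-run Langevin dynamics: x <- x - (s^2 / 2) * grad E(x) + s * noise."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size ** 2 * energy_grad(x, mu) + step_size * noise
    return x

rng = np.random.default_rng(1)
mu = np.array([5.0, 5.0])                                  # mode of the toy target density
noise_init = rng.standard_normal((256, 2))                 # chains started from pure noise
informed_init = mu + 0.1 * rng.standard_normal((256, 2))   # hypothetical generator-like start

# Mean distance to the mode after the same short chain from each initialization.
noise_dist = np.linalg.norm(langevin_revise(noise_init, mu) - mu, axis=1).mean()
informed_dist = np.linalg.norm(langevin_revise(informed_init, mu) - mu, axis=1).mean()
print(noise_dist, informed_dist)
```

Under these assumptions, the same number of Langevin steps leaves the noise-initialized chains much farther from the high-density region than the chains started near it, which is the motivation the abstract gives for learning initializer models.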