Inverse GFlowNets for Generative Imitation Learning

20 Sept 2025 (modified: 02 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: generative models, imitation learning, gflownets, reinforcement learning
Abstract: Sequential generative models are typically trained by maximizing the evidence lower bound (ELBO), which maximizes a lower bound on the likelihood of the next observation given the current one. While ELBO-based training is simple and scalable, in sequential settings it suffers from compounding errors. In this work, we reinterpret ELBO training as an imitation learning problem for modeling data distributions. We show that prior formulations suffer from an entropy bias that is misaligned with the objectives of generative modeling. To address this issue, we leverage the GFlowNet framework to eliminate the bias and derive algorithms that can be viewed as regularized ELBO objectives. Our approach assigns positive rewards to data samples and negative rewards to policy-generated samples, corresponding to minimization of the $\chi^2$-divergence between the data distribution and the policy mixture. We further establish theoretical connections to existing imitation learning methods, providing transferable insights across domains. Empirically, our approach eliminates entropy bias and achieves improved performance on a range of generative modeling tasks when combined with previous methods.
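To make the abstract's objective concrete, here is a minimal toy sketch (not the paper's implementation; `chi2_divergence`, `p_data`, `p_policy`, and `reward` are hypothetical names we introduce) of the $\chi^2$-divergence between a data distribution and the half-half policy mixture, alongside the signed-reward assignment the abstract describes:

```python
import numpy as np

def chi2_divergence(p, q):
    """chi^2(p || q) = sum_x (p(x) - q(x))^2 / q(x) for discrete p, q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum((p - q) ** 2 / q)

# Toy discrete distributions over three outcomes (assumed for illustration).
p_data   = np.array([0.7, 0.2, 0.1])   # data distribution
p_policy = np.array([0.3, 0.4, 0.3])   # current policy distribution
mixture  = 0.5 * (p_data + p_policy)   # "policy mixture" as we read it

# Zero exactly when p_policy matches p_data, so minimizing it drives
# the policy toward the data distribution.
print(chi2_divergence(p_data, mixture))

# Signed-reward assignment described in the abstract: positive reward for
# data samples, negative for policy-generated samples. The exact scaling
# and GFlowNet training objective are the paper's and are not shown here.
def reward(x, from_data: bool) -> float:
    return +1.0 if from_data else -1.0
```

Under these assumptions, driving the $\chi^2$-divergence to zero requires the policy to match the data distribution, which is consistent with the abstract's claim that the signed rewards remove the entropy bias of likelihood-style objectives.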
Supplementary Material: zip
Primary Area: generative models
Submission Number: 23960