Keywords: Visual Generative Models, Diffusion Models, Reinforcement Learning, Distribution-wise Rewards
TL;DR: We propose an efficient method to train visual generative models using reinforcement learning with distribution-wise rewards (instead of sample-wise), which reduces artifacts, improves diversity, and achieves better FID scores.
Abstract: Reinforcement learning (RL) for visual generative models often relies on sample-wise reward functions, which can incentivize reward hacking, leading to visual artifacts and reduced diversity. In this work, we propose a novel approach that uses distribution-wise rewards to guide visual generative models toward learning the real-world image distribution more accurately. Unlike rewards that evaluate samples individually, distribution-wise rewards account for the overall distribution of the generated samples, mitigating the mode collapse that arises when every sample is optimized independently in the same direction. To overcome the prohibitive computational cost of estimating these rewards, we introduce a subset-replace strategy that provides reward signals efficiently by updating only a small subset of a generated reference set. Additionally, we apply RL to optimize post-hoc model merging coefficients, potentially mitigating the train-inference inconsistency caused by introducing stochastic differential equation (SDE) sampling in standard RL practice. Extensive experiments show that our approach significantly improves FID-50K across various base models, from 8.30 to 5.77 for SiT and from 3.74 to 3.52 for EDM2. Qualitative evaluation further confirms that our method enhances perceptual quality while preserving sample diversity.
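To make the subset-replace idea concrete, here is a minimal sketch (hypothetical, not the authors' code) of how a distribution-wise reward could be computed cheaply: a cached reference set of generated features is maintained, a small subset is swapped for newly generated samples, and the change in an FID-style distance to fixed real-data statistics serves as the reward. The class name `SubsetReplaceReward` and all function signatures are illustrative assumptions.

```python
# Sketch of a subset-replace distribution-wise reward (assumed interface).
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians fitted to two feature sets (FID-style)."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

class SubsetReplaceReward:
    def __init__(self, real_feats, init_gen_feats):
        # Real-data statistics are precomputed once and held fixed.
        self.mu_r = real_feats.mean(axis=0)
        self.sigma_r = np.cov(real_feats, rowvar=False)
        # Cached reference set of generated features (updated in place).
        self.ref = init_gen_feats.copy()

    def __call__(self, new_feats):
        """Replace a random subset of the reference set with the new batch and
        return the negated FID-style distance as a distribution-wise reward."""
        idx = np.random.choice(len(self.ref), size=len(new_feats), replace=False)
        self.ref[idx] = new_feats
        mu_g = self.ref.mean(axis=0)
        sigma_g = np.cov(self.ref, rowvar=False)
        return -frechet_distance(mu_g, sigma_g, self.mu_r, self.sigma_r)
```

In this simplified version a single scalar reward is shared by the newly inserted batch; how credit is assigned to individual samples, and how the reference set is refreshed over training, would follow the paper's actual algorithm.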
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 2110