Diffusion-RainbowPA: Improvements Integrated Preference Alignment for Diffusion-based Text-to-Image Generation

Published: 22 Jul 2025, Last Modified: 22 Jul 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Although the rapidly increasing capabilities of text-to-image (T2I) models have profound implications across various industries, these models still suffer from numerous shortcomings, necessitating effective strategies for aligning them with human preferences. Diffusion-DPO and SPO have emerged as robust approaches for aligning diffusion-based T2I models with human preference feedback, but they tend to suffer from text-image misalignment, aesthetic overfitting, and low-quality generation. To address these issues, we improve the alignment paradigm from a tripartite perspective: calibration enhancement (Calibration Enhanced Preference Alignment), overfitting mitigation (Identical Preference Alignment and the Jensen-Shannon Divergence Constraint), and performance optimization (Margin Strengthened Preference Alignment and SFT-like Regularization). Combining these with the step-aware preference alignment paradigm, we propose Diffusion-RainbowPA, a suite of six improvements that collectively boost the alignment performance of Diffusion-DPO. Comprehensive evaluation and comparison demonstrate that Diffusion-RainbowPA outperforms current state-of-the-art methods. Ablation studies on the introduced components show that each positively contributes to alignment performance.
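The baseline the abstract refers to, Diffusion-DPO, fine-tunes a diffusion model with a Bradley-Terry-style logistic loss on the implicit reward margin between a preferred and a dispreferred image. The sketch below illustrates the general shape of such a loss under simplifying assumptions (per-sample scalar denoising errors, a frozen reference model); the function name, signature, and `beta` default are illustrative, not the paper's implementation.

```python
import numpy as np

def dpo_style_preference_loss(err_w_policy, err_w_ref,
                              err_l_policy, err_l_ref, beta=0.1):
    """Sketch of a DPO-style preference loss for diffusion models.

    err_*: per-sample denoising errors (e.g. noise-prediction MSE) for the
    preferred ("w") and dispreferred ("l") images, under the trainable
    policy model and a frozen reference model.
    """
    # Implicit reward: how much the policy improves over the reference
    # on each sample (positive means the policy denoises it better).
    delta_w = err_w_ref - err_w_policy
    delta_l = err_l_ref - err_l_policy
    # Margin between preferred and dispreferred samples, scaled by beta.
    logits = beta * (delta_w - delta_l)
    # Logistic (Bradley-Terry) loss: -log sigmoid(logits), written with
    # log1p for numerical stability, averaged over the batch.
    return float(np.mean(np.log1p(np.exp(-logits))))
```

Minimizing this loss pushes the policy to improve more on preferred images than on dispreferred ones relative to the reference; a larger positive margin yields a smaller loss, and a zero margin gives log 2.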
Submission Length: Regular submission (no more than 12 pages of main content)
Supplementary Material: zip
Assigned Action Editor: ~Yingnian_Wu1
Submission Number: 4755