Fine-Tuning Diffusion Generative Models via Rich Preference Optimization

TMLR Paper 5372 Authors

13 Jul 2025 (modified: 23 Jul 2025) · Under review for TMLR · CC BY 4.0
Abstract: We introduce Rich Preference Optimization (RPO), a novel pipeline that leverages rich feedback signals to improve the curation of preference pairs for fine-tuning text-to-image diffusion models. Traditional methods, such as Diffusion-DPO, often rely solely on reward model labeling, which can be opaque, offers limited insight into the rationale behind preferences, and is prone to issues such as reward hacking and overfitting. In contrast, our approach begins by generating detailed critiques of synthesized images, from which we extract reliable and actionable image editing instructions. By applying these instructions, we create refined images, resulting in synthetic, informative preference pairs that serve as enhanced tuning datasets. We demonstrate the effectiveness of our pipeline and the resulting datasets in fine-tuning state-of-the-art diffusion models.
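The abstract outlines a critique-driven data-curation loop: synthesize an image, critique it, extract editing instructions, apply them to obtain a refined image, and pair the original (dispreferred) with the refined (preferred) sample. The sketch below is only an illustration of that loop under stated assumptions; the callables `generate_image`, `critique_image`, `extract_edit_instructions`, and `apply_edits` are hypothetical placeholders for the diffusion model, critique model, instruction extractor, and image-editing model, and do not correspond to any code released with the paper.

```python
from typing import Any, Callable, Dict, List


def build_rich_preference_pairs(
    prompts: List[str],
    generate_image: Callable[[str], Any],
    critique_image: Callable[[str, Any], str],
    extract_edit_instructions: Callable[[str], List[str]],
    apply_edits: Callable[[Any, List[str]], Any],
) -> List[Dict[str, Any]]:
    """Curate synthetic preference pairs from rich (critique-based) feedback.

    For each prompt: synthesize an image, critique it, turn the critique into
    concrete editing instructions, and apply them to produce a refined image.
    The original image is treated as the dispreferred sample and the refined
    image as the preferred one.
    """
    pairs = []
    for prompt in prompts:
        original = generate_image(prompt)             # draft sample from the base model
        critique = critique_image(prompt, original)   # detailed textual feedback
        edits = extract_edit_instructions(critique)   # actionable editing instructions
        if not edits:
            continue                                  # nothing reliable to improve; skip
        refined = apply_edits(original, edits)        # refined (preferred) image
        pairs.append(
            {"prompt": prompt, "preferred": refined, "dispreferred": original}
        )
    return pairs
```

Under this reading, the resulting pairs would then serve as the winner/loser samples in a Diffusion-DPO-style fine-tuning objective.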
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Changyou_Chen1
Submission Number: 5372