Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Published: 05 Mar 2025, Last Modified: 14 Apr 2025 · SCOPE - ICLR 2025 Poster · CC BY 4.0
Track: Main paper track (up to 5 pages excluding references and appendix)
Keywords: text-to-image diffusion models, preference alignment
TL;DR: We propose margin-aware preference optimization (MaPO) for aligning diffusion models without a reference model, overcoming the reference mismatch issue in diffusion model alignment.
Abstract: Preference alignment methods (such as DPO) typically rely on divergence regularization for stability but struggle with reference mismatch when preference data deviates from the reference model. In this paper, we identify the negative impacts of reference mismatch in aligning text-to-image (T2I) diffusion models. Motivated by this analysis, we propose a reference-agnostic alignment method for T2I diffusion models, coined margin-aware preference optimization (MaPO). By removing the reference model, MaPO enables a new way to address diverse T2I downstream tasks with varying levels of reference mismatch. We validate this with five representative T2I tasks: (1) preference alignment, (2) cultural representation, (3) safe generation, (4) style learning, and (5) personalization. MaPO surpasses Diffusion DPO as the level of reference mismatch increases, while also outperforming task-specific methods such as DreamBooth. Additionally, MaPO is more efficient in both training time and memory without compromising quality.
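To make the contrast concrete, the following is a minimal illustrative sketch (not the exact MaPO objective, whose precise form appears in the paper) of the difference between a DPO-style loss, which regularizes against a reference model's log-probabilities, and a reference-free margin-based loss that depends only on the likelihood gap between preferred and dispreferred samples. All function and variable names here are hypothetical.

```python
import math

def log_sigmoid(x: float) -> float:
    """Numerically safe log(sigmoid(x))."""
    # log(1 / (1 + exp(-x))) = -log(1 + exp(-x))
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def dpo_style_loss(logp_w: float, logp_l: float,
                   ref_logp_w: float, ref_logp_l: float,
                   beta: float = 1.0) -> float:
    """DPO-style loss: margins are measured relative to a reference model.
    If the preference data deviates from the reference (reference mismatch),
    the reference terms can distort the effective margin."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)

def reference_free_margin_loss(logp_w: float, logp_l: float,
                               beta: float = 1.0) -> float:
    """Reference-free sketch: the loss depends only on the model's own
    likelihood margin between the preferred (w) and dispreferred (l)
    samples, so no reference model is needed."""
    return -log_sigmoid(beta * (logp_w - logp_l))
```

Note that when the policy equals the reference, the DPO-style margin collapses to zero regardless of the data, whereas the reference-free margin still reflects how strongly the model prefers the chosen sample.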
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 97
