Beyond KL-Regularization: Achieving Unbiased Direct Alignment through Diffusion $f_{\chi^n}$-Preference Optimization
TL;DR: Diffusion-$\chi^n$PO employs $f_{\chi^n}$-regularization for robust text-to-image alignment, surpassing state-of-the-art methods while improving uncertainty quantification.
Abstract: Recently, aligning diffusion models with human preferences has emerged as a key focus in text-to-image generation research.
Current state-of-the-art alignment approaches rely predominantly on reverse Kullback–Leibler (KL) divergence regularization, a strategy that both limits how fully existing data can be exploited and introduces bias.
In this work, we propose Diffusion-$\chi^n$PO, a novel method that reshapes the gradient ratio of the objective function via $f_{\chi^n}$-regularization, thereby balancing optimization pressure between human-preferred and non-preferred samples.
Specifically, we incorporate the diffusion-model likelihood into $\chi^2$-Preference Optimization ($\chi$PO) and re-express it as a fully differentiable objective function (sketched after the abstract).
Building on this foundation, we generalize to the $f_{\chi^n}$-Preference Optimization ($\chi^n$PO) framework, which substantially broadens the design space of the implicit reward model and mitigates the influence of non-preferred samples in conflicting data (see the code sketch below).
Furthermore, we provide a thorough analysis, from the perspective of gradient fields, of how $\chi^2 + \mathrm{KL}$-regularization, $f_{\chi^n}$-regularization, and KL-regularization each affect the alignment process (made concrete in the derivation below).
Finally, we fine-tune the Stable Diffusion v1.5 model on the Pick-a-Pic preference dataset using Diffusion-$\chi^n$PO.
Experimental results demonstrate enhanced alignment with textual prompts and improved visual quality, confirming the effectiveness of our proposed framework.
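For orientation, here is a minimal sketch of the objective the abstract alludes to, assuming the $\chi$PO link function of Huang et al. (2024); the exact diffusion-space form used in the paper may differ. $\chi$PO keeps the DPO-style logistic loss but swaps DPO's logarithmic link for the mixed $\chi^2{+}\mathrm{KL}$ link $\phi(z) = z + \log z$:

$$\mathcal{L}_{\chi\mathrm{PO}}(\theta) = -\,\mathbb{E}_{(c,\,x^w,\,x^l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\,\phi\!\left(\frac{\pi_\theta(x^w\mid c)}{\pi_{\mathrm{ref}}(x^w\mid c)}\right) - \beta\,\phi\!\left(\frac{\pi_\theta(x^l\mid c)}{\pi_{\mathrm{ref}}(x^l\mid c)}\right)\right)\right], \qquad \phi(z) = z + \log z,$$

where $x^w$ and $x^l$ are the preferred and non-preferred images for prompt $c$. Since exact diffusion likelihoods are intractable, the fully differentiable re-expression presumably replaces each likelihood ratio with a per-timestep denoising-error surrogate, in the spirit of Diffusion-DPO.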
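The gradient-field analysis can likewise be made concrete. Writing $z = \pi_\theta/\pi_{\mathrm{ref}}$ and $m = \beta\,(\phi(z^w) - \phi(z^l))$, differentiating the pairwise loss gives

$$\nabla_\theta \mathcal{L} = -\,\beta\,\sigma(-m)\left[\phi'(z^w)\,\nabla_\theta z^w \;-\; \phi'(z^l)\,\nabla_\theta z^l\right],$$

so the choice of regularizer enters only through the weights $\phi'(z)$: $1/z$ for KL (DPO), $1 + 1/z$ for $\chi^2{+}\mathrm{KL}$ ($\chi$PO), and, under the assumed generalization $\phi_n(z) = z^{n-1} + \log z$ used in the code sketch below, $(n-1)\,z^{n-2} + 1/z$ for $f_{\chi^n}$. The ratio of the preferred- and non-preferred-sample weights is plausibly the "gradient ratio" the abstract refers to.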
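Finally, a self-contained Python/PyTorch sketch of how a pluggable link function turns a standard DPO loss into a $\chi^n$PO-style one. Everything here is illustrative: `chi_n_link` (in particular its form for $n > 2$) and `dpo_style_loss` are hypothetical names and assumed forms, not the paper's implementation, and in a diffusion model the log-likelihood ratios would be replaced by denoising-error surrogates.

```python
import torch
import torch.nn.functional as F

def chi_n_link(z: torch.Tensor, n: int = 2) -> torch.Tensor:
    """Hypothetical chi^n link function phi_n(z) = z^(n-1) + log(z).

    For n = 2 this reduces to phi(z) = z + log z, the mixed
    chi^2 + KL link of chiPO; the n > 2 case is an assumed
    generalization for illustration, not the paper's form.
    """
    return z.pow(n - 1) + torch.log(z)

def dpo_style_loss(logr_w: torch.Tensor,
                   logr_l: torch.Tensor,
                   beta: float = 0.1,
                   link=None) -> torch.Tensor:
    """Generic direct-alignment loss over preference pairs.

    logr_w / logr_l are log(pi_theta / pi_ref) for the preferred (w)
    and non-preferred (l) samples. With link=None the loss reduces to
    standard DPO; passing chi_n_link gives a chi^nPO-style variant.
    """
    if link is None:
        # DPO: logarithmic link phi(z) = log z, so phi equals the log-ratio.
        margin = beta * (logr_w - logr_l)
    else:
        margin = beta * (link(logr_w.exp()) - link(logr_l.exp()))
    return -F.logsigmoid(margin).mean()

# Toy usage with made-up log-ratios for two preference pairs.
logr_w = torch.tensor([0.20, -0.10])
logr_l = torch.tensor([-0.30, -0.50])
print(dpo_style_loss(logr_w, logr_l))                   # KL-regularized (DPO)
print(dpo_style_loss(logr_w, logr_l, link=chi_n_link))  # chi^2-PO-style
```

Because $\phi(z) = z + \log z$ grows faster than $\log z$, the $\chi^2$ term penalizes large likelihood ratios more heavily, and its gradient weight $\phi'(z) = 1 + 1/z$ stays bounded below as $z \to \infty$, unlike DPO's $1/z$, which vanishes; this asymmetry is consistent with the abstract's claim of rebalanced optimization pressure between preferred and non-preferred samples.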
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Machine Learning, Text-to-Image Diffusion Model, Preference Alignment, $f_{\chi^n}$ Preference Optimization ($\chi^n$PO)
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Submission Number: 1704