Keywords: Diffusion Model, Identity-Preserving Generation, Preference Optimization, Online Learning
Abstract: Identity-preserving text-to-image generation has recently received increasing attention, yet it remains a challenging task. Existing approaches typically fine-tune diffusion models, but they often fail to preserve identity information reliably. Reinforcement learning from human feedback (RLHF) can improve identity consistency, but it requires expensive reward models and carefully curated annotations, limiting its practicality. We present Online Self-Preference Alignment (OSPA), a plug-and-play framework that achieves identity-preserving generation without relying on external reward models or curated preference data. OSPA exploits self-preference signals through three components: (1) a self-preference sample generation module that perturbs a frozen policy model to produce paired samples with explicit preferences; (2) a self-reward preference optimization mechanism that updates the policy using group preference optimization; and (3) an online curriculum learning strategy that continuously refines the sample generator with feedback from the evolving policy model. Comprehensive experiments on four state-of-the-art identity-preserving text-to-image models demonstrate that OSPA substantially improves identity fidelity while maintaining visual quality, offering a general and effective alignment strategy for generative models.
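To make component (2) concrete, the following is a minimal sketch of a group preference update in the spirit of the abstract, assuming a GRPO-style group-normalized objective in PyTorch; the function name `group_preference_loss`, the tensor shapes, and the binary self-preference rewards are illustrative assumptions, not the paper's implementation.

```python
import torch

def group_preference_loss(logp: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Group-normalized preference objective (illustrative sketch).

    logp:    (G, K) policy log-probabilities for K paired samples
             in each of G groups (e.g., per-prompt sample groups).
    rewards: (G, K) self-preference scores; by construction the
             unperturbed sample gets the higher score, so no
             external reward model is consulted.
    """
    # Normalize rewards within each group to obtain advantages,
    # as in GRPO-style objectives.
    adv = (rewards - rewards.mean(dim=1, keepdim=True)) / (
        rewards.std(dim=1, keepdim=True) + 1e-8
    )
    # Reweight policy log-likelihood by the (detached) advantages.
    return -(adv.detach() * logp).mean()

# Toy usage: 2 groups of 4 samples each; the first sample in each
# group is the unperturbed (preferred) one.
logp = torch.randn(2, 4, requires_grad=True)
rewards = torch.tensor([[1.0, 0.0, 0.0, 0.0],
                        [1.0, 0.0, 0.0, 0.0]])
loss = group_preference_loss(logp, rewards)
loss.backward()
```

Because preferences are assigned by construction (unperturbed over perturbed), this step needs no learned reward model, which is the core of the self-preference signal the abstract describes.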
Primary Area: generative models
Submission Number: 10616