UPER: Bridging the Perception Gap in Personalized Image Generation with Human-Aligned Reinforcement Learning
Keywords: RLHF, Personalization
Abstract: Personalized image generation aims to synthesize novel scenes featuring a specific user-provided subject. However, state-of-the-art models often fail to preserve the fine-grained details that define a subject's unique identity, a critical flaw that limits their use in high-fidelity applications. This "consistency gap" arises from a misalignment between the model's learned similarity metric and nuanced human perception. To address this, we introduce \textbf{UPER} (\textbf{U}nifying \textbf{P}ost-training for P\textbf{er}sonalization), a post-training framework that aligns generative models with human preferences for detail consistency. UPER employs a two-stage process: it first refines the model's focus on the subject's core attributes via Supervised Fine-Tuning (SFT) on a dataset with cleaned backgrounds, and then optimizes the model with Reinforcement Learning (RL) using a novel composite reward function. The key component of this reward is a new patch-based consistency metric that accurately measures subject fidelity using only pre-trained vision encoders, eliminating the need for expensive preference-data collection. We apply UPER to the state-of-the-art OminiControl model. In a blind user study with over 1,000 responses, images generated by our final model were preferred for their overall quality and subject consistency \textbf{89.3\%} of the time over the strong baseline. Our work provides a robust and scalable solution to the detail-consistency challenge, paving the way for more faithful personalized generation.
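To make the patch-based consistency idea concrete, here is a minimal sketch of what such a metric could look like. The abstract does not specify the formulation, so everything below is an assumption: the choice of DINOv2 as the frozen pre-trained vision encoder, the symmetric best-match cosine similarity over patch tokens, and the function name patch_consistency_reward are all illustrative, not the paper's actual method.

```python
# Hypothetical sketch of a patch-based consistency reward (NOT the paper's
# implementation): score a generated image against a reference subject image
# using only the patch embeddings of a frozen pre-trained vision encoder.
import torch
import torch.nn.functional as F

# Assumption: DINOv2 ViT-B/14 as the frozen encoder (loaded via torch.hub).
encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()

@torch.no_grad()
def patch_embeddings(img: torch.Tensor) -> torch.Tensor:
    """img: (1, 3, H, W), ImageNet-normalized, H and W divisible by 14.
    Returns L2-normalized patch tokens of shape (num_patches, dim)."""
    tokens = encoder.forward_features(img)["x_norm_patchtokens"][0]
    return F.normalize(tokens, dim=-1)

@torch.no_grad()
def patch_consistency_reward(generated: torch.Tensor,
                             reference: torch.Tensor) -> float:
    """Symmetric best-match patch similarity in [-1, 1]: each generated patch
    is scored against its closest reference patch and vice versa, so the score
    is high only when fine-grained subject details are actually reproduced."""
    g, r = patch_embeddings(generated), patch_embeddings(reference)
    sim = g @ r.T                                # (N_g, N_r) cosine similarities
    gen_to_ref = sim.max(dim=1).values.mean()    # generated patch -> best reference patch
    ref_to_gen = sim.max(dim=0).values.mean()    # reference patch -> best generated patch
    return (0.5 * (gen_to_ref + ref_to_gen)).item()
```

In UPER's RL stage, a score like this would presumably serve as the subject-consistency term of the composite reward; the abstract does not detail the other reward terms or their weighting.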
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 25229