PIPE: Personalized Image-generation via Preference Encoding

Published: 02 Jun 2026, Last Modified: 09 Jun 2026
Venue: Pluralistic-Alignment 2026
License: CC BY 4.0
Keywords: Personalization, Text-to-Image, Representation Learning
Abstract: While modern text-to-image (T2I) models excel at generating high-quality images, they are typically trained to optimize generalized, population-level preferences. This homogeneous approach ignores the diverse tastes and aesthetic judgments of individual users. In this work, we propose a novel framework that learns fine-grained user preferences without relying on computationally expensive vision-language models (VLMs) or prompt-sensitive text profiles. Instead, we introduce a robust, continuous user representation that models a user’s reward function as a linear combination of the reward functions of $K$ base user types. We learn user-specific weights $\lambda_u$ via logistic regression on pairwise preference data to construct a continuous user embedding. This embedding is integrated into the diffusion process via an IP-Adapter, and the model is fine-tuned using Diffusion-DPO. Our approach consistently generates images aligned with individual reward functions, achieving a 66.2% win rate against a pre-trained SDXL baseline and a 63.2% win rate against the state-of-the-art PPD framework.
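As a rough illustration of the preference-encoding step described in the abstract, the sketch below fits user weights $\lambda_u$ by logistic regression on pairwise comparisons. It is not the authors' implementation. It assumes each of the $K$ base user types supplies a scalar reward per image, so that a comparison reduces to a length-$K$ vector of reward differences (preferred minus rejected) and the probability of the observed preference is $\sigma(\lambda_u^\top d)$. The function names, the synthetic data, and the hyperparameters are all hypothetical, and the IP-Adapter conditioning and Diffusion-DPO fine-tuning stages are not shown.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_user_weights(diffs, lr=0.5, steps=500, l2=1e-3):
    """Fit lambda_u by gradient descent on the pairwise logistic loss.

    diffs[i] holds r(preferred_i) - r(rejected_i): one base-reward
    difference per base user type (assumed input format).
    """
    k = len(diffs[0])
    n = len(diffs)
    lam = [0.0] * k
    for _ in range(steps):
        grad = [l2 * w for w in lam]  # small L2 penalty (an assumption)
        for d in diffs:
            p = sigmoid(dot(lam, d))
            # d/d(lambda) of -log sigmoid(lambda . d) is -(1 - p) * d.
            for j in range(k):
                grad[j] -= (1.0 - p) * d[j] / n
        lam = [w - lr * g for w, g in zip(lam, grad)]
    return lam


def make_synthetic_pairs(true_lam, n_pairs, seed=0):
    """Simulate a user whose choices follow sigmoid(true_lam . d)."""
    rng = random.Random(seed)
    k = len(true_lam)
    diffs = []
    for _ in range(n_pairs):
        a = [rng.gauss(0, 1) for _ in range(k)]  # base rewards of image A
        b = [rng.gauss(0, 1) for _ in range(k)]  # base rewards of image B
        d = [x - y for x, y in zip(a, b)]
        if rng.random() < sigmoid(dot(true_lam, d)):
            diffs.append(d)  # user preferred A
        else:
            diffs.append([-x for x in d])  # user preferred B
    return diffs


true_lam = [2.0, -1.0, 0.5]  # hypothetical ground-truth user weights
pairs = make_synthetic_pairs(true_lam, n_pairs=400)
lam_hat = fit_user_weights(pairs)
train_acc = sum(dot(lam_hat, d) > 0 for d in pairs) / len(pairs)
```

Here `lam_hat` plays the role of the continuous user embedding, and `train_acc` is the fraction of observed comparisons that the fitted weights rank correctly. How the resulting vector is projected into the IP-Adapter's conditioning space is not specified in the abstract and is left out.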
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 55