PIPE: Personalized Image-generation via Preference Encoding

Moonkyung Ryu; Chih-Wei Hsu; Avinab Saha; Ofir Nabati; Guy Tennenholtz; Junfeng He; Craig Boutilier

Published: 02 Jun 2026, Last Modified: 09 Jun 2026, Pluralistic-Alignment 2026, CC BY 4.0

Keywords: Personalization, Text-to-Image, Representation Learning

Abstract: While modern text-to-image (T2I) models excel at generating high-quality images, they are typically optimized for generalized, population-level preferences. This homogeneous approach ignores the diverse tastes and aesthetic judgments of individual users. In this work, we propose a novel framework that learns fine-grained user preferences without relying on computationally expensive vision-language models (VLMs) or prompt-sensitive text profiles. Instead, we introduce a robust, continuous user representation that models a user’s reward function as a linear combination of $K$ base user types. We learn user-specific weights $\lambda_u$ via logistic regression on pairwise preference data to construct a continuous user embedding. This embedding is integrated into the diffusion process via an IP-Adapter, and the resulting model is fine-tuned using Diffusion-DPO. Our approach consistently generates images aligned with individual reward functions, achieving a 66.2% win rate against a pre-trained SDXL baseline and a 63.2% win rate against the state-of-the-art PPD framework.
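To make the weight-learning step concrete, here is a minimal sketch under stated assumptions. The user's reward is taken as $r_u(x) = \sum_{k=1}^{K} \lambda_{u,k}\, r_k(x)$, and preferences are assumed to follow a Bradley-Terry model, $P(a \succ b) = \sigma(\lambda_u^\top (r(a) - r(b)))$. That makes fitting $\lambda_u$ an intercept-free logistic regression on reward differences. The abstract does not say how the base-type rewards $r_k(x)$ are obtained, so they are hypothetical inputs here, and the helper names (`fit_user_weights`, `simulate_pairs`) and the Gaussian synthetic data are illustrative only. The solver is plain L2-regularized batch gradient descent, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

# One preference record: (base rewards of image a, base rewards of image b, label).
# label = 1 means the user preferred image a over image b.
Pair = Tuple[Sequence[float], Sequence[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_user_weights(pairs: Sequence[Pair], k: int, lr: float = 1.0,
                     l2: float = 1e-3, epochs: int = 200) -> List[float]:
    """Estimate lambda_u by L2-regularized logistic regression (no intercept).

    Assumed Bradley-Terry model: P(a preferred over b) =
    sigmoid(lambda_u . (r(a) - r(b))), so the features are differences of
    the K (hypothetical) base-type rewards and the weights are lambda_u.
    """
    lam = [0.0] * k
    n = len(pairs)
    for _ in range(epochs):
        grad = [l2 * w for w in lam]  # gradient of the (l2 / 2) * ||lam||^2 penalty
        for ra, rb, y in pairs:
            diff = [a - b for a, b in zip(ra, rb)]
            p = sigmoid(sum(w * d for w, d in zip(lam, diff)))
            for j in range(k):
                grad[j] += (p - y) * diff[j] / n  # gradient of the mean log-loss
        lam = [w - lr * g for w, g in zip(lam, grad)]  # full-batch gradient step
    return lam


def simulate_pairs(true_lam: Sequence[float], n: int, seed: int = 0) -> List[Pair]:
    """Draw synthetic pairs with hypothetical Gaussian base rewards."""
    rng = random.Random(seed)
    k = len(true_lam)
    pairs: List[Pair] = []
    for _ in range(n):
        ra = [rng.gauss(0.0, 1.0) for _ in range(k)]
        rb = [rng.gauss(0.0, 1.0) for _ in range(k)]
        margin = sum(w * (a - b) for w, a, b in zip(true_lam, ra, rb))
        y = 1 if rng.random() < sigmoid(margin) else 0
        pairs.append((ra, rb, y))
    return pairs


# Usage: recover a hypothetical user's weights over K = 3 base types.
true_lam = [2.0, -1.0, 0.5]
learned = fit_user_weights(simulate_pairs(true_lam, n=800), k=len(true_lam))
print([round(w, 2) for w in learned])
```

The model deliberately has no intercept. Swapping $a$ and $b$ should turn $P(a \succ b)$ into $1 - P(a \succ b)$, and a bias term would break that symmetry. In this reading the learned vector $\lambda_u$ itself would serve as the continuous user embedding. How it is then projected into the IP-Adapter is not described in the abstract.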

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 55
