Efficient Generative Models Personalization via Optimal Experimental Design

Published: 10 Jun 2025, Last Modified: 30 Jun 2025 · MoFA Poster · CC BY 4.0
Keywords: preference-based reinforcement learning, text-to-image, experimental design, personalization, alignment, generative models
TL;DR: A personalization method for generative models via optimal experimental design for preference-based RL
Abstract: Preference learning from human feedback has been widely adopted to align generative models with end users. However, human feedback is costly and time-consuming to obtain, creating demand for data-efficient query selection methods. This work presents a novel approach that leverages optimal experimental design to ask humans the most informative preference queries, which efficiently elucidate the latent reward function modeling user preferences. To this end, we formulate preference query selection as a planning problem aimed at maximizing the information that queries provide about the user's underlying latent reward model. We show that this problem admits a convex optimization formulation, and introduce ED-PBRL, a statistically and computationally efficient algorithm supported by theoretical guarantees. We empirically showcase the proposed framework by personalizing a text-to-image generative model to user-specific styles, showing that it requires substantially fewer preference queries than random query selection.
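
The abstract does not spell out ED-PBRL or its exact convex formulation, so the sketch below is only a rough illustration of the optimal-experimental-design idea it builds on: classical D-optimal design solved with Frank-Wolfe, under the common assumption of a linear latent reward r(x) = θᵀφ(x), where each pairwise preference query is summarized by the feature difference of its two options. All names here (`d_optimal_query_weights`, `Z`, the step-size schedule) are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def d_optimal_query_weights(Z, n_iters=200, reg=1e-6):
    """Frank-Wolfe for D-optimal design over candidate preference queries.

    Z: (n_queries, d) array; row i is phi(option_a) - phi(option_b) for
       candidate query i, assuming a linear latent reward r(x) = theta @ phi(x).
    Returns a probability vector over queries; sampling queries according to
    it (approximately) maximizes log det of the resulting information matrix.
    """
    n, d = Z.shape
    w = np.full(n, 1.0 / n)                            # uniform initial design
    for t in range(n_iters):
        A = Z.T @ (w[:, None] * Z) + reg * np.eye(d)   # information matrix
        A_inv = np.linalg.inv(A)
        # Gradient of log det(A) w.r.t. w_i is the leverage score z_i^T A^{-1} z_i
        scores = np.einsum('ij,jk,ik->i', Z, A_inv, Z)
        i_star = int(np.argmax(scores))                # best single-query vertex
        step = 2.0 / (t + 3)                           # standard FW step size
        w = (1 - step) * w
        w[i_star] += step
    return w

# Example: rank 500 random candidate queries with 8-dimensional features
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 8))
w = d_optimal_query_weights(Z)
top = np.argsort(w)[::-1][:5]
print("most informative candidate queries:", top, w[top].round(3))
```

Under the linear-reward assumption, the D-optimal objective log det(Σᵢ wᵢ zᵢzᵢᵀ) is concave in the query weights, which is one standard way such a query-selection problem becomes a convex program; queries with high leverage scores are those that most shrink uncertainty about θ.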
Submission Number: 64