Keywords: reinforcement learning, personalization, experimental design, alignment, rlhf, generative models, preference-based rl
TL;DR: ED-PBRL casts human preference query selection as optimal experimental design for general generative models, yielding a convex-relaxed, theory-backed algorithm that requires fewer queries than random exploration in text-to-image personalization experiments
Abstract: Preference learning from human feedback can align generative models with the needs of end-users. However, human feedback is costly and time-consuming to obtain, which creates demand for data-efficient query selection methods. This work presents a novel approach that leverages optimal experimental design to ask humans the most informative preference queries, from which the latent reward function modeling user preferences can be learned efficiently. We formulate preference query selection as the problem of maximizing the information gained about the underlying latent preference model. We show that this problem admits a convex optimization relaxation, and introduce a statistically and computationally efficient algorithm (ED-PBRL) that is supported by theoretical guarantees and can efficiently construct structured queries such as images or text. We empirically demonstrate the proposed framework by personalizing a text-to-image generative model to user-specific styles, showing that it requires fewer preference queries than random query selection.
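To make the query-selection idea concrete, below is a minimal sketch of one standard instantiation of optimal experimental design with a convex relaxation: D-optimal design over candidate queries solved with Frank-Wolfe. Everything here (feature embeddings, function names, step sizes) is an illustrative assumption and not the paper's ED-PBRL implementation; it only shows the generic pattern of selecting queries by maximizing an information criterion over a relaxed design.

```python
# Hypothetical sketch: D-optimal design for preference query selection via a
# convex relaxation, solved with Frank-Wolfe. Not the paper's code; all names
# and parameters are illustrative assumptions.
import numpy as np

def d_optimal_design(features, n_iters=200, reg=1e-3):
    """features: (n_candidates, d) array of candidate-query feature vectors.
    Returns a probability vector over candidates (the relaxed design)."""
    n, d = features.shape
    w = np.full(n, 1.0 / n)  # start from the uniform design
    for t in range(n_iters):
        # Information matrix of the current design (ridge term keeps it invertible)
        A = features.T @ (w[:, None] * features) + reg * np.eye(d)
        A_inv = np.linalg.inv(A)
        # Gradient of log det(A) w.r.t. each weight: phi_i^T A^{-1} phi_i
        gains = np.einsum("ij,jk,ik->i", features, A_inv, features)
        i_star = int(np.argmax(gains))   # Frank-Wolfe vertex: most informative query
        gamma = 2.0 / (t + 2.0)          # standard Frank-Wolfe step size
        w *= (1.0 - gamma)
        w[i_star] += gamma
    return w

# Usage: rank candidate queries by their design weight and ask the top ones first.
rng = np.random.default_rng(0)
candidate_features = rng.normal(size=(500, 16))  # e.g. embeddings of candidate query pairs
design = d_optimal_design(candidate_features)
top_queries = np.argsort(design)[::-1][:10]
```

The relaxation is convex because log det of the design's information matrix is concave in the weights, so a simplex-constrained first-order method suffices; the paper's algorithm and guarantees concern its own formulation, which this sketch does not reproduce.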
Primary Area: reinforcement learning
Submission Number: 18105