Keywords: personalization, reward models, LLM-as-a-judge
TL;DR: We explore training personalized reward models conditioned on implicit preferences expressed in long-context usage data, and contribute evaluation benchmarks, synthetic training data, and reward models.
Abstract: Reward models are widely used as a proxy for human preferences during the alignment of Large Language Models (LLMs). However, preferences are subjective and vary widely across users, motivating increased research on LLM personalization. Existing work on reward modeling for personalized generation remains limited, typically requiring *explicit*, pre-defined preferences and focusing mainly on *English responses*. Addressing these gaps, we establish benchmarks for multilingual Personalized Reward Models (PRMs) that identify user-preferred responses from unstructured user data containing *implicit* preferences. We introduce a novel framework for creating synthetic personalized reward modeling data at scale, and then evaluate PRMs on three multilingual text generation tasks. Our results show that small, fine-tuned open-source PRMs achieve performance comparable to or better than LLM-as-a-judge baselines. Even state-of-the-art proprietary reasoning LLMs achieve only 72% binary classification accuracy on our dataset, highlighting the complexity of our task. We conclude with experiments on PRM-Bench, a human-annotated user-preference benchmark, validating our models and synthetic data generation pipelines.
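As an illustration of the evaluation setup mentioned in the abstract, the sketch below shows what binary classification accuracy for a PRM typically amounts to: score both candidate responses conditioned on the user's context and count how often the user-preferred one receives the higher reward. This is a hedged sketch under assumptions, not the authors' code; the checkpoint name and the `context`/`chosen`/`rejected` fields are hypothetical.

```python
# Minimal sketch of binary preference accuracy for a personalized reward model.
# Assumptions (not from the paper): a sequence-classification reward model with a
# single scalar head, and preference pairs with "context", "chosen", "rejected" fields.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "my-org/personalized-reward-model"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)
model.eval()

def reward(user_context: str, response: str) -> float:
    """Score a response conditioned on the user's (implicit-preference) context."""
    inputs = tokenizer(user_context, response, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

def binary_accuracy(pairs) -> float:
    """Fraction of pairs where the user-preferred response gets the higher reward."""
    correct = sum(
        reward(p["context"], p["chosen"]) > reward(p["context"], p["rejected"])
        for p in pairs
    )
    return correct / len(pairs)
```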
Primary Area: datasets and benchmarks
Submission Number: 9778