Toward Deployable Pluralistic Alignment in Robotics: Learning Similarity-Grouped Rewards from Diverse Human Preferences

Taehyung Kim; Gwangmo Lee; Jonghak Bae; Dongjae Kim; Jaewoong Han; Jongeun Choi

Published: 02 Jun 2026, Last Modified: 03 Jun 2026, Pluralistic-Alignment 2026, CC BY 4.0

Keywords: Preference-based reinforcement learning, Pluralistic alignment, Human-centered robotics, Deployable policy learning

TL;DR: PREC is a preference-based RL framework for pluralistic alignment in robotics that learns representative reward models by clustering users via leaky EM, enabling deployable personalization under sparse and noisy offline feedback.

Abstract: Personalization is essential for deploying robotic systems across diverse end users. However, fully individualized policies are difficult to validate, costly to scale, and unreliable under sparse and noisy preference feedback, while a single global policy collapses meaningful preference heterogeneity across users. To address these challenges, we formulate deployable pluralistic alignment as a preference-based reinforcement learning (PbRL) problem, aiming to learn a limited number of policies that serve a heterogeneous user population while preserving diverse user preferences. We develop this approach into Preference-based REward Clustering (PREC), a framework that learns a compact set of representative reward models from human preference labels (i.e., good/bad feedback) collected across users. PREC first learns a population-level trajectory representation from state-action data without relying on preference labels, reducing reliance on limited and skewed per-user coverage and mitigating exposure to label noise. It then learns group-level reward decoders shared among users with similar preferences, pooling sparse and noisy feedback to capture distinct preference modes while yielding a manageable number of representative reward models. Experiments across diverse simulated robotic locomotion environments show that PREC improves aggregate social welfare over both a single global policy and fully individualized policies under sparse and noisy feedback across diverse preference distributions.
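The second stage described above (group-level reward decoders shared among users with similar preferences) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain EM over a mixture of logistic reward decoders rather than the leaky EM named in the TL;DR, it assumes the trajectory embeddings `Z` are already computed and frozen, and it assumes each decoder is linear. The names `em_reward_clustering` and `make_toy_data` and all hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_toy_data(n_users=8, n_per_user=40, seed=1):
    """Toy population (assumption): half the users call a trajectory 'good'
    when its first embedding coordinate is positive, the other half when it
    is negative."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n_users * n_per_user, 2))      # stand-in embeddings
    user_ids = np.repeat(np.arange(n_users), n_per_user)
    sign = np.where(user_ids < n_users // 2, 1.0, -1.0)
    labels = (sign * Z[:, 0] > 0).astype(float)          # 1 = good, 0 = bad
    return Z, user_ids, labels


def em_reward_clustering(Z, user_ids, labels, n_users, n_groups,
                         n_iters=30, grad_steps=20, lr=0.5, seed=0):
    """Plain EM for a mixture of linear reward decoders (illustrative only).

    Z: (N, d) fixed trajectory embeddings, user_ids: (N,) ints,
    labels: (N,) floats in {0, 1}. Returns decoder weights (K, d) and
    per-user group responsibilities (U, K).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_groups, Z.shape[1]))
    pi = np.full(n_groups, 1.0 / n_groups)               # group priors
    onehot = np.eye(n_users)[user_ids]                   # (N, U)
    y = labels[:, None]
    for _ in range(n_iters):
        # E-step: pool each user's label log-likelihood under every decoder.
        p = np.clip(sigmoid(Z @ W.T), 1e-9, 1 - 1e-9)    # (N, K)
        ll = y * np.log(p) + (1 - y) * np.log(1 - p)
        user_ll = onehot.T @ ll + np.log(pi)             # (U, K)
        user_ll -= user_ll.max(axis=1, keepdims=True)    # numerical stability
        resp = np.exp(user_ll)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update priors, then take responsibility-weighted
        # gradient-ascent steps on each decoder's logistic log-likelihood.
        pi = np.clip(resp.mean(axis=0), 1e-6, None)
        pi /= pi.sum()
        sample_w = resp[user_ids]                        # (N, K)
        for _ in range(grad_steps):
            p = sigmoid(Z @ W.T)
            W += lr * ((sample_w * (y - p)).T @ Z) / len(labels)
    return W, resp


if __name__ == "__main__":
    Z, user_ids, labels = make_toy_data()
    W, resp = em_reward_clustering(Z, user_ids, labels, n_users=8, n_groups=2)
    print("group assignment per user:", resp.argmax(axis=1))
```

Pooling the log-likelihood per user before computing responsibilities is what lets sparse per-user feedback contribute to a shared decoder; a per-user model would instead fit each user's few labels in isolation.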

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 39
