Modeling the Plurality of Human Preferences via Ideal Points

Published: 18 Jun 2024, Last Modified: 07 Jul 2024
Venue: TF2M 2024 Poster
License: CC BY 4.0
Keywords: Alignment, Preference Learning, Plurality
TL;DR: A novel alignment framework to learn from heterogeneous human preferences
Abstract: Large foundation models require extensive alignment to human preferences before deployment. Existing methods use the Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952) and often assume a single universal preference, neglecting the diversity of individual opinions. We introduce PAL, a framework that models the plurality of human preferences using the ideal point model and mixture modeling. PAL captures this plurality while learning a common preference latent space, enabling few-shot generalization to new users. With a simple multi-layer perceptron, PAL achieves competitive reward model accuracy on heterogeneous datasets induced from Summary (Stiennon et al., 2020) (language), Pick-a-Pic (Kirstain et al., 2024) (image generation), and Persona (Perez et al., 2022) (semisynthetic), matching state-of-the-art performance with greater efficiency. Lastly, our findings highlight the need for more nuanced data collection to capture the heterogeneity of human preferences.
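The contrast the abstract draws between a single BTL reward and per-user ideal points can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the squared Euclidean distance, the logistic link, and the representation of a user's ideal point as a convex combination of shared prototypes are textbook choices, and PAL's actual parameterization may differ.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def btl_prob(reward_a: float, reward_b: float) -> float:
    """BTL: P(a preferred over b) under one shared scalar reward per item."""
    return sigmoid(reward_a - reward_b)


def sq_dist(x, y) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def ideal_point_prob(item_a, item_b, ideal_point) -> float:
    """Ideal point model (assumed form): the item closer to the user's
    ideal point in the latent space is more likely to be preferred."""
    return sigmoid(sq_dist(item_b, ideal_point) - sq_dist(item_a, ideal_point))


def mixture_ideal_point(prototypes, weights):
    """Hypothetical mixture step: a user's ideal point as a weighted
    combination of shared prototype points (weights assumed to sum to 1)."""
    dim = len(prototypes[0])
    return [sum(w * p[d] for w, p in zip(weights, prototypes)) for d in range(dim)]


# Two items and two prototypes in a 2-D latent space (made-up values).
item_a, item_b = [1.0, 0.0], [0.0, 1.0]
prototypes = [[1.0, 0.0], [0.0, 1.0]]

user_1 = mixture_ideal_point(prototypes, [0.9, 0.1])
user_2 = mixture_ideal_point(prototypes, [0.1, 0.9])

p_user_1 = ideal_point_prob(item_a, item_b, user_1)  # above 0.5: prefers a
p_user_2 = ideal_point_prob(item_a, item_b, user_2)  # below 0.5: prefers b
```

In this toy setup the two users disagree about the same pair of items, which a single shared reward as in `btl_prob` cannot express. Under these assumptions, only the mixture weights are specific to a user, which is one way to read the few-shot generalization claim.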
Submission Number: 22