Keywords: Alignment, Preference Learning, Plurality
TL;DR: A novel alignment framework to learn from heterogeneous human preferences
Abstract: Large foundation models require extensive \textit{alignment} to human preferences before deployment. Existing methods for alignment from comparison data largely assume a universal preference, neglecting the diversity of individual opinions. We introduce PAL, a personalizable reward framework that models the \emph{plurality} of human preferences via latent variables, combining the ideal point model, metric learning, and mixture modeling. PAL captures this plurality while learning a common preference latent space, enabling few-shot generalization to new users. It is modular, interpretable, and flexible, allowing model complexity to be tuned via data-driven cross-validation. With a simple multi-layer perceptron, PAL achieves competitive reward-model accuracy on the Summary \cite{stiennon2020learning} (language), Pick-a-Pic \cite{kirstain2024pick} (image generation), and Persona \cite{perez2022discovering} (semi-synthetic) heterogeneous preference datasets, matching state-of-the-art performance with greater efficiency. Our findings also highlight the need for more nuanced data collection to capture the heterogeneity of human preferences.
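For intuition, a minimal sketch of an ideal-point mixture reward of the kind the abstract describes, assuming a learned item embedding $\phi$, prototype ideal points $p_1,\dots,p_K$, and per-user mixture weights $w_u$ (all symbols hypothetical; the paper's exact parameterization may differ):
\[
  r_u(x) \;=\; -\Bigl\lVert \phi(x) \;-\; \textstyle\sum_{k=1}^{K} w_{u,k}\, p_k \Bigr\rVert_2^{2},
  \qquad w_u \in \Delta^{K-1},
\]
so user $u$ prefers item $x$ over $x'$ whenever $r_u(x) > r_u(x')$. Few-shot generalization to a new user would then amount to fitting only $w_u$ while the shared embedding $\phi$ and prototypes $\{p_k\}$ stay fixed.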
Submission Number: 6