PERSONA: A Reproducible Testbed for Pluralistic Alignment

ACL ARR 2024 June Submission3949 Authors

16 Jun 2024 (modified: 18 Jul 2024), ACL ARR 2024 June Submission, Readers: Everyone, License: CC BY 4.0

Abstract: The rapid advancement and adoption of language models (LMs) have highlighted critical challenges in aligning these models with the diverse values and preferences of global users. Existing reinforcement learning from human feedback (RLHF) approaches often fail to capture the plurality of user opinions, instead reinforcing majority viewpoints and marginalizing minority perspectives. To address this, we introduce PERSONA, a comprehensive and reproducible testbed designed to evaluate and improve pluralistic alignment in language models. Our approach uses synthetic personas, crafted through a combination of US census data and procedural generation, to simulate a wide array of user profiles with diverse demographic and idiosyncratic attributes. We present a detailed methodology for constructing a representative demographic of 1,586 personas, each enriched with individualistic personality traits and core values. Leveraging this synthetic demographic, we generate a large-scale preference dataset containing 3,868 prompts and 317,200 pairs of diverse feedback. This dataset enables the evaluation of language models' ability to align with both group-level and individual preferences across various controversial and value-laden topics. Our contributions include a systematic evaluation of current LM capabilities in role-playing diverse users, verified through human judges, and the establishment of a benchmark for pluralistic alignment approaches. Our work aims to facilitate the development of more inclusive and representative language models, paving the way for future research in global pluralistic alignment. The full dataset is available at https://sites.google.com/view/pluralistic

Paper Type: Long

Research Area: Language Modeling

Research Area Keywords: Alignment, Personalization, LLMs

Contribution Types: Data resources, Data analysis

Languages Studied: English

Submission Number: 3949
