Data Curation for Pluralistic Alignment

Published: 05 Mar 2025 · Last Modified: 10 Apr 2025 · MLDPR 2025 · CC BY 4.0
Keywords: AI Alignment, Dataset Curation, Human Feedback Datasets, Pluralistic AI Alignment, Ethical Data Collection, Reinforcement Learning from Human Feedback (RLHF)
Abstract: Human feedback datasets are central to AI alignment, yet current data collection methods do not necessarily capture diverse and complex human values. For example, existing alignment datasets focus broadly on “Harmfulness” and “Helpfulness,” but dataset curation should also dissect these broad categories into more specific dimensions. In this paper, we introduce a pluralistic alignment dataset that (i) integrates the dimensions of “Toxicity,” “Emotional Awareness,” “Sensitivity and Openness,” “Helpfulness,” and “Stereotypical Bias,” (ii) reveals previously undiscovered tensions in human ratings of AI-generated content, (iii) shows how demographics and political ideologies shape human preferences in alignment datasets, and (iv) highlights issues in data collection and model fine-tuning. Through a large-scale human evaluation study (N=1,095 participants in the U.S. and Germany; five response ratings per participant; 5,475 ratings per dimension; 27,375 ratings in total), we identify key challenges in data curation for pluralistic alignment, including the coexistence of conflicting values in human ratings, demographic imbalances, and limitations in reward models and cost functions that prevent them from handling the diversity of values in the datasets. Based on these findings, we develop a set of considerations that researchers and practitioners should take into account to achieve inclusive AI models. By analyzing how human feedback varies across social groups and value dimensions, we shed light on the role of data curation in achieving bidirectional human-AI alignment: AI systems are shaped by diverse human input and, in turn, surface the complexity and plurality of human values.
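The rating counts reported in the abstract follow from its study design figures; a minimal sketch making the arithmetic explicit (all numbers are taken from the abstract, nothing else is assumed):

```python
# Study design figures reported in the abstract.
participants = 1095          # N, recruited in the U.S. and Germany
ratings_per_participant = 5  # response ratings per participant, per dimension
dimensions = 5               # Toxicity, Emotional Awareness, Sensitivity and
                             # Openness, Helpfulness, Stereotypical Bias

# Each participant rates five responses on every dimension.
per_dimension = participants * ratings_per_participant
total = per_dimension * dimensions

print(per_dimension)  # 5475 ratings per dimension
print(total)          # 27375 ratings overall
```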
Submission Number: 14