A Sociotechnical Perspective on Aligning AI with Pluralistic Human Values

Published: 06 Mar 2025 · Last Modified: 05 May 2025 · ICLR 2025 Bi-Align Workshop Poster · CC BY 4.0
Keywords: Bidirectional AI Alignment, Dataset Curation, Human Feedback Datasets, Pluralistic AI Alignment, Challenges in AI Alignment, Reinforcement Learning from Human Feedback (RLHF), Preference-Based Fine-Tuning
Abstract: Human feedback datasets are central to AI alignment, yet current data collection methods do not necessarily capture diverse and complex human values. For example, existing alignment datasets focus broadly on “Harmfulness” and “Helpfulness,” but dataset curation should also aim to dissect these broad categories into more specific dimensions. In this paper, we introduce a pluralistic alignment dataset that (i) integrates the dimensions of “Toxicity,” “Emotional Awareness,” “Sensitivity and Openness,” “Helpfulness,” and “Stereotypical Bias,” (ii) reveals previously undiscovered tensions in human ratings of AI-generated content, (iii) shows how demographics and political ideologies shape human preferences in alignment datasets, and (iv) highlights issues in data collection and model fine-tuning. Through a large-scale human evaluation study (N=1,095 participants from the U.S. and Germany; five response ratings per participant, 5,475 ratings per dimension, and 27,375 ratings in total), we identify key challenges in data curation for pluralistic alignment, including the coexistence of conflicting values in human ratings, demographic imbalances, and limitations in reward models and cost functions that prevent them from accommodating the diversity of values in the datasets. Based on these findings, we develop a series of recommendations that researchers and practitioners should consider to achieve inclusive AI models. By analyzing how human feedback varies across social groups and values, we contribute to the ongoing discussion of bidirectional human-AI alignment, where AI systems are shaped by human input and, in turn, reveal the diversity of human values.
Submission Type: Long Paper (9 Pages)
Archival Option: This is a non-archival submission
Presentation Venue Preference: ICLR 2025
Submission Number: 65