Position: Challenges and Future Directions of Data-Centric AI Alignment

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Position Paper Track (poster) · License: CC BY 4.0
TL;DR: We explore the challenges of human and AI feedback for AI alignment and propose future directions to improve feedback collection, cleaning, and verification
Abstract: As AI systems become increasingly capable and influential, ensuring their alignment with human values, preferences, and goals has become a critical research focus. Current alignment methods primarily focus on designing algorithms and loss functions but often underestimate the crucial role of data. This paper advocates for a shift towards data-centric AI alignment, emphasizing the need to enhance the quality and representativeness of the data used to align AI systems. In this position paper, we highlight key challenges associated with both human-based and AI-based feedback within the data-centric alignment framework. Through qualitative analysis, we identify multiple sources of unreliability in human feedback, including temporal drift and context dependence, as well as AI-based feedback that fails to capture human values due to inherent model limitations. We propose future research directions, including improved feedback collection practices, robust data-cleaning methodologies, and rigorous feedback verification processes. We call for future research in these critical directions to address gaps that persist in understanding and improving data-centric alignment practices.
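To make the data-cleaning direction concrete, below is a minimal sketch of one way unreliable preference feedback might be filtered before training: drop examples whose annotators disagree too much, and examples old enough to be at risk of temporal drift. All field names, thresholds, and the dataset itself are illustrative assumptions, not methods prescribed by the paper.

```python
from collections import Counter

# Hypothetical preference records: each example is labeled by several
# annotators, with the year the judgments were collected.
feedback = [
    {"id": "ex1", "labels": ["A", "A", "A"], "year": 2024},
    {"id": "ex2", "labels": ["A", "B", "A"], "year": 2024},
    {"id": "ex3", "labels": ["B", "A", "B", "A"], "year": 2021},
]

def agreement(labels):
    """Fraction of annotators who voted for the majority label."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)

MIN_AGREEMENT = 0.75  # illustrative reliability threshold
MIN_YEAR = 2023       # illustrative cutoff against temporal drift

# Keep only feedback that is both consistent and recent.
clean = [
    ex for ex in feedback
    if agreement(ex["labels"]) >= MIN_AGREEMENT and ex["year"] >= MIN_YEAR
]

for ex in clean:
    print(ex["id"], agreement(ex["labels"]))  # -> ex1 1.0
```

In practice such filters would be one step in a larger pipeline (alongside feedback collection protocols and verification), but even this simple agreement-plus-recency rule illustrates how data quality checks can precede any change to the alignment algorithm itself.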
Lay Summary: AI systems are playing an increasingly important role in our daily lives, from recommending what we watch to helping with medical decisions. But how do we make sure these systems truly reflect what people care about—our values, goals, and preferences? Most efforts to align AI with human values focus on how the algorithms are built. Our work argues that we’re missing a big part of the picture: the data used to train and guide these systems. If the feedback data—whether it comes from people or from other AI systems—is flawed, the AI may learn the wrong lessons. We explore how feedback can be inconsistent, biased, or unclear, and how it can become outdated over time or miss important human perspectives. We call for better ways to collect, clean, and verify this feedback to make sure it truly represents what people want.
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: AI alignment, reliability, feedback collection
Submission Number: 39