TL;DR: We explore the challenges of human and AI feedback for AI alignment and propose future directions to improve feedback collection, cleaning, and verification
Abstract: As AI systems become increasingly capable and influential, ensuring their alignment with human values, preferences, and goals has become a critical research focus. Current alignment methods primarily focus on designing algorithms and loss functions but often underestimate the crucial role of data. This paper advocates for a shift towards data-centric AI alignment, emphasizing the need to enhance the quality and representativeness of data used in aligning AI systems. In this position paper, we highlight key challenges associated with both human-based and AI-based feedback within the data-centric alignment framework. Through qualitative analysis, we identify multiple sources of unreliability in human feedback, as well as problems related to temporal drift, context dependence, and AI-based feedback failing to capture human values due to inherent model limitations. We propose future research directions, including improved feedback collection practices, robust data-cleaning methodologies, and rigorous feedback verification processes. We call for future research into these critical directions to address gaps that persist in understanding and improving data-centric alignment practices.
Lay Summary: AI systems are playing an increasingly important role in our daily lives, from recommending what we watch to helping with medical decisions. But how do we make sure these systems truly reflect what people care about—our values, goals, and preferences?
Most efforts to align AI with human values focus on how the algorithms are built. Our work argues that we’re missing a big part of the picture: the data used to train and guide these systems. If the feedback data—whether it comes from people or from other AI—is flawed, the AI may learn the wrong lessons.
We explore how feedback can be inconsistent, biased, or unclear, and how it can become outdated over time or miss important human perspectives. We call for better ways to collect, clean, and verify this feedback to make sure it truly represents what people want.
Verify Author Names: My co-authors have confirmed that their names are spelled correctly both on OpenReview and in the camera-ready PDF. (If needed, please update ‘Preferred Name’ in OpenReview to match the PDF.)
No Additional Revisions: I understand that after the May 29 deadline, the camera-ready submission cannot be revised before the conference. I have verified with all authors that they approve of this version.
Pdf Appendices: My camera-ready PDF file contains both the main text (not exceeding the page limits) and all appendices that I wish to include. I understand that any other supplementary material (e.g., separate files previously uploaded to OpenReview) will not be visible in the PMLR proceedings.
Latest Style File: I have compiled the camera ready paper with the latest ICML2025 style files <https://media.icml.cc/Conferences/ICML2025/Styles/icml2025.zip> and the compiled PDF includes an unnumbered Impact Statement section.
Paper Verification Code: NzY0Z
Permissions Form: pdf
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: AI alignment, reliability, feedback collection
Submission Number: 39