Decoding Safety Feedback from Diverse Raters: A Data-driven Lens on Responsiveness to Severity

TMLR Paper 5984 Authors

24 Sept 2025 (modified: 03 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Ensuring the safety of generative AI requires a nuanced understanding of pluralistic viewpoints. In this paper, we introduce a novel data-driven approach for analyzing ordinal safety ratings in pluralistic settings. Specifically, we address the challenge of interpreting differences in safety feedback that a diverse population expresses via ordinal scales (e.g., a Likert scale). We define non-parametric responsiveness metrics that quantify how raters convey both broad distinctions and granular variations in the severity of safety violations. Using publicly available datasets of pluralistic safety feedback as case studies, we investigate how raters from different demographic groups use an ordinal scale to express their perceptions of violation severity. We apply our metrics across violation types, demonstrating their utility in extracting insights that are crucial for reliably aligning AI systems in multi-cultural contexts. We show that our approach can inform rater selection and feedback interpretation by capturing distinct viewpoints across demographic groups, thereby improving the quality of pluralistic data collection and, in turn, contributing to more robust AI alignment.
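The abstract does not spell out the metric definitions, so the following is only a minimal illustrative sketch of one common non-parametric way to quantify "responsiveness to severity" on an ordinal scale: an AUC / Mann-Whitney-style probability that a rater group's rating of a higher-severity item exceeds its rating of a lower-severity one. The function name, variable names, and example ratings are all hypothetical and not taken from the paper.

```python
# Hedged sketch (not the paper's actual definition): responsiveness as
# P(high-severity rating > low-severity rating) + 0.5 * P(tie), computed
# over all cross pairs of ratings. 0.5 means the group does not separate
# the two severity levels at all; 1.0 means perfect separation.

from itertools import product

def responsiveness(low_severity_ratings, high_severity_ratings):
    """AUC-style separation between two severity levels on an ordinal scale."""
    pairs = list(product(low_severity_ratings, high_severity_ratings))
    wins = sum(1.0 for lo, hi in pairs if hi > lo)   # high rated above low
    ties = sum(1.0 for lo, hi in pairs if hi == lo)  # ties count half
    return (wins + 0.5 * ties) / len(pairs)

# Illustrative Likert ratings (1 = safe ... 5 = severe) from one rater group:
mild = [1, 2, 2, 1, 3]    # responses judged mildly violating
severe = [4, 5, 3, 5, 4]  # responses judged severely violating
print(f"responsiveness = {responsiveness(mild, severe):.2f}")  # 0.98 here
```

Comparing such a statistic across demographic groups and violation types would, under these assumptions, indicate which groups use the ordinal scale to draw broad severity distinctions versus finer-grained ones.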
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Greg_Durrett1
Submission Number: 5984