Pyramidal-Graded Response for Large Language Models on Youth Safety

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Large Language Models (LLMs) have made significant strides in human-machine interaction. However, this advancement brings the issue of dialogue safety into sharp focus. Current research on the alignment and safety of LLMs predominantly targets adult audiences, overlooking the distinct cognitive stages of human development, particularly in youth. Recognizing this gap, we build a pyramidal youth safety benchmark (PYSafety), the largest labeled benchmark of its kind to date, comprising 275,321 records. Based on the benchmark, we introduce a pyramidal-graded response (PGR) strategy designed to tailor safety responses, ensuring that each interaction is aligned with the specific safety needs of the user demographic. To implement the PGR strategy, we propose Safety Preference Optimization (SPO), a novel approach designed to enhance the safety performance of LLMs without additional training. The evaluation of 10 leading LLMs on the PYSafety benchmark revealed that they fall short of the desired standards for youth safety. Our SPO-based PGR strategy demonstrated significant safety improvements across the majority of LLMs, achieving an average 20% to 30% increase in win rate over their original responses. This work offers a systematic approach to analyzing and enhancing LLM performance on youth safety.
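As a minimal illustration of the evaluation metric the abstract reports (not an artifact of the paper; the judgment labels and the convention that ties count against the graded response are assumptions), a win rate over pairwise preference judgments could be computed as follows:

```python
from collections import Counter

def win_rate(judgments: list[str]) -> float:
    """Fraction of pairwise comparisons where the graded response
    is preferred over the model's original response.
    Under this (assumed) convention, ties and losses both count
    against the win rate."""
    counts = Counter(judgments)
    total = sum(counts.values())
    return counts["win"] / total if total else 0.0

# Hypothetical judge outputs comparing PGR responses to originals:
judgments = ["win", "win", "tie", "loss", "win"]
print(f"win rate: {win_rate(judgments):.2f}")  # win rate: 0.60
```

Under this reading, a 20% to 30% increase in win rate means the graded responses are preferred over the originals in substantially more of the benchmark comparisons.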
Paper Type: long
Research Area: Ethics, Bias, and Fairness
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: Chinese, English