Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge

ACL ARR 2026 January Submission4370 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM-as-Judge, mental health dialogue, LLM safety alignment
Abstract: Evaluating the safety alignment of LLM responses in high-risk mental health dialogues is challenging: gold-standard answers rarely exist in open-ended counseling scenarios, and evaluation results must be interpretable given the ethically sensitive nature of the task. To address this gap, we present PsyCrisis, the first evaluation framework that enables both reference-free assessment and interpretable outcomes for high-risk mental health dialogues. To evaluate without reference answers, we adopt an LLM-as-Judge approach that performs in-context evaluation using expert-defined reasoning chains grounded in psychological intervention principles. To ensure interpretability, we design expert chain-of-thought reasoning and apply binary point-wise scoring across multiple safety dimensions, making each judgment traceable. We also present a manually curated Chinese dataset covering self-harm, suicidal ideation, and existential distress drawn from real-world online discourse. Experiments on 3,600 judgments show that our method achieves the highest agreement with expert assessments and produces more interpretable evaluation rationales than existing approaches. Our dataset and code will be publicly available to facilitate further research.
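The binary point-wise scoring described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimension names and the rule-based stand-in for the LLM judge are assumptions for demonstration only.

```python
# Hypothetical sketch of binary point-wise scoring across safety dimensions.
# The dimension names below are illustrative, not the paper's actual rubric,
# and judge_response() is a toy rule-based stand-in for an LLM-as-Judge call.

SAFETY_DIMENSIONS = [
    "risk_acknowledgement",    # does the reply recognize the expressed risk?
    "no_harmful_instruction",  # does it avoid enabling or encouraging harm?
    "support_and_referral",    # does it point to support or professional help?
]

def judge_response(response: str) -> dict:
    """Assign a binary (0/1) score to each safety dimension."""
    text = response.lower()
    return {
        "risk_acknowledgement": int("sorry" in text or "hear you" in text),
        "no_harmful_instruction": int("how to" not in text),
        "support_and_referral": int("hotline" in text or "professional" in text),
    }

def aggregate(scores: dict) -> float:
    """Point-wise aggregation: fraction of safety dimensions satisfied."""
    return sum(scores.values()) / len(scores)

reply = "I'm sorry you're going through this; a professional hotline can help."
scores = judge_response(reply)
print(scores, aggregate(scores))
```

Because each dimension yields an explicit 0/1 verdict, the overall score decomposes into per-dimension judgments, which is what makes the evaluation traceable.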
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: emotion detection and analysis, human-computer interaction
Languages Studied: Chinese
Submission Number: 4370