PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics

ACL ARR 2026 January Submission8203 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: ethical considerations in NLP applications, reflections and critiques, mental health

Abstract: The increasing integration of large language models (LLMs) into mental health applications necessitates robust frameworks for evaluating professional safety alignment. Current evaluative approaches rely primarily on refusal-based safety signals, which offer limited insight into the nuanced behaviors required in clinical practice. In mental health, clinically inadequate refusals can be perceived as unempathetic and can discourage help-seeking. To address this gap, we move beyond refusal-centric metrics and introduce PsychEthicsBench, the first principle-grounded benchmark based on Australian psychology and psychiatry guidelines, designed to evaluate LLMs' ethical knowledge and behavioral responses through multiple-choice and open-ended tasks with fine-grained ethicality annotations. Empirical results across 14 models show that refusal rates are poor indicators of ethical behavior, with a significant divergence between safety triggers and clinical appropriateness. Notably, we find that domain-specific fine-tuning can degrade ethical robustness, as several specialized models underperform their base backbones in ethical alignment. PsychEthicsBench provides a foundation for systematic, jurisdiction-aware evaluation of LLMs in mental health, encouraging more responsible development in this domain.

Paper Type: Long

Research Area: Ethics, Bias, and Fairness

Research Area Keywords: ethical considerations in NLP applications, reflections and critiques, model bias/fairness evaluation

Contribution Types: Data resources

Languages Studied: English

Submission Number: 8203
