Abstract: Individuals experiencing mental health challenges often share their thoughts and emotions on social media rather than seeking professional support, making it crucial to accurately distinguish genuine suicidal ideation from mentions of suicide in humor or figurative language. However, automated suicide detection faces challenges such as data scarcity, ambiguity in suicidal expressions, and inconsistencies between human and model predictions. To address these issues, we (i) expand an existing dataset via expert annotation, enhancing data diversity and representation, (ii) benchmark the performance of eight state-of-the-art language models (LMs), including both general-purpose and domain-specific models, and (iii) conduct a category-wise performance analysis to evaluate their effectiveness in detecting suicide-related content. Our findings demonstrate that domain-specific models, particularly MentalRoBERTa and MentalBERT, outperform general-purpose models, especially as dataset size increases. To gain further insight into LM behavior, we perform interpretability and explainability analyses, examining token importance scores to identify misclassification patterns. Results indicate that models over-rely on emotionally charged keywords, often misclassifying humor, figurative language, and expressions of personal distress. Additionally, we conduct an N-gram analysis across content categories, revealing substantial linguistic overlap that poses challenges for precise classification.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: benchmarking, NLP datasets, feature attribution, free-text/natural language explanations, evaluation methodologies, healthcare applications
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 4154