Abstract: Individuals experiencing mental health challenges often share their thoughts and emotions on social media rather than seeking professional support, making it crucial to accurately distinguish genuine suicidal ideation from mentions of suicide in humor or figurative language. However, automated suicide detection faces challenges such as data scarcity, ambiguity in suicidal expressions, and inconsistencies between human and model predictions. To address these issues, we (i) expand an existing dataset via expert annotation, enhancing data diversity and representation, (ii) benchmark the performance of eight state-of-the-art language models (LMs), including both general-purpose and domain-specific models, and (iii) conduct a category-wise performance analysis to evaluate their effectiveness in detecting suicide-related content. Our findings demonstrate that domain-specific models, particularly MentalRoBERTa and MentalBERT, outperform general-purpose models, especially as dataset size increases. To gain further insight into LM behavior, we perform interpretability and explainability analyses, examining token importance scores to identify misclassification patterns. Results indicate that models over-rely on emotionally charged keywords, often misclassifying humor, figurative language, and expressions of personal distress. Additionally, we conduct an N-gram analysis across content categories, revealing substantial linguistic overlap that poses challenges for precise classification.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: benchmarking, NLP datasets, feature attribution, free-text/natural language explanations, evaluation methodologies, healthcare applications
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 4154