The Clustering Paradox in Cross-Lingual Risk Expression: Distributional Universality and Temporal Necessity
Keywords: mental health risk detection, cross-lingual NLP, multilingual counseling, Korean dialogue processing, temporal analysis, survival analysis, minimum safety window, distributional universality, multi-turn conversation
Abstract: Mental-health chatbots increasingly serve users across
languages, yet most risk-detection systems are trained on
single-turn English social-media posts and lack temporal grounding.
Multi-turn counseling data remain scarce due to privacy
and ethical barriers. We address this gap using Korean
professional counseling transcripts (N=2,833 dialogues)
with turn-level risk annotations and English mental-health
posts (N=3,512) for cross-lingual comparison. Using
multilingual embeddings and an optimal-transport (Wasserstein)
distance, we find strong distributional universality between
Korean and English risk expressions ($W < 0.02$) but no
categorical cluster structure (adjusted Rand index,
ARI $\approx 0$), indicating that risk forms a continuous
semantic spectrum. Supervised single-turn classification
reaches F1 = 0.77, which still requires temporal aggregation
for safety-critical deployment. Survival analysis of Korean
dialogues establishes Minimum Safety Windows (MSW):
MSW$_{0.5}$ = 24 turns and MSW$_{0.9}$ = 64 turns, with
depression-related risk emerging faster but converging with
other risk types at high-confidence thresholds. Temporal
risk-accumulation patterns generalize across languages despite
the structural asymmetry between multi-turn Korean dialogues
and single-turn English posts.
Our contributions are: (1) a quantitative temporal framework
with empirically grounded safety thresholds, (2) evidence
for cross-lingual semantic universality without categorical
separability, and (3) a methodological approach enabling
temporal analysis under data scarcity, offering actionable
guidance for temporally grounded risk modeling and
safety-aware analysis in multilingual mental-health text.
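As an illustration of how thresholds such as MSW$_{0.5}$ and MSW$_{0.9}$ can be read off a survival curve, the Python sketch below makes assumptions that are not stated in the abstract. It assumes each dialogue contributes the turn at which risk is first detected, or its last observed turn if risk is never detected (a censored observation). It also assumes MSW$_p$ is the earliest turn at which the Kaplan-Meier cumulative detection probability $1 - \hat{S}(t)$ reaches $p$. The function names and toy data are hypothetical and do not reproduce the paper's method or results.

```python
from collections import Counter
from typing import Optional, Sequence


def kaplan_meier(durations: Sequence[int], events: Sequence[bool]) -> list[tuple[int, float]]:
    """Kaplan-Meier estimate of S(t) = P(no risk detected by turn t).

    durations[i]: turn of first risk detection (if events[i]) or the
    dialogue's last observed turn (if censored, events[i] is False).
    Returns (t, S(t)) at each distinct turn where a detection occurs.
    """
    detections = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # dialogues that leave the risk set at turn t
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = detections.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional "no detection" rate.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Censored dialogues count as at risk at their own last turn.
        at_risk -= leaving[t]
    return curve


def minimum_safety_window(durations, events, p: float) -> Optional[int]:
    """Hypothetical MSW_p: earliest turn t with 1 - S(t) >= p (None if never reached)."""
    for t, s in kaplan_meier(durations, events):
        if 1.0 - s >= p - 1e-12:  # small tolerance for floating-point rounding
            return t
    return None


if __name__ == "__main__":
    # Toy, made-up dialogues (not the paper's data).
    durations = [3, 5, 5, 8, 10, 12, 12, 15]
    events = [True, True, False, True, True, False, True, False]
    print(minimum_safety_window(durations, events, 0.5))  # 10
    print(minimum_safety_window(durations, events, 0.9))  # None
```

On the toy data, the estimated cumulative detection probability reaches 0.55 at turn 10 but plateaus at 0.7. A 0.9 window is therefore undefined for that sample, which is why thresholds like MSW$_{0.9}$ call for enough long, uncensored dialogues. Keeping censored dialogues in the risk set through their last turn is the standard Kaplan-Meier convention.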
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: human behavior analysis, emotion detection and analysis, healthcare applications, clinical NLP, NLP for social good
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: Korean, English
Submission Number: 2128