Unveiling Depression on Social Media: Active Learning with Human-in-the-Loop Labeling for Mental Health Data Annotation and Analysis

Published: 01 Jan 2024, Last Modified: 19 May 2025, Venue: NLDB (1) 2024, License: CC BY-SA 4.0
Abstract: Progress in mental health research remains constrained by the limited accessibility of adequate, high-quality data. Annotating mental health data requires substantial resources and expert supervision. In this study, we explore the utility of an active learning approach with human-in-the-loop labeling to reduce the annotation effort needed to identify signs of depression in people's social media posts. The data for this study were collected from the #WorldMentalHealthDay trend on Twitter, a popular mental health campaign that raises awareness of mental illness across the global population. From the pool of unlabeled data, we initially labeled a small portion to train an LSTM model with GloVe embeddings; thereafter, the entire pool was labeled using uncertainty sampling, with the samples the model was least confident about labeled in each cycle. Along with the methodology, we present a high-quality dataset of 3659 samples, 22% of which are tweets indicating symptoms of depression. We also analyze the language usage of depressed and non-depressed individuals on social media by dissecting the semantic structure of tweets. The quality of the dataset was validated by establishing strong baseline results with state-of-the-art models and word-embedding techniques.
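The selection step of least-confidence uncertainty sampling can be illustrated with a minimal sketch. The abstract does not specify the batch size, the model's output format, or any implementation details, so everything here is an assumption: the function name `least_confident_indices` is hypothetical, the probabilities are made-up values standing in for classifier output on unlabeled tweets, and the LSTM itself is omitted.

```python
import numpy as np

def least_confident_indices(probs, batch_size):
    """Return indices of the `batch_size` samples whose top class probability is lowest."""
    probs = np.asarray(probs, dtype=float)
    # Confidence of a sample = probability assigned to its predicted class.
    confidence = probs.max(axis=1)
    # Ascending sort puts the least confident samples first.
    order = np.argsort(confidence, kind="stable")
    return order[:batch_size].tolist()

# Hypothetical class probabilities (non-depressed, depressed) for five unlabeled tweets.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.51, 0.49],
    [0.70, 0.30],
]

# Samples to send to the human annotator in this cycle.
query = least_confident_indices(pool_probs, batch_size=2)
print(query)  # [3, 1]
```

In a full cycle, the queried samples would be labeled by a human, added to the training set, and the model retrained before the next selection round.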