We Care: Multimodal Depression Detection and Knowledge Infused Mental Health Therapeutic Response Generation

ACL ARR 2024 April Submission737 Authors

16 Apr 2024 (modified: 01 Jun 2024) | ACL ARR 2024 April Submission | Readers: Everyone, Ethics Reviewers, Ethics Chairs | License: CC BY 4.0

Abstract: The detection of depression through non-verbal cues has gained significant attention. Previous research has predominantly centred on identifying depression in controlled laboratory environments, often under the supervision of psychologists or counsellors. Unfortunately, datasets generated in such controlled settings may fail to capture how individuals behave in real-life situations. In response to this limitation, we present the Extended D-vlog dataset, a collection of 1,261 YouTube vlogs. Additionally, the emergence of large language models (LLMs) such as GPT-3.5 and GPT-4 has sparked interest in their potential to act as mental health professionals. Yet the readiness of these LLMs for real-life use remains a concern, as they can give wrong responses that may harm users. We introduce a virtual agent that serves as an initial point of contact for mental health patients and offers Cognitive Behavioral Therapy (CBT)-based responses. It comprises two core functions: (1) identifying depression in individuals, and (2) delivering CBT-based therapeutic responses. Our Mistral model achieved scores of $\textbf{70.1\%}$ and $\textbf{30.9\%}$ for distortion assessment and classification, respectively, along with a BERTScore of $\textbf{88.7\%}$. Moreover, the TVLT model achieved an F1-score of $\textbf{67.8\%}$ on our multimodal Extended D-vlog dataset.

Paper Type: Long

Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond

Research Area Keywords: Multimodality and Language Grounding to Vision, Robotics and Beyond, multimodality

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources

Languages Studied: English

Submission Number: 737
