Multi-modal deep learning system for depression and anxiety detection

Brian Diep; Marija Stanojevic; Jekaterina Novikova

Published: 21 Oct 2022, Last Modified: 05 May 2023; PAI4MH 2022 Oral; Readers: Everyone

Keywords: depression, anxiety, crowd-sourced data, deep-learning, hand-crafted features, speech, wav2vec2, roberta

TL;DR: We present a model for detecting depression and anxiety from crowd-sourced speech data, combining deep-learned features with hand-crafted features informed by domain knowledge.

Abstract: Traditional screening practices for anxiety and depression pose an impediment to monitoring and treating these conditions effectively. However, recent advances in NLP and speech modelling allow textual, acoustic, and hand-crafted language-based features to jointly form the basis of future mental health screening and condition detection. Speech is a rich and readily available source of insight into an individual's cognitive state, and by leveraging different aspects of speech, we can develop new digital biomarkers for depression and anxiety. To this end, we propose a multi-modal system for the screening of depression and anxiety from self-administered speech tasks. The proposed model integrates deep-learned features from audio and text, as well as hand-crafted features that are informed by clinically validated domain knowledge. We find that augmenting hand-crafted features with deep-learned features improves the overall classification F1 score over a baseline of hand-crafted features alone, from 0.58 to 0.63 for depression and from 0.54 to 0.57 for anxiety. These findings suggest that speech-based biomarkers for depression and anxiety hold significant promise for the future of digital health.
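
The abstract does not say how the three feature sets are combined. As a rough illustration only, the sketch below assumes the simplest option, concatenating the per-recording vectors into one input for a downstream classifier. The function name `fuse_features`, the embedding size of 768 (the hidden size of the base wav2vec2 and RoBERTa models, which may not be the variants used in the paper), and the count of 20 hand-crafted features are all assumptions made for this example, and the zero-valued vectors are placeholders rather than real data.

```python
from typing import List, Sequence


def fuse_features(
    audio_emb: Sequence[float],
    text_emb: Sequence[float],
    handcrafted: Sequence[float],
) -> List[float]:
    """Concatenate three per-recording feature vectors into one vector.

    This is plain feature-level (early) fusion, assumed here for
    illustration; it is not claimed to be the paper's method.
    """
    return [*audio_emb, *text_emb, *handcrafted]


# Placeholder inputs (assumed sizes, not taken from the paper).
audio_emb = [0.0] * 768    # e.g. a pooled wav2vec2 embedding
text_emb = [0.0] * 768     # e.g. a pooled RoBERTa embedding
handcrafted = [0.0] * 20   # hypothetical number of hand-crafted features

fused = fuse_features(audio_emb, text_emb, handcrafted)
print(len(fused))  # 1556
```

The hand-crafted-only baseline reported in the abstract would correspond to passing just the `handcrafted` vector to the classifier, which is what makes the F1 comparison (0.58 vs. 0.63 and 0.54 vs. 0.57) a test of what the deep-learned features add.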
