Abstract: Depression is a prevalent mental disorder that imposes a heavy burden worldwide. Identifying people with depression is challenging because there are no clear physiological markers distinguishing depressed patients from healthy individuals, so clinicians can only make a subjective diagnosis based on the relevant information reported by patients. Hence, it has become imperative to develop automated methods for audiovisual depression prediction. Although many studies have been conducted in this field, one challenge remains: long-term temporal context is difficult to extract from long sequences of audio and visual data. This study aimed to construct a novel transformer-based multimodal network to distinguish depressed patients from healthy people. We evaluate our approach on the Chinese Soochow University depression severity dataset and demonstrate that our method outperforms the existing method.
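The abstract does not specify the architecture, but the core idea it names (a transformer-style mechanism fusing audio and visual sequences to capture temporal context) can be sketched in minimal form. The following is an illustrative, dependency-free sketch, not the authors' actual model: function names, the cross-attention fusion scheme, and mean pooling are all assumptions chosen for clarity.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    # Scaled dot-product attention for one query vector.
    # query: [d]; keys, values: [T][d]; returns a [d] context vector.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(d)]

def fuse_modalities(audio_seq, visual_seq):
    # Hypothetical cross-modal fusion: each visual frame attends over the
    # full audio sequence, and the attended contexts are mean-pooled into
    # one clip-level representation that a classifier head could consume.
    contexts = [attention(v, audio_seq, audio_seq) for v in visual_seq]
    d = len(contexts[0])
    return [sum(c[i] for c in contexts) / len(contexts) for i in range(d)]

# Toy input: 3 audio frames and 2 visual frames with 2-dim features each.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual = [[0.5, 0.5], [1.0, 0.0]]
clip_repr = fuse_modalities(audio, visual)
```

Because attention weights every audio frame when encoding each visual frame, the pooled representation can draw on context from anywhere in the sequence, which is the property the abstract argues is hard to obtain from long audiovisual recordings with conventional methods.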