Abstract: We propose a multi-modal method with a hierarchical recurrent neural structure that integrates visual, audio, and text features for depression detection. The method uses two hierarchical levels of bidirectional long short-term memory (BiLSTM) networks to fuse the multi-modal features and predict the severity of depression. An adaptive sample weighting mechanism is introduced to account for the diversity of the training samples. Experiments on the test set of a depression detection challenge demonstrate the effectiveness of the proposed method.