Abstract: Depression is a common and serious medical illness with a wide-ranging negative impact on individuals, families, and society. Automatic Depression Detection (ADD) is increasingly in demand in healthcare owing to its objectivity, convenience, and low cost. Because the duration of depressive symptoms varies across individuals and treatment phases, ADD methods must be able to capture information at multiple temporal scales. However, most existing ADD methods cannot generate rich contextual cues or exploit long-range temporal dependencies effectively. In this paper, we propose a novel approach to depression recognition based on visual behaviors, which employs an Atrous Residual Temporal Convolutional Network (DepArt-Net) together with temporal fusion to capture long-range dynamic depressive cues. First, the proposed atrous temporal convolution generates multi-scale contextual features from low-level visual behaviors, which are further strengthened by residual blocks across different convolution groups. Second, we introduce an attention mechanism in the temporal feature fusion stage; the learned attention distribution yields a more discriminative video-level depression representation. Experimental results on the DAIC-WOZ benchmark demonstrate the effectiveness of the proposed approach and its superiority over other state-of-the-art methods.
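
To make the two ideas in the abstract concrete, the following is a minimal, dependency-free sketch of an atrous (dilated) temporal convolution with a residual connection, followed by attention-weighted temporal pooling. It is an illustration only and not the DepArt-Net implementation: the function names, the fixed kernel, the dilation rates, the single-channel input, and the use of the features themselves as attention scores are all assumptions made for this example. In a trained model the kernel weights and attention scores would be learned.

```python
import math

def dilated_conv1d(x, kernel, dilation):
    """Same-length 1D atrous convolution with zero padding.
    x: per-frame values of one feature; kernel: odd-length list of weights."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= idx < len(x):            # out-of-range taps act as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out

def atrous_residual_block(x, kernel, dilations):
    """Hypothetical block: sum ReLU-activated responses over several
    dilation rates (multi-scale context), then add the input back (residual)."""
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    fused = [sum(max(0.0, b[t]) for b in branches) for t in range(len(x))]
    return [xi + fi for xi, fi in zip(x, fused)]

def attention_pool(features, scores):
    """Softmax over per-frame scores, then a weighted sum of the features.
    Returns the pooled (video-level) value and the attention weights."""
    m = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    pooled = sum(w * f for w, f in zip(weights, features))
    return pooled, weights

# Toy usage: 8 frames of a single low-level visual feature (made-up values).
frames = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
hidden = atrous_residual_block(frames, kernel=[0.5, 1.0, 0.5], dilations=[1, 2, 4])
# Placeholder scoring: reuse the features as scores (a real model learns these).
video_repr, weights = attention_pool(hidden, scores=hidden)
```

With a three-tap kernel, a dilation of 4 lets a single output frame see inputs eight frames apart at no extra parameter cost, which is how larger rates reach longer-range context; the attention weights then decide how much each frame contributes to the video-level representation.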