Abstract: Emotions are closely related to many mental and cognitive diseases, such as depression, mania, and Parkinson's disease, and emotion recognition plays an important role in the diagnosis of these diseases, which currently relies largely on patients' self-reports. Because emotional states are inherently unstable, objective quantitative methods are urgently needed for more accurate emotion recognition, which can help improve diagnostic performance for emotion-related brain diseases. Existing studies have shown that EEG and facial expressions are highly correlated, and that combining EEG with facial expressions can better characterize emotion-related information. However, most existing multi-modal emotion recognition studies do not fuse the modalities properly and ignore the temporal variability of channel connectivity in EEG. In this paper, we propose a spatial-temporal feature extraction framework for multi-modal emotion recognition based on prior-driven Dynamic Functional Connectivity Networks (DFCNs). First, we treat each electrode as a node to construct the original dynamic brain networks. Second, we compute the correlation between EEG and facial expressions through cross attention, use it as prior knowledge for the dynamic brain networks, and embed it to obtain the final DFCN representation. Then, we design a spatial-temporal feature extraction network by stacking multiple residual blocks based on 3D convolutions, and introduce non-local attention to capture global information at the temporal level. Finally, we use the features from the fully connected layer for classification. Experimental results on the DEAP dataset demonstrate the effectiveness of the proposed method.
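As an illustration of the cross-attention prior described in the abstract, the following PyTorch-style sketch shows how per-electrode EEG features and facial-expression features could be correlated to produce a channel-by-channel prior matrix that is then embedded into the dynamic connectivity networks. All layer sizes, tensor shapes, and names (e.g., CrossAttentionPrior, eeg_dim, face_dim) are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of a cross-attention prior between EEG and facial-expression
    # features; dimensions, names, and the fusion rule are assumptions for
    # illustration only, not the authors' code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossAttentionPrior(nn.Module):
        def __init__(self, eeg_dim=128, face_dim=64, embed_dim=64):
            super().__init__()
            self.q = nn.Linear(eeg_dim, embed_dim)   # queries from EEG channels
            self.k = nn.Linear(face_dim, embed_dim)  # keys from facial-expression tokens
            self.v = nn.Linear(face_dim, embed_dim)  # values from facial-expression tokens

        def forward(self, eeg_feat, face_feat):
            # eeg_feat:  (batch, n_channels, eeg_dim)  per-electrode EEG features
            # face_feat: (batch, n_tokens,  face_dim)  facial-expression features
            q, k, v = self.q(eeg_feat), self.k(face_feat), self.v(face_feat)
            attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
            fused = attn @ v                         # (batch, n_channels, embed_dim)
            # Channel-by-channel similarity of the fused features serves as the
            # prior over electrode connectivity.
            prior = F.softmax(fused @ fused.transpose(-2, -1), dim=-1)
            return prior                             # (batch, n_channels, n_channels)

    if __name__ == "__main__":
        eeg = torch.randn(4, 32, 128)   # 32 electrodes, 128-dim features per channel
        face = torch.randn(4, 16, 64)   # 16 facial-expression tokens, 64-dim each
        prior = CrossAttentionPrior()(eeg, face)
        # The prior could then be embedded into each sliding-window connectivity
        # matrix, e.g., by element-wise weighting, to form the prior-driven DFCNs.
        print(prior.shape)              # torch.Size([4, 32, 32])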