Abstract: Visual Emotion Analysis (VEA) aims to predict the emotions evoked in viewers by visual content. Existing methods often operate at the pixel level, neglecting the complex and abstract process by which emotion arises. To tackle this challenge, we propose a novel multi-dimension full scene integrated network that jointly models scene features, background style information, and facial regions. To focus the network on informative features, we leverage a channel attention mechanism, and we design a multi-dimension loss function to better distinguish basic emotion categories that are easily confused with one another. Experiments show that our proposed method outperforms state-of-the-art approaches on four public visual emotion datasets, notably surpassing them by a large margin of +4.22% in six-class classification accuracy on the Emotion6 dataset. Moreover, an ablation study and visualizations demonstrate the effectiveness of our method. The code is available at https://github.com/AlchemistPenn/MDFSINet.
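The channel attention mentioned in the abstract can be illustrated with a minimal squeeze-and-excitation style sketch. This is a hypothetical, framework-free illustration in plain Python (the paper's actual attention module and its weight shapes are not specified here); real implementations would use a deep-learning framework with learned weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_maps, w1, w2):
    """Hypothetical squeeze-and-excitation style channel attention.

    feature_maps: list of C channels, each a 2-D list (H x W).
    w1, w2: C x C fully connected weight matrices (no channel reduction,
    purely for illustration).
    """
    # Squeeze: global average pooling per channel -> C-dim descriptor
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]
    # Excitation: two FC layers, ReLU then sigmoid -> per-channel weights
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Rescale: reweight each channel by its attention score
    return [[[v * wgt for v in row] for row in ch]
            for ch, wgt in zip(feature_maps, weights)]
```

Channels whose global statistics correlate with informative cues receive weights near 1 and are preserved, while less relevant channels are suppressed, which matches the abstract's goal of making the network attend to effective information.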