Affective Video Content Analyses by Using Cross-Modal Embedding Learning Features

Benchao Li, Zhenzhong Chen, Shan Li, Wei-Shi Zheng

Published: 2019, Last Modified: 16 May 2023, ICME 2019, Readers: Everyone

Abstract: Most existing methods for affective video content analysis focus on a single medium, either visual or audio content, and few attempt a combined analysis of the two signals. In this paper, we employ a cross-modal embedding learning approach to learn compact feature representations of the different modalities that are discriminative for analyzing the emotion attributes of a video. Specifically, we introduce inter-modal and intra-modal similarity constraints to guide the joint embedding learning procedure and obtain robust features. To capture cues at different granularities, global and local features are extracted from both the visual and audio signals, and a unified framework consisting of global and local feature embedding networks is built for affective video content analysis. Experiments show that the proposed approach significantly outperforms state-of-the-art methods, demonstrating its effectiveness.
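
The abstract names inter-modal and intra-modal similarity constraints but does not give their form. The following is a minimal sketch of what such constraints could look like, not the authors' formulation: it assumes cosine similarity, a hinge ranking loss across modalities, and a contrastive loss within one modality. The function names, the margins, and the use of emotion labels to define similar pairs are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def inter_modal_loss(visual, audio, margin=0.2):
    # Assumed form: row i of `visual` and row i of `audio` come from the
    # same video. A matched pair should be more similar than any mismatched
    # pair by at least `margin` (hinge ranking loss). Needs >= 2 videos.
    v, a = l2_normalize(visual), l2_normalize(audio)
    sim = v @ a.T                      # (N, N) cross-modal similarities
    pos = np.diag(sim)[:, None]        # similarity of each matched pair
    hinge = np.maximum(0.0, margin - pos + sim)
    off_diag = ~np.eye(len(sim), dtype=bool)
    return hinge[off_diag].mean()

def intra_modal_loss(emb, labels, margin=0.5):
    # Assumed form: within one modality, samples sharing an emotion label
    # are pulled together, and samples with different labels are pushed
    # below `margin` in cosine similarity (contrastive loss).
    x = l2_normalize(emb)
    sim = x @ x.T
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    pos_term = (1.0 - sim)[same & off_diag]
    neg_term = np.maximum(0.0, sim - margin)[~same]
    return np.concatenate([pos_term, neg_term]).mean()

# Toy embeddings standing in for network outputs (hypothetical data).
rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 8))
audio = rng.normal(size=(4, 8))
labels = np.array([0, 0, 1, 1])

total = (inter_modal_loss(visual, audio)
         + intra_modal_loss(visual, labels)
         + intra_modal_loss(audio, labels))
print(total)
```

In a full system these terms would be minimized jointly with the emotion prediction objective over the embedding networks; how the paper weights or combines them is not stated in the abstract.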
