Temporal Enhancement for Video Affective Content Analysis

Xin Li; Shangfei Wang; Xuandong Huang

Published: 20 Jul 2024, Last Modified: 21 Jul 2024

Venue: MM2024 Oral

License: CC BY 4.0

Abstract: With the growing popularity of the Internet and video-sharing platforms, video affective content analysis has advanced considerably. Nevertheless, existing methods often rely on simple models to extract semantic information, which may fail to capture the full range of emotional cues in videos. These methods also tend to overlook the substantial amount of irrelevant information in videos and the uneven importance of modalities for emotional tasks. As a result, noise from both temporal fragments and modalities can diminish the model's ability to identify crucial temporal fragments and recognize emotions. To address these issues, we propose a Temporal Enhancement (TE) method. Specifically, we employ three encoders to extract features at various levels and sample features to enhance temporal data, thereby enriching the video representation and improving the model's robustness to noise. We then design a cross-modal temporal enhancement module that enhances temporal information for each modality's features. This module interacts with multiple modalities at once to emphasize critical temporal fragments while suppressing irrelevant ones. Experimental results on four benchmark datasets show that the proposed method achieves state-of-the-art performance in video affective content analysis, and ablation experiments confirm the effectiveness of each module.
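The abstract does not specify how the cross-modal temporal enhancement module is built, so the following is only an illustrative sketch of the general idea (reweighting one modality's time steps using the other modalities), not the authors' implementation. It assumes that every modality has already been projected to a common feature dimension and sequence length, and it swaps in a plain scaled dot-product score with a softmax over time. The function names `temporal_weights` and `enhance` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def temporal_weights(target, others):
    """Weights over the time steps of `target` (T x D).

    Assumption: each time step is scored by a scaled dot product with the
    time-averaged features of the other modalities (each T x D).
    """
    context = mean_vector([mean_vector(seq) for seq in others])
    scale = math.sqrt(len(context))
    scores = [
        sum(a * b for a, b in zip(frame, context)) / scale for frame in target
    ]
    return softmax(scores)


def enhance(features):
    """Reweight every modality's sequence using the remaining modalities.

    `features` maps a modality name to a T x D list of lists. Weights are
    multiplied by T so that uniform weights leave the features unchanged.
    """
    enhanced = {}
    for name, seq in features.items():
        others = [s for other, s in features.items() if other != name]
        weights = temporal_weights(seq, others)
        steps = len(seq)
        enhanced[name] = [
            [steps * weights[t] * x for x in seq[t]] for t in range(steps)
        ]
    return enhanced


# Toy input: the middle visual frame disagrees with the audio stream.
toy = {
    "visual": [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    "audio": [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
}
toy_enhanced = enhance(toy)
```

In this toy input the middle visual frame receives a smaller weight than its neighbours because it is orthogonal to the audio context, which is the emphasize-and-suppress behaviour the abstract describes at a high level. A learned module would replace the fixed dot-product score with trainable projections.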

Primary Subject Area: [Engagement] Emotional and Social Signals

Secondary Subject Area: [Experience] Multimedia Applications, [Content] Multimodal Fusion

Relevance To Conference: Our paper addresses video affective content analysis, a task that involves multiple modalities and emotions and is therefore highly relevant to the themes covered by ACM MM. We propose a temporal enhancement method that enriches modal representations and promotes interaction between modalities, consistent with the conference's focus on advancing multimedia research and technology. We believe our findings will contribute to ongoing discussion and progress in the multimedia community, making our submission suitable for publication at ACM MM.

Supplementary Material: zip

Submission Number: 5167
