Track: Web mining and content analysis
Keywords: Multimodal Data, Multimodal Affective Computing, Ordinal Learning, Sentiment Analysis
TL;DR: We propose an ordinal learning framework for multimodal affective computing that aligns with the ordinal nature of affective concepts.
Abstract: Multimodal affective computing aims to integrate information from multiple modalities to analyze human affective states, opinion tendencies, behavioral intentions, etc. Previous studies primarily focus on fitting predictions to annotated labels, often neglecting the ordinal nature of affective states. In this paper, we address this issue by exploring ordinal learning and design a Multimodal Ordinal Affective Computing (MOAC) framework to enhance the understanding of the nature of affective concepts. Specifically, we propose coarse-grained label-level ordinal learning that prompts the model to "learn to compare" in the label space, encouraging higher predictive values for samples annotated with larger labels than for those with smaller labels. Moreover, a regularization loss is proposed to prevent the output distributions from deviating significantly from the annotated label distributions. Fine-grained feature-level ordinal learning is then performed via a feature difference operation and a neutral embedding. The former compares samples in the feature space, computing the difference between features of different samples to generate "new" features for more robust training. The latter reduces the difficulty of prediction by estimating the difference between the target multimodal representations and a neutral reference. We first demonstrate MOAC on multimodal sentiment analysis, a regression task that aligns well with ordinal learning. We then extend MOAC to classification tasks, including multimodal humor detection and sarcasm detection, to evaluate its generalizability. Experiments show that MOAC outperforms state-of-the-art methods.
Submission Number: 334
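The label-level "learn to compare" component can be read as a pairwise ranking objective combined with a regularizer toward the annotated labels. The sketch below is a minimal PyTorch illustration of that reading, assuming a scalar regression output per sample; the function name, the hinge form of the comparison term, the L1 choice for the regularizer, and the margin value are our assumptions rather than the paper's actual losses.

```python
import torch
import torch.nn.functional as F


def label_level_ordinal_loss(preds: torch.Tensor, labels: torch.Tensor, margin: float = 0.1) -> torch.Tensor:
    """Pairwise 'learn to compare' sketch: for every pair (i, j) in the batch
    with labels[i] > labels[j], encourage preds[i] to exceed preds[j] by `margin`,
    plus an L1 term keeping outputs close to the annotated labels (assumed forms)."""
    # Pairwise differences; entry [i, j] holds value_i - value_j.
    pred_diff = preds.unsqueeze(1) - preds.unsqueeze(0)
    label_diff = labels.unsqueeze(1) - labels.unsqueeze(0)
    mask = (label_diff > 0).float()  # pairs where sample i is annotated higher than sample j
    # Hinge penalty whenever the predicted ordering violates the annotated ordering.
    rank_loss = (F.relu(margin - pred_diff) * mask).sum() / mask.sum().clamp(min=1.0)
    # Regularization discouraging outputs from drifting far from the annotated labels.
    reg_loss = F.l1_loss(preds, labels)
    return rank_loss + reg_loss


# Toy usage with a batch of sentiment scores on a [-3, 3] scale:
preds = torch.tensor([0.5, -1.2, 2.0], requires_grad=True)
labels = torch.tensor([1.0, -2.0, 3.0])
loss = label_level_ordinal_loss(preds, labels)
loss.backward()
```

The fine-grained feature-level part (feature differences and the neutral reference) would operate on intermediate representations rather than scalar outputs and is not covered by this sketch.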