Abstract: Multi-modal approaches to sentiment analysis have shown the potential to outperform uni-modal approaches. One of the challenges in this domain is to effectively model cross-view dynamics alongside view-specific dynamics. This paper proposes a model that captures both kinds of dynamics and applies attention over the contributing features from each modality to predict utterance-level sentiment. Within this model, the paper introduces a deep learning pipeline, the Cross-view Recurrent Neural Network Pair, which computes cross-view dynamics and integrates them with view-specific dynamics to obtain contextually rich utterance representations. The proposed model is evaluated on the CMU Multi-modal Opinion-level Sentiment Intensity (CMU-MOSI) and CMU Multi-modal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) datasets, achieving an accuracy of 81.78% on CMU-MOSI and 80.45% on CMU-MOSEI.
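The abstract does not include implementation details, but the described architecture can be illustrated with a minimal sketch. The PyTorch code below is one plausible reading, not the authors' implementation: per-modality GRUs model view-specific dynamics, a pair of GRUs over concatenated modality pairs stands in for the Cross-view Recurrent Neural Network Pair, and a learned attention weights the contributing features before a final sentiment head. All module names, the text/audio/video pairing choice, hidden sizes, and feature dimensions are assumptions.

```python
# Hypothetical sketch of the described architecture; not the paper's code.
import torch
import torch.nn as nn


class CrossViewRNNPairSketch(nn.Module):
    def __init__(self, dim_text, dim_audio, dim_video, hidden=64):
        super().__init__()
        # View-specific dynamics: one GRU per modality.
        self.text_rnn = nn.GRU(dim_text, hidden, batch_first=True)
        self.audio_rnn = nn.GRU(dim_audio, hidden, batch_first=True)
        self.video_rnn = nn.GRU(dim_video, hidden, batch_first=True)
        # Cross-view dynamics: a pair of GRUs over concatenated modality
        # pairs (text+audio, text+video); this pairing is an assumption.
        self.cross_ta = nn.GRU(dim_text + dim_audio, hidden, batch_first=True)
        self.cross_tv = nn.GRU(dim_text + dim_video, hidden, batch_first=True)
        # Attention over the five contributing feature vectors.
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, 1)  # utterance-level sentiment

    def forward(self, text, audio, video):
        # Each input: (batch, seq_len, dim_modality); use final hidden states.
        feats = []
        for rnn, x in [
            (self.text_rnn, text),
            (self.audio_rnn, audio),
            (self.video_rnn, video),
            (self.cross_ta, torch.cat([text, audio], dim=-1)),
            (self.cross_tv, torch.cat([text, video], dim=-1)),
        ]:
            _, h = rnn(x)
            feats.append(h[-1])                    # (batch, hidden)
        feats = torch.stack(feats, dim=1)          # (batch, 5, hidden)
        weights = torch.softmax(self.attn(feats), dim=1)
        utterance = (weights * feats).sum(dim=1)   # attended representation
        return self.classifier(utterance)


# Usage on random features; dims echo common CMU-MOSI extractors
# (GloVe 300-d text, COVAREP 74-d audio, Facet 35-d video) but are
# illustrative only.
model = CrossViewRNNPairSketch(dim_text=300, dim_audio=74, dim_video=35)
t, a, v = torch.randn(8, 20, 300), torch.randn(8, 20, 74), torch.randn(8, 20, 35)
print(model(t, a, v).shape)  # torch.Size([8, 1])
```

The attention step mirrors the abstract's claim that the model weights "the contributing features from each modality"; how the paper actually fuses cross-view and view-specific representations may differ.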
DOI: 10.1007/978-981-19-8477-8_20