Hierarchical Reasoning Enhanced Few-Shot Multimodal Sentiment Analysis

Published: 2025 · Last Modified: 05 Feb 2026 · Neurocomputing 2025 · CC BY-SA 4.0
Abstract: Few-shot Multimodal Sentiment Analysis (FMSA) aims to predict sentiment with minimal labeled data by integrating multiple modalities, such as text and images. While recent FMSA methods have focused on transforming non-linguistic information (e.g., images) into text and leveraging language models to recast sentiment prediction as a few-shot filling task, they still struggle to capture the latent sentiment information in image–text pairs. These limitations hinder their effectiveness, particularly in real-world applications where labeled data is scarce. To address these limitations, we propose a novel approach, Hierarchical Reasoning Enhanced Few-shot Multimodal Sentiment Analysis (HRE-FMSA), which consists of three main components: the Hierarchical Reasoning Framework (HRF), the Hierarchical Reasoning Representation Fusion Network (H2RF-Net), and label prediction. Concretely, the HRF module excavates latent sentiment information from image–text pairs at three levels: topic/aspect, opinion, and sentiment. Then, H2RF-Net integrates the latent sentiment information with the original image–text pair to generate a prompt, which is fed into a pre-trained language model to obtain the final sentiment label. In experiments, we conduct comprehensive evaluations on three sentence-level datasets and two aspect-level datasets, demonstrating the effectiveness and applicability of HRE-FMSA.
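To make the pipeline described in the abstract concrete, the sketch below walks through the prompt-construction idea: three levels of cues (topic/aspect, opinion, sentiment) are fused with the image–text pair into a cloze-style prompt that a pre-trained masked language model fills in. This is a minimal illustration under stated assumptions, not the paper's implementation: `hierarchical_reasoning`, `build_prompt`, the prompt template, and the use of `bert-base-uncased` are all hypothetical stand-ins, whereas HRF and H2RF-Net in HRE-FMSA are learned modules.

```python
from transformers import pipeline


def hierarchical_reasoning(caption: str, text: str) -> dict:
    # Hypothetical stand-in for the HRF module: in HRE-FMSA these cues are
    # extracted by a learned hierarchical reasoning framework, not hard-coded.
    return {
        "topic": "the new phone",           # level 1: topic/aspect
        "opinion": "battery drains fast",   # level 2: opinion
        "sentiment_cue": "frustrated",      # level 3: sentiment
    }


def build_prompt(caption: str, text: str, cues: dict) -> str:
    # Fuse the image caption, the post text, and the reasoning cues into a
    # cloze-style prompt; this template is an illustrative assumption.
    return (
        f"Image: {caption} Text: {text} "
        f"Topic: {cues['topic']}. Opinion: {cues['opinion']}. "
        f"Cue: {cues['sentiment_cue']}. Overall the sentiment is [MASK]."
    )


if __name__ == "__main__":
    caption = "a person holding a smartphone with a low-battery warning"
    text = "Third charge today and it is not even noon."
    cues = hierarchical_reasoning(caption, text)
    prompt = build_prompt(caption, text, cues)

    # Few-shot filling: a pre-trained masked LM fills the sentiment slot,
    # and predicted tokens are mapped onto label words (a simple verbalizer).
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    label_words = {"positive", "negative", "neutral"}
    for cand in unmasker(prompt, top_k=10):
        token = cand["token_str"].strip()
        if token in label_words:
            print(token, round(cand["score"], 4))
            break
```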