TRE: Mitigating Label Noise in Multimodal Aspect-Based Sentiment Analysis via LLM-Guided Dataset Reformation

ACL ARR 2025 May Submission4075 Authors

19 May 2025 (modified: 29 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: With the rapid development of social media, Multimodal Aspect-based Sentiment Analysis (MABSA) has garnered significant attention, and its integration of diverse modalities presents a unique set of challenges. Twitter-2015 and Twitter-2017 are among the most commonly used MABSA datasets. During our research, however, we identified labeling errors in these datasets, which we believe contribute to the difficulty of improving MABSA model accuracy. To address this issue, we introduce an expert system based on Large Language Models (LLMs) that assists in filtering abnormal samples, which are then relabeled manually. This process yields the $\textbf{T}$witter-$\textbf{RE}$vised (TRE) datasets, namely TRE-2015 and TRE-2017. Experimental results indicate that the proposed TRE datasets provide more accurate sentiment annotations while preserving well-defined and learnable sentiment features. The datasets exhibit sentiment consistency, making them more effective in enhancing the sentiment analysis capabilities of models. Our complete code and datasets will be made publicly available.

Paper Type: Long

Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining

Research Area Keywords: benchmarking, NLP datasets, argument mining

Contribution Types: Data resources

Languages Studied: English

Submission Number: 4075
