Joint Objective and Subjective Fuzziness Denoising for Multimodal Sentiment Analysis

Published: 01 Jan 2025, Last Modified: 06 Mar 2025. IEEE Trans. Fuzzy Syst. 2025. License: CC BY-SA 4.0
Abstract: Multimodal sentiment analysis (MSA) aims to teach computers or robots to understand human sentiment from diverse multimodal signals, including audio, vision, and text. Current MSA approaches primarily concentrate on devising fusion strategies for multimodal signals and on learning better multimodal joint representations. However, employing multimodal signals directly is problematic because human psychological states are fuzzy and cannot be easily categorized, which undermines the effectiveness of existing methods. In this article, we observe that the natural fuzziness of human sentiments can be divided into two types: 1) objective fuzziness introduced by human expression and 2) subjective fuzziness caused by the complexity of human affection. Based on this observation, we propose a novel method termed Joint Objective and Subjective Fuzziness Denoising (JOSFD), which introduces fuzzy logic into both the multimodal fusion process and the sentiment decision process to overcome objective and subjective fuzziness. Specifically, our JOSFD method contains two key modules: 1) a modality-specific fuzzification module that leverages uncertainty estimation and fuzzy logic to mitigate the influence of objective fuzziness across modalities during multimodal fusion, and 2) an attitude-intensity representation disentangling module that learns joint representations of human attitude and sentiment strength separately and further employs fuzzy logic to determine the sentiment analysis results. We evaluate the proposed JOSFD method on three widely used MSA benchmark datasets: 1) CMU-MOSI, 2) CMU-MOSEI, and 3) CH-SIMS. Extensive experiments demonstrate that JOSFD outperforms recent state-of-the-art methods.
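The abstract describes the two modules only at a high level, so the sketch below is purely illustrative: a minimal PyTorch-style example of how an uncertainty-driven modality fuzzification step and a disentangled attitude/intensity decision head could be wired together. All class names, dimensions, and weighting rules here are assumptions, not the paper's actual design.

```python
# Illustrative sketch only: the abstract does not specify JOSFD's architecture,
# so module names, dimensions, and the fuzzification scheme below are assumptions.
import torch
import torch.nn as nn


class ModalityFuzzifier(nn.Module):
    """Hypothetical modality-specific fuzzification: predicts an uncertainty
    score per modality and down-weights fuzzy (uncertain) features before fusion."""

    def __init__(self, dim: int):
        super().__init__()
        self.uncertainty = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.uncertainty(x)          # estimated objective fuzziness in [0, 1]
        return x * (1.0 - u)             # attenuate features judged to be fuzzy


class AttitudeIntensityHead(nn.Module):
    """Hypothetical disentangled decision head: separate branches for attitude
    (polarity) and intensity (strength), combined into a signed sentiment score."""

    def __init__(self, dim: int):
        super().__init__()
        self.attitude = nn.Sequential(nn.Linear(dim, 1), nn.Tanh())      # in [-1, 1]
        self.intensity = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())  # in [0, 1]

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.attitude(z) * self.intensity(z)


# Toy forward pass over random audio/vision/text features of a shared size.
dim = 64
modalities = ("audio", "vision", "text")
fuzzifiers = nn.ModuleDict({m: ModalityFuzzifier(dim) for m in modalities})
head = AttitudeIntensityHead(dim)

features = {m: torch.randn(8, dim) for m in modalities}
fused = torch.stack([fuzzifiers[m](x) for m, x in features.items()]).mean(dim=0)
print(head(fused).shape)  # torch.Size([8, 1])
```

In this toy version the per-modality uncertainty acts as a soft gate before a simple averaging fusion, and the final score is the product of a polarity branch and a strength branch; the actual JOSFD fusion and fuzzy decision rules would replace these placeholder choices.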