Abstract: Multimodal sentiment analysis (MSA) with missing modalities aims to infer a person's sentiment from multimodal data in which some modalities are missing. Most existing methods focus on reconstructing the missing modalities from the available modalities of each sample, relying on modality-common information. However, these methods overlook the modality-specific information that other samples can provide. Additionally, these approaches often require full-modality representations to guide the reconstruction process, which is impractical in resource-constrained real-world scenarios. To address these challenges, we propose the Intra-sample and Intra-modal Enhancement (IIE) framework. The IIE framework enhances both sample-level and modality-level representations to capture additional modality-common and modality-specific information from the available modalities, without requiring full modalities. Specifically, IIE first learns sample-level representations by distilling modality-common information from the available modalities into learnable latent units. It then enhances modality-level representations by leveraging modality-specific information from other samples that share the same modality, which is crucial for robustness when modalities are missing. Finally, IIE enforces consistency between the enhanced modality-level and sample-level representations and combines the enhanced and initial representations to make predictions. Extensive experiments on three datasets demonstrate that the IIE framework significantly outperforms existing methods in both effectiveness and robustness when handling MSA with missing modalities. Code is available at https://github.com/YetZzzzzz/IIE.
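To make the three steps above more concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that), and every design choice here is an assumption: each modality is reduced to one pooled vector, missing modalities are zero vectors flagged by a boolean mask, "other samples" means the other samples in the current batch, both steps use plain scaled dot-product attention, and the consistency term is a simple MSE. The function names `distill_to_latents`, `enhance_within_modality`, and `consistency_loss` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def distill_to_latents(feats, mask, latents):
    """Hypothetical sample-level step: K learnable latent units cross-attend
    only to the modalities a sample actually has.

    feats:   (B, M, D) one pooled feature per modality (zeros where missing)
    mask:    (B, M) bool, True where the modality is observed
    latents: (K, D) latent units shared across samples
    returns: (B, K, D)
    """
    d = latents.shape[-1]
    scores = np.einsum("kd,bmd->bkm", latents, feats) / np.sqrt(d)
    scores = np.where(mask[:, None, :], scores, -1e9)  # ignore missing modalities
    return np.einsum("bkm,bmd->bkd", softmax(scores), feats)


def enhance_within_modality(feats, mask):
    """Hypothetical modality-level step: for each modality, every sample
    attends to the *other* batch samples that observed that modality,
    and the attended context is added residually."""
    B, M, D = feats.shape
    out = feats.copy()
    not_self = ~np.eye(B, dtype=bool)
    for m in range(M):
        x = feats[:, m, :]                         # (B, D)
        valid = mask[None, :, m] & not_self        # (B, B): allowed keys per query
        scores = np.where(valid, x @ x.T / np.sqrt(D), -1e9)
        ctx = softmax(scores) @ x                  # (B, D)
        has_key = valid.any(axis=1, keepdims=True)
        out[:, m, :] = x + np.where(has_key, ctx, 0.0)
    return out


def consistency_loss(enhanced, sample_repr, mask):
    """Hypothetical consistency term: MSE between the mean of the enhanced
    observed-modality features and the mean of the latent units."""
    w = mask[..., None].astype(float)              # (B, M, 1)
    modal = (enhanced * w).sum(axis=1) / np.maximum(w.sum(axis=1), 1.0)
    sample = sample_repr.mean(axis=1)
    return float(np.mean((modal - sample) ** 2))


rng = np.random.default_rng(0)
B, M, D, K = 4, 3, 8, 2  # batch, modalities (e.g. text/audio/vision), dim, latents
mask = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=bool)
feats = rng.normal(size=(B, M, D)) * mask[..., None]  # zero out missing modalities
latents = rng.normal(size=(K, D))

sample_repr = distill_to_latents(feats, mask, latents)   # (4, 2, 8)
enhanced = enhance_within_modality(feats, mask)          # (4, 3, 8)
loss = consistency_loss(enhanced, sample_repr, mask)
print(sample_repr.shape, enhanced.shape, loss)
```

In this sketch a missing modality has a zero query, so it attends uniformly to the other samples that observed that modality and is effectively filled with their mean; this is one simple way to borrow modality-specific information from other samples, chosen here for illustration only. The final fusion of enhanced and initial representations and the sentiment predictor are omitted.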
External IDs: doi:10.1109/tmm.2025.3645559