FGCM: Modality-Behavior Fusion Model Integrated with Graph Contrastive Learning for Multimodal Recommendation

Published: 2025 · Last Modified: 15 Jan 2026 · IEEE Multimedia 2025 · CC BY-SA 4.0
Abstract: Multimodal recommender systems (MRSs) integrate information from multiple modalities to better capture users' preferences. However, existing MRSs usually suffer from data sparsity, since user–item interactions are limited: it is not uncommon for a user to interact with only a few items out of millions. More importantly, the interaction signal is easily overwhelmed, which degrades recommendation performance. To close these gaps and further improve the performance of MRSs, we propose FGCM, a modality-behavior fusion model integrated with graph contrastive learning for multimodal recommendation. First, by introducing random noise into graph contrastive learning, FGCM mitigates data sparsity. Then, a modality-behavior fuser is proposed to effectively fuse interaction data with multimodal side information, guided by both collaborative signals and multimodal signals. Extensive experiments on three datasets demonstrate that our approach outperforms several state-of-the-art methods.
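The abstract does not spell out how the random noise enters the contrastive objective. As a rough illustration only, the sketch below shows a common way noise is injected into graph contrastive learning for recommendation (SimGCL-style embedding perturbation paired with an InfoNCE loss). The helper names `perturb` and `info_nce`, the noise scale `eps`, and the temperature `tau` are illustrative assumptions, not FGCM's actual formulation.

```python
import torch
import torch.nn.functional as F

def perturb(emb: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Add scaled random noise to node embeddings (SimGCL-style perturbation)."""
    noise = torch.rand_like(emb)                          # uniform noise, same shape as embeddings
    noise = F.normalize(noise, dim=-1) * torch.sign(emb)  # keep noise in the embedding's orthant
    return emb + eps * noise

def info_nce(view1: torch.Tensor, view2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """InfoNCE contrastive loss between two noise-perturbed views of the same nodes."""
    v1, v2 = F.normalize(view1, dim=-1), F.normalize(view2, dim=-1)
    logits = v1 @ v2.t() / tau                            # pairwise cosine similarities
    labels = torch.arange(v1.size(0), device=v1.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Usage: contrast two perturbed views of user/item embeddings produced by a GNN encoder.
emb = torch.randn(64, 32)                                 # 64 nodes, 32-dim embeddings (toy example)
loss_cl = info_nce(perturb(emb), perturb(emb))
```

In this style of augmentation, the two views come from the same encoder output rather than from structural graph dropout, which keeps the graph intact while still providing a self-supervised signal against sparse interactions.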