Abstract: Multi-modal recommendation improves recommendation accuracy by leveraging various modalities (e.g., visual, textual, and acoustic) of rich item content. However, most existing studies overlook that modality features can be noisy for recommendation. Recently, several graph-based methods have attempted to alleviate the noise issue via graph structure learning, but they focus mainly on edge denoising while neglecting node denoising. This leads to two problems: 1) residual noise at the node level and 2) insufficient edge-level denoising. To address these limitations, we propose a Denoised Modality-guided Graph Learning paradigm (DMGL), which jointly and iteratively eliminates both node-level and edge-level noise for multi-modal recommendation. In addition, a masked feature autoencoder and a contrastive learning mechanism are introduced to handle intra- and inter-modality node-level noise, respectively. Extensive experiments on real-world datasets demonstrate the superior performance of our proposed model. The code is available here.