SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation

Published: 06 Mar 2025, Last Modified: 27 Mar 2025
Venue: ICLR 2025 FM-Wild Workshop
License: CC BY 4.0
Keywords: Multimodal Representation, Recommender System, Machine Learning, Deep Learning, Robust Learning
Abstract:

Multimodal recommender systems leverage diverse information to model user preferences and item features, helping users discover relevant products. Integrating multimodal data can mitigate challenges such as data sparsity and cold start, but it also introduces risks such as information adjustment and inherent noise, which pose robustness challenges. In this paper, we analyze multimodal recommenders from the perspective of flat local minima and leverage the denoising capability of BLIP, a vision-language model, to mitigate the inherent noise risk in multimodal inputs. We propose a concise yet effective training strategy for recommendation that implicitly enhances model robustness during optimization, addressing instability risks. Extensive theoretical and empirical analyses demonstrate the superiority of our approach across multimodal recommendation models and benchmarks. The proposed method, Sharpness-Aware Mirror Gradient with BLIP-Based Denoising (SGBD), complements existing robust training techniques and can be easily extended to advanced recommendation models, making it a promising paradigm for training robust multimodal recommender systems.
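
For readers unfamiliar with the "flat local minima" perspective mentioned above, the following is a minimal sketch of a generic sharpness-aware update step on a toy quadratic loss. It is not the SGBD method: the abstract does not specify the mirror-gradient update or the BLIP denoising pipeline, so neither is shown here. The loss, the function names (`loss`, `grad`, `sam_step`, `train`), and the hyperparameters `lr` and `rho` are all assumptions chosen purely for illustration.

```python
import math

def loss(w):
    # Toy quadratic loss (an assumption for illustration only):
    # steep along w[0], shallow along w[1].
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    """One generic sharpness-aware step.

    1. Move to a nearby "worst-case" point by taking a step of length rho
       along the normalized gradient.
    2. Evaluate the gradient at that perturbed point.
    3. Apply that gradient to the original weights.
    """
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # No ascent direction is defined at a stationary point.
        return list(w)
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def train(w0, steps=200, lr=0.05, rho=0.05):
    # Repeat the sharpness-aware step a fixed number of times.
    w = list(w0)
    for _ in range(steps):
        w = sam_step(w, lr, rho)
    return w
```

Because the descent direction is computed at a perturbed point rather than at the current weights, the update penalizes regions where the loss rises quickly in a small neighborhood, which is one common way of biasing training toward flatter minima.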

Submission Number: 79