Boosting Guided Diffusion with Large Language Models for Multimodal Sequential Recommendation

Te Song, Lianyong Qi, Weiming Liu, Fan Wang, Xiaolong Xu, Hongsheng Hu, Yang Cao, Xuyun Zhang, Amin Beheshti

Published: 27 Oct 2025 · Last Modified: 15 Jan 2026 · Crossref · CC BY-SA 4.0
Abstract: Recent advancements in generative models have positioned them as one of the principal tools for sequential recommendation due to their exceptional sample diversity and generalization capabilities. Among these, diffusion model-based sequential recommenders have achieved remarkable success. However, most existing approaches still face critical challenges, resulting in suboptimal generation quality: (1) They fail to leverage multimodal knowledge for constructing item representations with well-structured distributional characteristics and semantically enriched information; (2) They predominantly rely on discrete diffusion processes, leading to high error accumulation, reduced time efficiency, and constrained controllability in generative sampling. To mitigate these challenges, we propose LSGM4Rec, a novel framework that integrates Large Language Models (LLMs) with advanced multimodal encoding models to establish multimodal fusion embeddings for items. This design ensures distinct distributional characteristics while enabling the incorporation of semantically rich modal features into the guidance condition. Furthermore, we pioneer the use of stochastic differential equations (SDEs) for recommendation, facilitating smooth transitions between data distributions and enabling a favorable trade-off between sampling efficiency and generation quality. Extensive experiments on three datasets demonstrate that LSGM4Rec outperforms existing state-of-the-art sequential recommendation methods.
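
To make the SDE-based guided generation idea concrete, the sketch below shows a reverse-time Euler–Maruyama sampler for a VP-SDE that denoises an item embedding under a conditioning vector (e.g., a fused multimodal sequence embedding) with classifier-free-style guidance. This is a minimal illustration of the general technique, not the paper's implementation: the toy score network, dimensions, noise schedule, and guidance scale are all illustrative assumptions.

```python
# Minimal sketch (assumed, not LSGM4Rec's code): reverse-time Euler-Maruyama
# sampling of a VP-SDE, guided by a conditioning embedding.
import torch
import torch.nn as nn

EMB_DIM, COND_DIM, STEPS = 64, 64, 200  # illustrative sizes

def beta(t):
    # Linear noise schedule beta(t) on t in [0, 1] (assumed schedule).
    return 0.1 + (20.0 - 0.1) * t

class ScoreNet(nn.Module):
    """Toy conditional score model s_theta(x_t, t, c) ~ grad_x log p_t(x | c)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB_DIM + COND_DIM + 1, 256), nn.SiLU(),
            nn.Linear(256, EMB_DIM),
        )

    def forward(self, x, t, cond):
        t_feat = t.expand(x.size(0), 1)
        return self.net(torch.cat([x, cond, t_feat], dim=-1))

@torch.no_grad()
def sample(score_net, cond, guidance=2.0):
    """Integrate the reverse VP-SDE from t=1 to t=0 with guided scores."""
    x = torch.randn(cond.size(0), EMB_DIM)   # start from the prior N(0, I)
    dt = -1.0 / STEPS                        # negative step: reverse time
    for i in range(STEPS, 0, -1):
        t = torch.tensor([[i / STEPS]])
        b = beta(t)
        s_cond = score_net(x, t, cond)                    # conditional score
        s_unc = score_net(x, t, torch.zeros_like(cond))   # null-condition score
        score = s_unc + guidance * (s_cond - s_unc)       # guided score
        drift = -0.5 * b * x - b * score                  # reverse-SDE drift
        x = x + drift * dt + torch.sqrt(b * -dt) * torch.randn_like(x)
    return x  # denoised target-item embedding; match to catalog items by similarity

if __name__ == "__main__":
    torch.manual_seed(0)
    cond = torch.randn(4, COND_DIM)  # stand-in for fused multimodal sequence embeddings
    print(sample(ScoreNet(), cond).shape)  # -> torch.Size([4, 64])
```

Because the reverse SDE is integrated continuously, the number of steps and the guidance scale can be varied at inference time, which is one way such samplers trade sampling efficiency against generation quality.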