MMCDSR: a Multimodal and Cross-domain Fusion Framework for Sequential Recommendation

Published: 2025 · Last Modified: 21 Jan 2026 · IJCNN 2025 · CC BY-SA 4.0
Abstract: Sequential recommendation (SR) aims to predict a user's next action from their historical interaction sequence. Classical methods rely on deep learning architectures such as CNNs, RNNs, and Transformers to capture sequential patterns in user behavior. Despite their success, they still face challenges such as data sparsity and a limited understanding of item features. To address these issues, researchers have proposed enriching SR with auxiliary information. In this paper, we observe that both cross-domain and multimodal information can serve as auxiliary signals to further improve SR performance, but exploiting them fully raises problems such as semantic inconsistency, inadequate mining of user preferences, and the introduction of noise. To this end, we propose MultiModal Cross-Domain Sequential Recommendation (MMCDSR), a framework that jointly models modal and domain information for SR. In MMCDSR, we design (1) a semantic contrastive learning module that aligns the modal and domain representations of items, (2) a sequential interest discovery module that captures user preferences from different perspectives, and (3) an adaptive attention fusion module that suppresses noisy features and generates the final user representation for recommendation. Extensive experiments on six Amazon datasets demonstrate that MMCDSR effectively leverages multimodal and cross-domain information, alleviates data sparsity, and significantly outperforms current baselines in recommendation accuracy.
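The semantic contrastive learning module above aligns the modal and domain representations of the same item. The abstract does not give the exact objective, but a common choice for this kind of cross-view alignment is a symmetric InfoNCE loss; the sketch below is a minimal NumPy illustration under that assumption (the function name, temperature value, and batch setup are illustrative, not taken from the paper). Matching modal/domain embedding pairs act as positives, and all other items in the batch act as negatives.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss aligning two views of the same items.

    z_a, z_b: (n_items, dim) embeddings of the same items under two views,
    e.g. a modal view and a domain view. Row i of z_a and row i of z_b
    form a positive pair; every other row is a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature          # (n, n) similarity matrix
    idx = np.arange(len(z_a))                   # positives sit on the diagonal

    def xent(lg):
        # Cross-entropy of the softmax over each row against the diagonal target
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # Average both alignment directions: modal->domain and domain->modal
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls the two views of each item together while pushing apart views of different items, which is one standard way to address the semantic inconsistency between modalities and domains that the abstract mentions.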