Mixture of Conditional Attention for Multimodal Fusion in Sequential Recommendation

Published: 2025, Last Modified: 17 Jan 2026, PAKDD (3) 2025, CC BY-SA 4.0
Abstract: Sequential Recommender (SR) systems stand out for their ability to capture dynamic user preferences, and multimodal side information has been incorporated into them to improve recommendation quality. Most existing approaches, however, rely on predefined deterministic rules that encode particular inductive biases. Although these rules usefully guide training, they also limit the model's ability to explore cross-modal relationships. To address this problem, we introduce the Mixture of Conditional Attention (MOCA) framework, which learns diverse and flexible attention patterns directly from data. MOCA uses 1) a conditional attention mechanism that focuses on the features most relevant to user intent, and 2) a mixture-of-experts approach that captures a wide range of user preferences. Extensive experiments on multiple datasets show that our model outperforms state-of-the-art SR models. The code is available at https://github.com/snuviplab/MOCA.
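The abstract names two components, conditional attention and a mixture of experts, without giving their equations. The NumPy sketch below shows one generic way such pieces can be combined and is not the authors' implementation (that lives in the linked repository). Assumptions, all hypothetical: every modality is already projected to a shared dimension `d`; a single "condition" vector stands in for user intent (here, crudely, the mean of the ID embeddings); each expert forms its attention query from that condition; and a dense softmax gate, also computed from the condition, mixes the expert outputs. The class names `ConditionalAttentionExpert` and `MixtureOfConditionalAttention` are invented for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ConditionalAttentionExpert:
    """Hypothetical expert: attends over multimodal tokens using a query
    built from a condition vector (a stand-in for 'user intent')."""

    def __init__(self, d, rng):
        scale = 1.0 / np.sqrt(d)
        self.W_q = rng.normal(0.0, scale, (d, d))
        self.W_k = rng.normal(0.0, scale, (d, d))
        self.W_v = rng.normal(0.0, scale, (d, d))

    def __call__(self, tokens, condition):
        # tokens: (N, d) multimodal features; condition: (d,)
        q = condition @ self.W_q               # (d,) query from the condition
        k = tokens @ self.W_k                  # (N, d) keys
        v = tokens @ self.W_v                  # (N, d) values
        scores = k @ q / np.sqrt(q.shape[0])   # (N,) scaled dot products
        attn = softmax(scores)                 # attention weights, sum to 1
        return attn @ v, attn                  # (d,), (N,)


class MixtureOfConditionalAttention:
    """Hypothetical mixture: a gate computed from the condition vector
    weights the outputs of several conditional-attention experts."""

    def __init__(self, d, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.experts = [ConditionalAttentionExpert(d, rng) for _ in range(num_experts)]
        self.W_gate = rng.normal(0.0, 1.0 / np.sqrt(d), (d, num_experts))

    def __call__(self, modality_feats, condition):
        # Concatenate per-modality token sequences into one token set.
        tokens = np.concatenate(modality_feats, axis=0)   # (N, d)
        gate = softmax(condition @ self.W_gate)           # (K,) expert weights
        outs, attns = zip(*(e(tokens, condition) for e in self.experts))
        fused = gate @ np.stack(outs)                     # (d,) gated sum
        return fused, gate, np.stack(attns)               # (d,), (K,), (K, N)


# Usage with random stand-in features for three modalities.
rng = np.random.default_rng(1)
d, T = 16, 5
id_emb = rng.normal(size=(T, d))
text_emb = rng.normal(size=(T, d))
image_emb = rng.normal(size=(T, d))
condition = id_emb.mean(axis=0)  # crude, hypothetical proxy for user intent
moca_like = MixtureOfConditionalAttention(d, num_experts=4)
fused, gate, attns = moca_like([id_emb, text_emb, image_emb], condition)
print(fused.shape, gate.shape, attns.shape)  # (16,) (4,) (4, 15)
```

Because each expert has its own projections, the experts can learn different cross-modal attention patterns from data rather than following a fixed fusion rule. The gate then decides, per user context, how much each pattern contributes. A real model would learn these weights end to end and might use sparse top-k gating instead of the dense softmax assumed here.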