Mixture of Conditional Attention for Multimodal Fusion in Sequential Recommendation

Sewon Lee, Kwangeun Yeo, Eungi Kim, Jinri Kim, Chanwoo Kim, Yujin Jeon, Jooyoung Kim, Joonseok Lee

Published: 01 Jan 2025, Last Modified: 06 Dec 2025. License: CC BY-SA 4.0
Abstract: Sequential Recommender (SR) systems stand out for their ability to capture dynamic user preferences, and multimodal side information has been incorporated to improve recommendation quality. Most existing approaches, however, rely on predefined deterministic rules that reflect particular inductive biases. While such rules usefully guide training, they also limit the model’s capability to explore cross-modal relationships. To address this problem, we introduce the Mixture of Conditional Attention (MOCA) framework, which learns diverse and flexible attention patterns directly from data. MOCA utilizes 1) a conditional attention mechanism to focus on the features most relevant to user intent, and 2) a mixture-of-experts approach to effectively capture a wide range of user preferences. Extensive experiments on multiple datasets demonstrate the superiority of our model over state-of-the-art SR models. The code is available at https://github.com/snuviplab/MOCA.
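As a rough illustration of the two mechanisms the abstract names, the sketch below combines attention conditioned on a user-intent vector with a learned mixture-of-experts gate over several such attention experts. This is a minimal PyTorch sketch under assumed shapes and names (ConditionalAttentionExpert, d_model, n_experts, and so on are hypothetical), not the authors' implementation; the actual code is in the linked repository.

```python
# Hypothetical sketch of (1) attention whose query is conditioned on a
# user-intent vector and (2) a mixture-of-experts gate that softly combines
# several such attention experts. All names and shapes are illustrative
# assumptions, not the MOCA implementation from the linked repository.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalAttentionExpert(nn.Module):
    """Single-head attention whose query depends on a conditioning vector."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)  # condition -> query
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, cond: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # cond:  (batch, d_model)          e.g. a user-intent summary
        # feats: (batch, n_items, d_model) e.g. multimodal item features
        q = self.q_proj(cond).unsqueeze(1)           # (batch, 1, d_model)
        k = self.k_proj(feats)                       # (batch, n_items, d_model)
        v = self.v_proj(feats)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return (attn @ v).squeeze(1)                 # (batch, d_model)


class MixtureOfConditionalAttention(nn.Module):
    """Routes the condition through a gate that weights attention experts."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [ConditionalAttentionExpert(d_model) for _ in range(n_experts)]
        )
        self.gate = nn.Linear(d_model, n_experts)    # gating from the condition

    def forward(self, cond: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.gate(cond), dim=-1)          # (batch, n_experts)
        outs = torch.stack([e(cond, feats) for e in self.experts], dim=1)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)      # weighted fusion


if __name__ == "__main__":
    model = MixtureOfConditionalAttention(d_model=64, n_experts=4)
    cond = torch.randn(8, 64)        # batch of user-intent vectors
    feats = torch.randn(8, 10, 64)   # 10 multimodal item features per user
    print(model(cond, feats).shape)  # torch.Size([8, 64])
```

Because the gate is learned rather than rule-based, the weighting of modalities and features can vary per user, which matches the abstract's point about avoiding predefined deterministic fusion rules.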