Meta Learning for Adaptive Disentangled User Preference Integration Toward Multimodal Recommendation

Zhenchao Wu, Hongteng Xu, Xu Chen

Published: 2025, Last Modified: 01 Mar 2026, Venue: MMSP 2025, License: CC BY-SA 4.0
Abstract: Multimodal recommender systems exploit the multimodal information of items as auxiliary signals to learn more informative representations and achieve better performance. Most existing methods combine ID representations and modality representations in simple ways (e.g., element-wise sum or an attention mechanism). However, these strategies struggle to align the different representations of the same user/item (e.g., the ID and modality representations) accurately, because the ID representation carries the overall semantic features of the user/item while each modality representation contains only part of its semantic content, and the semantic information from different modalities may be partially complementary. To address this issue, we design a disentangled representation learning method that decouples the different representations of a user/item into distinct semantic components, such that any two components carry either complementary or common semantic information. In each semantic scene, we first calculate the interaction probability for each pair of components (e.g., a user component and an item component), and then use a meta-weight net to customize their weights and integrate these probabilities into the score for that scene. Afterward, we fuse the scores of the different semantic scenes to estimate the final probability of a user-item interaction. In addition, we introduce two contrastive learning tasks to maintain consistency between multiple self-supervised views. Experimental evaluations on three datasets indicate that the proposed approach consistently outperforms current state-of-the-art baselines.
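The scoring pipeline described in the abstract (pairwise interaction probabilities, meta-learned weights, fusion across semantic scenes) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the sigmoid-of-dot-product probability, the single linear layer standing in for the meta-weight net, the softmax normalization, and the mean fusion are all assumptions, as are the names `scene_score`, `final_probability`, and `meta_w`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scene_score(user_parts, item_parts, meta_w):
    """Score one semantic scene from its user and item components.

    Assumption: the pairwise interaction probability is sigmoid(u . v),
    and the "meta-weight net" is a single linear layer over the
    concatenated pair, normalized with a softmax over all pairs.
    """
    probs, logits = [], []
    for u in user_parts:
        for v in item_parts:
            probs.append(sigmoid(dot(u, v)))
            logits.append(dot(meta_w, u + v))  # u + v concatenates the lists
    weights = softmax(logits)
    return sum(w * p for w, p in zip(weights, probs))


def final_probability(scenes, meta_w):
    """Fuse the per-scene scores; a plain mean is assumed here."""
    scores = [scene_score(u, v, meta_w) for u, v in scenes]
    return sum(scores) / len(scores)


# Toy data: 3 scenes, each with 2 user and 2 item components of size 4.
rng = random.Random(0)
dim = 4


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


scenes = [
    ([rand_vec(dim) for _ in range(2)], [rand_vec(dim) for _ in range(2)])
    for _ in range(3)
]
meta_w = rand_vec(2 * dim)
p = final_probability(scenes, meta_w)
```

Because each scene score is a convex combination of probabilities and the fusion is a mean, `p` stays within (0, 1). In the paper the weights are learned by a meta-weight net and trained jointly with two contrastive objectives, neither of which this sketch models.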