Abstract: Multimodal recommendation systems improve recommendation accuracy by integrating information from different modalities to obtain latent representations of users and items. However, existing multimodal recommendation methods often use a single user embedding to model users' interests across different modalities, thereby neglecting multimodal information. Furthermore, the semantics expressed by the same item in different modalities may be inconsistent, leading to suboptimal recommendation performance. To alleviate these issues, we propose a new multimodal recommendation framework called Learnable Prompts and ID-guided Contrastive Learning (LPIC). Specifically, we introduce a continuously learnable prompt embedding method that incorporates the multimodal features of items to model users' interests in specific modalities. We then propose an ID-guided contrastive learning component that enhances historical interaction features in the textual, visual, and fused modalities, while aligning these modalities to improve cross-modal semantic consistency. Finally, we conduct extensive experiments on three publicly available Amazon datasets to demonstrate the effectiveness of the LPIC framework.
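To make the ID-guided alignment idea concrete, below is a minimal sketch of an InfoNCE-style contrastive objective in which item ID embeddings anchor the textual, visual, and fused views of the same item. This is an illustrative assumption about the loss formulation, not the paper's exact implementation; all tensor names, shapes, and the temperature value are hypothetical.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.2):
    """Symmetric InfoNCE loss: row i of `anchor` and `positive` form a positive pair,
    all other rows in the batch serve as negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature                      # (batch, batch) similarity matrix
    labels = torch.arange(a.size(0), device=a.device)     # diagonal entries are positives
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Hypothetical item representations for one training batch.
batch, dim = 256, 64
id_emb    = torch.randn(batch, dim, requires_grad=True)   # ID embeddings (anchor view)
text_emb  = torch.randn(batch, dim, requires_grad=True)   # textual-modality features
image_emb = torch.randn(batch, dim, requires_grad=True)   # visual-modality features
fused_emb = torch.randn(batch, dim, requires_grad=True)   # fused-modality features

# The ID embedding pulls each modality view of the same item together,
# encouraging semantic consistency across modalities.
loss = (info_nce(id_emb, text_emb) +
        info_nce(id_emb, image_emb) +
        info_nce(id_emb, fused_emb))
loss.backward()
```

In this sketch the ID embedding serves as a modality-agnostic reference point, so each modality is aligned to it rather than pairwise to every other modality; how the views are actually paired in LPIC is described in the full paper.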
External IDs: dblp:journals/tomccap/LiuSXWG25