Track: User modeling, personalization and recommendation
Keywords: Transferable recommendation, Multi-modality alignment, Explainable alignment
Abstract: With advances in multi-modal data modeling, recent recommender systems exploit not only textual data and user-item interactions but also additional modalities such as images to improve performance. Existing methods typically adopt cross-modal pairwise alignment strategies to bridge the gap between modalities. However, this alignment paradigm suffers from limited explainability, consistency, and extensibility, and therefore often yields suboptimal performance. In this paper, we propose EARec, a novel Explainable generative multi-modality Alignment method for transferable Recommender systems. Specifically, we design a two-stage pipeline that addresses unified multi-modality alignment of items and the sequential recommendation task, respectively. In the first stage, we introduce a generation task that aligns each modality from multiple source domains, in parallel, to an anchor with explainable meaning; the three modality features share the same anchor, ensuring a consistent alignment direction. In addition, we incorporate behavior-related information as an independent modality within the alignment framework, establishing a bridge that promotes alignment between the multi-modal features and user behavior. In the second stage, we compose the aligned modality encoders into a unified encoder and transfer it to the target domain to enhance sequential recommendation. Because alignment is performed in parallel and composition is modular, the pipeline is flexible and scalable when incorporating new modalities. Experimental results on multiple public datasets demonstrate the superiority of EARec over multi-modality recommendation baselines, and further analysis confirms the explainability of the generative alignment.
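To make the two-stage pipeline described in the abstract concrete, the following is a minimal sketch only, not the authors' implementation: the encoder architectures, the shared anchor, the alignment objective (a simple MSE stand-in for the paper's generative alignment), and all names and dimensions below are illustrative assumptions.

```python
# Hypothetical sketch of the two-stage idea: (1) align each modality encoder
# to one shared anchor in parallel, (2) compose the aligned encoders into a
# unified item encoder for downstream sequential recommendation.
import torch
import torch.nn as nn

ANCHOR_DIM = 64  # assumed dimensionality of the shared, explainable anchor


class ModalityEncoder(nn.Module):
    """Maps one modality (text, image, or behavior) into the anchor space."""

    def __init__(self, in_dim: int, anchor_dim: int = ANCHOR_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, anchor_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def alignment_loss(modality_emb: torch.Tensor, anchor: torch.Tensor) -> torch.Tensor:
    # Placeholder for the paper's generative alignment objective: every
    # modality is pulled toward the same anchor, giving one consistent
    # alignment direction across modalities.
    return nn.functional.mse_loss(modality_emb, anchor)


class UnifiedEncoder(nn.Module):
    """Stage 2 (assumed form): compose the aligned per-modality encoders."""

    def __init__(self, encoders: dict):
        super().__init__()
        self.encoders = nn.ModuleDict(encoders)

    def forward(self, inputs: dict) -> torch.Tensor:
        # Average the aligned embeddings; any fusion (concat + projection,
        # attention, ...) would fit here since all modalities share one space.
        embs = [enc(inputs[name]) for name, enc in self.encoders.items()]
        return torch.stack(embs, dim=0).mean(dim=0)


if __name__ == "__main__":
    batch = 8
    # Toy features for three modalities, with behavior treated as its own modality.
    inputs = {
        "text": torch.randn(batch, 32),
        "image": torch.randn(batch, 48),
        "behavior": torch.randn(batch, 16),
    }
    anchor = torch.randn(batch, ANCHOR_DIM)  # per-item anchor representation

    encoders = {name: ModalityEncoder(x.shape[1]) for name, x in inputs.items()}

    # Stage 1: align each modality to the shared anchor in parallel.
    params = [p for enc in encoders.values() for p in enc.parameters()]
    opt = torch.optim.Adam(params, lr=1e-3)
    loss = sum(alignment_loss(enc(inputs[n]), anchor) for n, enc in encoders.items())
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Stage 2: compose the aligned encoders into a unified item encoder that
    # could be transferred to a target domain for sequential recommendation.
    unified = UnifiedEncoder(encoders)
    item_repr = unified(inputs)  # shape: (batch, ANCHOR_DIM)
    print(item_repr.shape)
```

Because every modality is aligned to the same anchor space, adding a new modality in this sketch only requires adding one more encoder and alignment term, which is the flexibility and scalability property the abstract claims.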
Submission Number: 2352