Abstract: Multimedia recommendation forms a personalized ranking task with multimedia content representations which are mostly extracted via generic encoders. However, the generic representations introduce spurious correlations --- the meaningless correlation from the recommendation perspective. For example, suppose a user bought two dresses on the same model, this co-occurrence would produce a correlation between the model and purchases, but the correlation is spurious from the view of fashion recommendation. Existing work alleviates this issue by customizing preference-aware representations, requiring high-cost analysis and design. In this paper, we propose an Invariant Representation Learning Framework (InvRL) to alleviate the impact of the spurious correlations. We utilize environments to reflect the spurious correlations and determine each environment with a set of interactions. We then learn invariant representations --- the inherent factors attracting user attention --- to make a consistent prediction of user-item interaction across various environments. In this light, InvRL proposes two iteratively executed modules to cluster user-item interactions and learn invariant representations. With them, InvRL trains a final recommender model thus mitigating the spurious correlations. We demonstrate InvRL on a cutting-edge recommender model UltraGCN and conduct extensive experiments on three public multimedia recommendation datasets, Movielens, Tiktok, and Kwai. The experimental results validate the rationality and effectiveness of InvRL. Codes are released at https://github.com/nickwzk/InvRL.
0 Replies
Loading