Enhancing Multimedia Recommendation Through Item-Item Semantic Denoising and Global Preference Awareness
Abstract: Multimedia recommendation aims to predict whether users will interact with multimodal items. Several recent works that explicitly learn the semantic structure between items from multimodal features have achieved impressive performance gains. These gains are mainly attributed to the ability of graph convolutional networks (GCNs) to learn superior item representations by propagating and aggregating information from high-order neighbors on the semantic structure. However, these methods still face two major challenges: a) noisy relations (edges) in the item-item semantic structure disrupt information propagation and produce low-quality item representations, which impairs the effectiveness and robustness of existing methods; b) the lack of an optimization objective that exploits informative samples and global preference information leads to suboptimal training, which makes users and items hard to distinguish in the embedding space. To overcome these challenges, we propose Enhancing Multimedia Recommendation through Item-Item Semantic Denoising and Global Preference Awareness (MMGPA). Specifically, MMGPA consists of two components: (1) a modal semantic representation network, carefully designed to learn high-quality multimodal item representations by modeling the denoised item-item semantic structure, and (2) a global preference-aware optimization objective that prioritizes the most informative hard sample pairs while constraining multiple preference distances, yielding a better-separated embedding space. Extensive experimental results demonstrate that the proposed method outperforms various state-of-the-art competitors on three public benchmark datasets.
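The abstract does not specify how the item-item semantic structure is built or denoised. The sketch below is only a generic illustration of the idea and is not MMGPA's method. It builds a cosine-similarity kNN graph over item features, keeping only the top-k neighbors per item as one simple way of discarding weak, possibly noisy edges. It then runs parameter-free GCN-style propagation over high-order neighbors. The function names `build_item_item_graph` and `propagate`, and the parameters `k` and `layers`, are hypothetical.

```python
import numpy as np


def build_item_item_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical sketch: symmetric-normalized top-k cosine kNN item-item graph.

    Keeping only the k most similar neighbors per item is a simple pruning
    heuristic for removing weak edges; it is an assumption, not MMGPA's denoiser.
    """
    n = features.shape[0]
    assert 0 < k < n, "k must be between 1 and n_items - 1"
    # L2-normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    x = features / np.clip(norms, 1e-12, None)
    sim = x @ x.T
    # Exclude self-similarity from neighbor selection.
    np.fill_diagonal(sim, -np.inf)
    # Indices of the k most similar items in each row (order within top-k irrelevant).
    topk = np.argpartition(-sim, kth=k - 1, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), topk.ravel()] = 1.0
    # Symmetrize: keep an edge if either endpoint selected the other.
    adj = np.maximum(adj, adj.T)
    # D^{-1/2} A D^{-1/2}; every degree is >= k >= 1, so no division by zero.
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def propagate(adj_norm: np.ndarray, emb: np.ndarray, layers: int) -> np.ndarray:
    """Aggregate information from up to `layers`-hop neighbors (no weights, no nonlinearity)."""
    out = emb
    for _ in range(layers):
        out = adj_norm @ out
    return out


rng = np.random.default_rng(0)
item_feats = rng.normal(size=(6, 4))   # e.g. 6 items with 4-dim modality features
A_hat = build_item_item_graph(item_feats, k=2)
item_repr = propagate(A_hat, rng.normal(size=(6, 3)), layers=2)
print(A_hat.shape, item_repr.shape)
```

Hard top-k pruning is chosen here only because it is easy to follow. MMGPA's actual denoising of the semantic structure, and its global preference-aware objective, may work quite differently.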