Balancing User-Item Structure and Interaction with Large Language Models and Optimal Transport for Multimedia Recommendation

Published: 01 Jan 2025, Last Modified: 09 Oct 2025, IJCAI 2025, CC BY-SA 4.0
Abstract: The rapid growth of multimedia content has driven the development of recommender systems. Most previous work focuses on uncovering latent relationships among items to learn better representations. However, this approach does not sufficiently account for user affinities, which can leave the structure modeling of users and items imbalanced. Moreover, the sparsity and imbalance of user-item interactions further hinder effective representation learning. To address these challenges, we propose BLAST, a framework that balances structures and interactions via large language models and optimal transport for multimodal recommendation. Specifically, we use large language models to summarize side information and generate user profiles. Based on these profiles, we design an intra- and inter-entity structure balancing module that captures item-item and user-user relationships and integrates these affinities into the final representations. Furthermore, we impose constraints on negative sample selection and augment the training data using false negative items and the optimal transport algorithm, leading to smoother interactions. We evaluate BLAST on three real-world datasets, and the results show that it significantly outperforms state-of-the-art baselines, validating its effectiveness.
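For readers unfamiliar with optimal transport, the following is a minimal sketch of an entropy-regularized transport plan between users and items, computed with Sinkhorn iterations. It is an illustration only: the abstract does not say which optimal transport formulation BLAST uses, and the function name `sinkhorn`, the cosine-distance cost, the uniform marginals, the random embeddings, and the values of `eps` and `n_iters` are all assumptions made for this example.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iters=300):
    """Entropy-regularized transport plan between histograms a and b.

    Hypothetical helper for illustration; not taken from the paper.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale rows toward marginal a
        v = b / (K.T @ u)              # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Toy data (assumed): 4 users and 6 items with random 8-dim embeddings.
rng = np.random.default_rng(0)
user_emb = rng.normal(size=(4, 8))
item_emb = rng.normal(size=(6, 8))

# Cost = 1 - cosine similarity, so similar user-item pairs are cheap to match.
user_unit = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
item_unit = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
cost = 1.0 - user_unit @ item_unit.T

a = np.full(4, 1 / 4)                  # uniform mass over users
b = np.full(6, 1 / 6)                  # uniform mass over items
plan = sinkhorn(cost, a, b)            # plan[i, j]: soft match of user i to item j
```

In a sketch like this, entries of `plan` with high mass for user-item pairs that have no observed interaction could be read as candidate false negatives, though how BLAST actually selects and uses them is not stated in the abstract.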