Keywords: Multi-Domain Recommendation, Large Language Model
Abstract: Due to the increasingly severe problem of information overload, recommendation systems (RS) are now deployed across all kinds of Internet platforms to provide personalized items for each user. Recently, as comprehensive platforms host a growing number of specialized domains, such as short video, article, and product recommendations within the same app, multi-domain recommendation has garnered significant attention. Multi-domain recommendation can leverage knowledge from different domains simultaneously, alleviating the data sparsity issue, and allows a single model to serve multiple domains, reducing deployment costs.
Nevertheless, the amount of historical data varies across domains: some domains have significantly more data than others, yielding data-rich and cold-start scenarios, respectively. This disparity can limit model training. For instance, domain-specific parameters may be insufficiently learned in cold-start scenarios, while the learning of domain-shared parameters may be dominated by data-rich scenarios. Previous work mainly addresses these issues through meticulous structural design of the models.
In this paper, we adopt a different perspective and address this issue from the data standpoint. The emergence of LLMs has made it possible to generate virtual user and item data. Furthermore, LLMs, with their extensive world knowledge and outstanding comprehension capabilities, have demonstrated impressive recommendation performance in cold-start scenarios. Accordingly, we utilize LLMs to simulate users in cold-start scenarios and synthesize additional positive samples after learning from existing multi-domain historical interactions. Through an elaborately designed data filtering and denoising strategy, the recommendation quality of multi-domain models can be enhanced. Moreover, through the lens of recommendation systems, we may gain further insight into the synthetic data produced by existing LLMs.
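The described pipeline (LLM-simulated users propose positive samples for a cold-start domain, then a filtering step denoises them) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual method: the LLM call is stubbed out with a toy scoring function, and all names and the confidence threshold are assumptions made for illustration.

```python
# Hypothetical sketch: an LLM simulates a user in a cold-start domain,
# scores candidate items, and a filtering/denoising step keeps only
# high-confidence synthetic positive samples. The LLM is stubbed out.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SyntheticSample:
    user_id: str
    item_id: str
    confidence: float  # LLM's confidence that the simulated user likes the item


def simulate_user(user_id: str, history: List[str], candidates: List[str],
                  llm_score: Callable[[str, List[str], str], float]) -> List[SyntheticSample]:
    """Ask the (stubbed) LLM to score each candidate item for a simulated user."""
    return [SyntheticSample(user_id, item, llm_score(user_id, history, item))
            for item in candidates]


def filter_samples(samples: List[SyntheticSample],
                   threshold: float = 0.7) -> List[SyntheticSample]:
    """Denoising: keep only samples the LLM scored above a confidence threshold."""
    return [s for s in samples if s.confidence >= threshold]


def toy_llm_score(user_id: str, history: List[str], item: str) -> float:
    # Stub standing in for a real LLM call: pretend the user prefers items
    # whose type (first character here) matches something in their history.
    return 0.9 if any(item[0] == h[0] for h in history) else 0.3


samples = simulate_user("u1", history=["movie_a", "movie_b"],
                        candidates=["movie_c", "book_x"],
                        llm_score=toy_llm_score)
kept = filter_samples(samples)
print([s.item_id for s in kept])  # → ['movie_c']
```

In a real system, `llm_score` would wrap a prompted LLM conditioned on the user's multi-domain interaction history, and the filtering strategy would be more elaborate than a single threshold; the kept samples would then augment the cold-start domain's training data.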
Submission Number: 7