Towards Multi-Interest Pre-training with Sparse Capsule Network

Published: 01 Jan 2023, Last Modified: 02 Oct 2023, SIGIR 2023
Abstract: The pre-training paradigm, i.e., learning universal knowledge across a wide spectrum of domains, has increasingly become the de facto practice in many fields, especially for transferring to new domains. Recent progress includes universal pre-training solutions for recommendation. However, we argue that the common treatment of masked language modeling or simple data augmentation via contrastive learning is not sufficient for pre-training a recommender system, since a user's intent can be more complex than predicting the next word or item. It is more intuitive to go a step further by devising a multi-interest-driven pre-training framework for universal user understanding. Nevertheless, incorporating multi-interest modeling into recommender system pre-training is non-trivial due to the dynamic, contextual, and temporary nature of user interests, particularly when the users come from different domains. The limited effort along this line has largely left it an open question. In this paper, we propose a novel Multi-Interest Pre-training with Sparse Capsule framework (named Miracle). Miracle performs universal multi-interest modeling with a sparse capsule network and an interest-aware pre-training task. Specifically, we utilize a text-aware item embedding module, including an MoE adaptor and a deeply-contextual encoding component, to model contextual and transferable item representations. Then, we propose a sparse interest activation mechanism coupled with a position-aware capsule network for adaptive interest extraction. Furthermore, an interest-level contrastive pre-training task is introduced to guide the sparse capsule network to learn universal interests precisely. We conduct extensive experiments on eleven real-world datasets against eight baselines. The results show that our method significantly outperforms a series of state-of-the-art baselines on these benchmark datasets. The code is available at https://github.com/WHUIR/Miracle.
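To make the abstract's sparse capsule idea concrete, below is a minimal, hypothetical sketch of multi-interest extraction via capsule dynamic routing with a top-k sparse activation over interest capsules. All class names, shapes, and hyperparameters (e.g., `SparseInterestCapsules`, `num_capsules`, `top_k`) are illustrative assumptions and are not taken from the authors' released implementation at the repository above.

```python
# Hypothetical sketch (not the authors' code): interest capsules extracted from
# an item sequence with dynamic routing, then sparsified by keeping only the
# top-k most activated capsules per user.
import torch
import torch.nn as nn
import torch.nn.functional as F


def squash(x, dim=-1, eps=1e-8):
    """Standard capsule squashing non-linearity."""
    norm_sq = (x ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * x / torch.sqrt(norm_sq + eps)


class SparseInterestCapsules(nn.Module):
    def __init__(self, d_item, d_interest, num_capsules=8, num_iters=3, top_k=4):
        super().__init__()
        self.num_capsules = num_capsules
        self.num_iters = num_iters
        self.top_k = top_k
        # Bilinear map from item space to each interest-capsule space.
        self.W = nn.Parameter(torch.randn(num_capsules, d_item, d_interest) * 0.02)

    def forward(self, item_emb, mask):
        # item_emb: (B, L, d_item); mask: (B, L), 1 for real items, 0 for padding.
        B, L, _ = item_emb.shape
        # Votes of every item for every capsule: (B, K, L, d_interest)
        u = torch.einsum('bld,kde->bkle', item_emb, self.W)

        # Dynamic routing: iteratively refine routing logits b by agreement.
        b = torch.zeros(B, self.num_capsules, L, device=item_emb.device)
        for _ in range(self.num_iters):
            c = F.softmax(b, dim=1)                      # distribute each item over capsules
            c = c * mask.unsqueeze(1)                    # drop padded positions
            s = torch.einsum('bkl,bkle->bke', c, u)      # weighted sum of votes
            v = squash(s)                                # interest capsules (B, K, d_interest)
            b = b + torch.einsum('bke,bkle->bkl', v, u)  # agreement update

        # Sparse activation: keep only the top-k strongest capsules per user.
        strength = v.norm(dim=-1)                                   # (B, K)
        topk = strength.topk(self.top_k, dim=-1).indices
        keep = torch.zeros_like(strength).scatter_(1, topk, 1.0)
        return v * keep.unsqueeze(-1)                               # zero out inactive interests


# Toy usage: 2 users, 10 interactions, 64-dim item embeddings.
items = torch.randn(2, 10, 64)
mask = torch.ones(2, 10)
interests = SparseInterestCapsules(d_item=64, d_interest=32)(items, mask)
print(interests.shape)  # torch.Size([2, 8, 32]); only 4 capsules per user are non-zero
```

The returned sparse interest capsules could then feed an interest-level contrastive objective; the positional and MoE text-encoding components described in the abstract are omitted here for brevity.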