M3Rec: A Context-Aware Offline Meta-Level Model-Based Reinforcement Learning Approach for Cold-Start Recommendation
Abstract: Reinforcement learning (RL) has shown great promise in optimizing long-term user interest in recommender
systems. However, existing RL-based recommendation methods need a large number of interactions for each
user to learn the recommendation policy. The challenge becomes more critical when recommending to new
users who have a limited number of interactions. To that end, in this article, we address the cold-start challenge
in RL-based recommender systems by proposing a novel context-aware offline meta-level model-based
RL approach for user adaptation. Our proposed approach learns to infer each user’s preference with a user
context variable that enables recommender systems to better adapt to new users with limited contextual
information. To improve adaptation efficiency, our approach learns to recover the user choice function and
reward from limited contextual information through an inverse RL method, which is used to assist the training
of a meta-level recommendation agent. To avoid the need for online interaction, the proposed method is
trained using historically collected offline data. Moreover, to tackle the challenge of offline policy training, we
introduce a mutual information constraint between the user model and recommendation agent. Evaluation
results show the superiority of our developed offline policy learning method when adapting to new users
with limited contextual information. In addition, we provide a theoretical analysis of the recommendation
performance bound.
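
The central mechanism sketched in the abstract, inferring a latent user-context variable from a handful of logged interactions and conditioning the recommendation policy on it, can be illustrated with a minimal example. The code below is an illustrative assumption rather than the authors' implementation: the module names (ContextEncoder, Policy), the dimensions, and the choice of a GRU encoder are hypothetical, and the sketch omits the inverse-RL reward recovery and the mutual information constraint described above.

    # Minimal sketch (assumed, not the authors' code): condition a recommendation
    # policy on a user-context vector inferred from a few offline interactions.
    import torch
    import torch.nn as nn

    class ContextEncoder(nn.Module):
        """Infers a user-context vector z from a short interaction history."""
        def __init__(self, item_dim: int, ctx_dim: int):
            super().__init__()
            self.gru = nn.GRU(item_dim, ctx_dim, batch_first=True)

        def forward(self, history: torch.Tensor) -> torch.Tensor:
            # history: (batch, seq_len, item_dim) -- embeddings of past interactions
            _, h = self.gru(history)
            return h.squeeze(0)  # (batch, ctx_dim)

    class Policy(nn.Module):
        """Scores candidate items conditioned on the inferred user context."""
        def __init__(self, item_dim: int, ctx_dim: int, n_items: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(item_dim + ctx_dim, 128), nn.ReLU(),
                nn.Linear(128, n_items),
            )

        def forward(self, state: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([state, z], dim=-1))  # logits over items

    # Toy forward pass: a new user with only three logged interactions.
    item_dim, ctx_dim, n_items = 16, 8, 100
    encoder = ContextEncoder(item_dim, ctx_dim)
    policy = Policy(item_dim, ctx_dim, n_items)
    history = torch.randn(1, 3, item_dim)   # limited contextual information
    state = torch.randn(1, item_dim)        # current state summary
    z = encoder(history)
    scores = policy(state, z)
    print(scores.shape)                     # torch.Size([1, 100])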