M3Rec: A Context-Aware Offline Meta-Level Model-Based Reinforcement Learning Approach for Cold-Start Recommendation
Abstract: Reinforcement learning (RL) has shown great promise in optimizing long-term user interest in recommender
systems. However, existing RL-based recommendation methods need a large number of interactions for each
user to learn the recommendation policy. The challenge becomes more critical when recommending to new
users who have a limited number of interactions. To that end, in this article, we address the cold-start challenge
in RL-based recommender systems by proposing a novel context-aware offline meta-level model-based
RL approach for user adaptation. Our proposed approach learns to infer each user’s preference with a user
context variable that enables recommender systems to better adapt to new users with limited contextual
information. To improve adaptation efficiency, our approach learns to recover the user choice function and
reward from limited contextual information through an inverse RL method, which is used to assist the training
of a meta-level recommendation agent. To avoid the need for online interaction, the proposed method is
trained using historically collected offline data. Moreover, to tackle the challenge of offline policy training, we
introduce a mutual information constraint between the user model and recommendation agent. Evaluation
results show the superiority of our developed offline policy learning method when adapting to new users
with limited contextual information. In addition, we provide a theoretical analysis of the recommendation
performance bound.
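
The central mechanism sketched in the abstract, inferring a latent user-context variable from a handful of logged interactions and conditioning the recommendation policy on it, can be illustrated with a minimal example. The code below is an illustrative assumption rather than the authors' implementation: the module names (ContextEncoder, Policy), the dimensions, and the choice of a GRU encoder are hypothetical, and the sketch omits the inverse-RL reward recovery and the mutual information constraint described above.

    # Minimal sketch (assumed, not the authors' code): condition a recommendation
    # policy on a user-context vector inferred from a few offline interactions.
    import torch
    import torch.nn as nn

    class ContextEncoder(nn.Module):
        """Infers a user-context vector z from a short interaction history."""
        def __init__(self, item_dim: int, ctx_dim: int):
            super().__init__()
            self.gru = nn.GRU(item_dim, ctx_dim, batch_first=True)

        def forward(self, history: torch.Tensor) -> torch.Tensor:
            # history: (batch, seq_len, item_dim) -- embeddings of past interactions
            _, h = self.gru(history)
            return h.squeeze(0)  # (batch, ctx_dim)

    class Policy(nn.Module):
        """Scores candidate items conditioned on the inferred user context."""
        def __init__(self, item_dim: int, ctx_dim: int, n_items: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(item_dim + ctx_dim, 128), nn.ReLU(),
                nn.Linear(128, n_items),
            )

        def forward(self, state: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([state, z], dim=-1))  # logits over items

    # Toy forward pass: a new user with only three logged interactions.
    item_dim, ctx_dim, n_items = 16, 8, 100
    encoder = ContextEncoder(item_dim, ctx_dim)
    policy = Policy(item_dim, ctx_dim, n_items)
    history = torch.randn(1, 3, item_dim)   # limited contextual information
    state = torch.randn(1, item_dim)        # current state summary
    z = encoder(history)
    scores = policy(state, z)
    print(scores.shape)                     # torch.Size([1, 100])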