Safe Coupled Deep Q-Learning for Recommendation Systems

Runsheng Yu, Yu Gong, Rundong Wang, Bo An, Qingwen Liu, Wenwu Ou

08 Jan 2021 (modified: 08 Jan 2021) · OpenReview Anonymous Preprint Blind Submission · Readers: Everyone
Abstract: Reinforcement Learning (RL) is one of the prevailing approaches to optimizing long-term user engagement in Recommendation Systems (RS). However, well-known RL exploration strategies (e.g., the $\epsilon$-greedy strategy) encourage agents to interact with and explore the environment freely, which may lead to frequently recommending unpleasant items that violate users' preferences and erode their confidence in the RS platform. To avoid such irrelevant and unpleasant recommendations, we propose a novel safe RL approach that maximizes the accumulated long-term reward under a safety guarantee. Our contributions are three-fold. First, we introduce a novel training scheme with two value functions to maximize the accumulated long-term reward under a safety constraint. Second, we theoretically show that our methods converge and maintain safety with high probability during training. Third, we implement two practical methods: a Simhash-based method and a relaxation method for large-scale environments. Experiments on immediate recommendation, sequential recommendation, and a safe gridworld show that our methods significantly outperform state-of-the-art baselines.
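The abstract describes a training scheme with two value functions but does not spell out the algorithm. Below is a minimal illustrative sketch of the general two-value-function pattern in safe RL: one Q-table estimates long-term reward, a second estimates long-term safety cost, and exploration is restricted to actions whose estimated cost stays within a budget. All names here (Q_r, Q_c, the budget d, the safe-greedy selection rule) are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

# Illustrative sketch only -- NOT the paper's algorithm. Tabular Q-learning
# with two coupled value tables: Q_r (reward) and Q_c (safety cost).
rng = np.random.default_rng(0)
n_states, n_actions = 10, 4
Q_r = np.zeros((n_states, n_actions))  # estimates of accumulated long-term reward
Q_c = np.zeros((n_states, n_actions))  # estimates of accumulated safety cost
alpha, gamma, eps, d = 0.1, 0.99, 0.1, 1.0  # hypothetical hyperparameters

def select_action(s):
    """Epsilon-greedy restricted to actions whose estimated cost is within budget d."""
    safe = np.flatnonzero(Q_c[s] <= d)
    if safe.size == 0:                # no action looks safe: pick the least costly one
        return int(np.argmin(Q_c[s]))
    if rng.random() < eps:            # explore, but only among safe actions
        return int(rng.choice(safe))
    return int(safe[np.argmax(Q_r[s, safe])])  # exploit: best safe action by reward

def td_update(s, a, r, c, s_next):
    """Coupled one-step TD updates; both tables bootstrap on the same safe-greedy action."""
    a_next = select_action(s_next)
    Q_r[s, a] += alpha * (r + gamma * Q_r[s_next, a_next] - Q_r[s, a])
    Q_c[s, a] += alpha * (c + gamma * Q_c[s_next, a_next] - Q_c[s, a])
```

The abstract also mentions a Simhash-based method for large-scale environments without further detail. A common use of SimHash in RL is to discretize large or continuous state spaces via a random projection so that visit counts can be maintained; the snippet below shows that generic construction (in the spirit of Tang et al., 2017) under the same disclaimer that the projection size and bonus form are assumptions.

```python
proj = np.random.default_rng(1).standard_normal((16, 32))  # fixed projection: 32-d states -> 16-bit codes
counts = {}

def simhash_bonus(state_vec, beta=0.01):
    """Count-based bonus from a SimHash code of the state: sign(A @ s) -> hashable tuple."""
    code = tuple((proj @ state_vec > 0).astype(np.int8))
    counts[code] = counts.get(code, 0) + 1
    return beta / np.sqrt(counts[code])
```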