ACP based reinforcement learning for long-term recommender system

Tianyi Huang, Min Li, William Zhu

Published: 01 Jan 2022, Last Modified: 12 May 2023, Int. J. Mach. Learn. Cybern. 2022, Readers: Everyone

Abstract: Recommender systems aim to suggest the items that best fit users' needs and thus play an important role in online services. To obtain satisfactory recommendations, some researchers model the recommendation procedure as a Markov decision process in which the recommender is the agent and the users are the environment. They then use reinforcement learning to perform the recommendation by sharing the browsing histories of different users. However, when the number of users is large, the sharing process introduces considerable noise, which limits the ability of reinforcement learning to generate satisfactory recommendations. The ACP approach was proposed to deal with social computing by learning a parallel system from the real system. With an effective learning process, the parallel system can contain less noise than the real system, so the ACP approach has the potential to address the noise in recommendation. In this paper, we incorporate the ACP approach into a reinforcement learning based recommender system to deal with the noise and thus improve the recommendation. First, based on the ACP approach, we train a parallel environment of the real environment. Then we use the trained parallel environment to predict the future state in the Markov decision process of the recommender system. The predicted states contain less noise than the original states, since the deep neural network of our parallel environment effectively learns to output the expectation of the future state. Finally, instead of the original states, we use the predicted states to generate the recommendation list in the reinforcement learning procedure. In this way, the generated recommendation list suffers less from noise in the states. Theoretical analysis and experiments indicate that our recommender system performs recommendation better than existing recommender systems.
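The three steps in the abstract (train a parallel environment, predict the future state, recommend from the predicted state) can be illustrated with a minimal sketch. This is not the authors' implementation: a least-squares linear model stands in for the paper's deep neural network, the data are synthetic, and every name here (`fit_parallel_env`, `predict_next_state`, `item_embeddings`, the dot-product scoring) is a hypothetical choice made for illustration. The point it demonstrates is only that a model fitted to noisy transitions approximates the expected next state, which is closer to the noise-free state than a single noisy observation.

```python
import numpy as np

def fit_parallel_env(states, actions, next_states):
    """Fit next_state ~ [state, action, 1] by least squares.

    Assumption: a linear model replaces the deep neural network
    described in the abstract.
    """
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W

def predict_next_state(W, state, action):
    """Predict the expected next state for one (state, action) pair."""
    x = np.concatenate([state, action, [1.0]])
    return x @ W

# Synthetic transitions (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, d_s, d_a = 2000, 4, 2
true_dynamics = 0.5 * rng.normal(size=(d_s + d_a, d_s))
states = rng.normal(size=(n, d_s))
actions = rng.normal(size=(n, d_a))
clean_next = np.hstack([states, actions]) @ true_dynamics
noisy_next = clean_next + rng.normal(scale=1.0, size=clean_next.shape)

# Step 1: train the parallel environment on the noisy observations.
W = fit_parallel_env(states, actions, noisy_next)

# Step 2: predict future states with the trained parallel environment.
X = np.hstack([states, actions, np.ones((n, 1))])
predicted_next = X @ W

# Compare each against the noise-free next state.
err_observed = np.mean((noisy_next - clean_next) ** 2)
err_predicted = np.mean((predicted_next - clean_next) ** 2)

# Step 3: rank items from the predicted state instead of the observed one.
# Assumption: items are scored by a dot product with hypothetical embeddings.
item_embeddings = rng.normal(size=(50, d_s))
state_hat = predict_next_state(W, states[0], actions[0])
top_k = np.argsort(-(item_embeddings @ state_hat))[:5]
```

In this toy setting `err_predicted` comes out well below `err_observed`, because the fit averages the noise over many transitions. Whether the same holds for real browsing histories depends on how well the parallel environment is learned, which is what the paper's analysis and experiments address.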
