Policy Gradient for items Recommendation on Virtual Taobao

Dec 14, 2020 (edited Dec 26, 2020) | CUHK 2021 Course IERG5350 Blind Submission | Readers: Everyone
  • Abstract: In recent years, digital content has appeared in many forms in daily life, including online courses, online shopping, and e-news, which creates both opportunities and challenges for systems that provide users with personalized services and information. The goal of our project is to design a recommender algorithm that returns a list of items consumers are likely to click in a simulated environment named Virtual Taobao, a simulator trained on real data from Taobao. First, we tried some state-of-the-art deep reinforcement learning algorithms, such as the deep deterministic policy gradient (DDPG) method and Twin Delayed DDPG (TD3). We also used the Proximal Policy Optimization (PPO) algorithm and tried to improve it by incorporating the consumers' attribute features. Video Link: https://drive.google.com/file/d/1WVoAjKcJ-4t5o6n_U5KaoDn7BymksYqr/view?usp=sharing