Towards Understanding Deep Policy Gradients: A Case Study on PPO

Dec 14, 2020 (edited Dec 26, 2020) · CUHK 2021 Course IERG5350 Blind Submission
  • Abstract: Deep reinforcement learning has shown impressive performance on many decision-making problems, and deep policy gradient algorithms dominate tasks with continuous action spaces. Although many algorithm-level improvements to policy gradient algorithms have been proposed, recent studies have found that code-level optimizations also play a critical role in the reported gains. In this paper, we further investigate several code-level optimizations for the popular Proximal Policy Optimization (PPO) algorithm, aiming to provide insight into the importance of different components in practical implementations.\footnote{Video presentation is available at \url{}}
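The abstract does not list which code-level optimizations are studied. As a hedged illustration only, the sketch below shows the PPO clipped surrogate loss together with two optimizations commonly found in public PPO implementations (for example OpenAI Baselines' `ppo2`): per-batch advantage normalization and value-function clipping. Whether these match the optimizations examined in this paper is an assumption, and the function names (`normalize_advantages`, `ppo_clip_loss`, `clipped_value_loss`) are hypothetical. The `norm_adv` flag shows how such an optimization can be toggled for an ablation.

```python
import math
from typing import List


def normalize_advantages(adv: List[float], eps: float = 1e-8) -> List[float]:
    """Per-batch advantage normalization (a code-level optimization):
    shift advantages to zero mean and scale them to unit standard deviation."""
    n = len(adv)
    mean = sum(adv) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in adv) / n)
    return [(a - mean) / (std + eps) for a in adv]


def ppo_clip_loss(
    new_logp: List[float],
    old_logp: List[float],
    adv: List[float],
    clip_eps: float = 0.2,
    norm_adv: bool = True,
) -> float:
    """Negative PPO clipped surrogate objective, averaged over the batch.

    norm_adv is a hypothetical switch for ablating advantage normalization.
    """
    if norm_adv:
        adv = normalize_advantages(adv)
    total = 0.0
    for lp_new, lp_old, a in zip(new_logp, old_logp, adv):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * a, clipped * a)  # pessimistic (clipped) surrogate
    return -total / len(adv)  # negate so that minimizing maximizes the surrogate


def clipped_value_loss(
    v_new: List[float],
    v_old: List[float],
    returns: List[float],
    clip_eps: float = 0.2,
) -> float:
    """Value-function clipping (another code-level optimization): the new value
    prediction may move at most clip_eps from the old one, and the larger of the
    clipped and unclipped squared errors is used, as in Baselines-style PPO."""
    total = 0.0
    for vn, vo, r in zip(v_new, v_old, returns):
        v_clipped = vo + min(max(vn - vo, -clip_eps), clip_eps)
        total += max((vn - r) ** 2, (v_clipped - r) ** 2)
    return 0.5 * total / len(returns)
```

For instance, with identical new and old log-probabilities the ratio is 1, so `ppo_clip_loss(..., norm_adv=False)` reduces to the negative mean advantage, while with normalization enabled that mean is driven to roughly zero. This makes it easy to see how a seemingly minor implementation detail changes the gradient signal PPO receives.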