Towards Understanding Deep Policy Gradients: A Case Study on PPO

14 Dec 2020 (modified: 05 May 2023), CUHK 2021 Course IERG5350 Blind Submission, Readers: Everyone
Abstract: Deep reinforcement learning has shown impressive performance on many decision-making problems, and deep policy gradient algorithms prevail in tasks with continuous action spaces. Although many algorithm-level improvements to policy gradient algorithms have been proposed, recent studies have found that code-level optimizations also play a critical role in the claimed improvements. In this paper, we further investigate several code-level optimizations for the popular Proximal Policy Optimization (PPO) algorithm, aiming to provide insight into the importance of the different components of practical implementations. A video presentation is available at https://youtu.be/M0uTLoEUwGQ.