Keywords: Deep Reinforcement Learning, Policy Optimization
Abstract: Entropy can play an essential role in policy optimization by selecting the stochastic policy, which eventually helps better explore the environment in reinforcement learning (RL). A proper balance between exploration and exploitation is challenging and might depend on the particular RL task. However, the stochasticity often reduces as the training progresses; thus, the policy becomes less exploratory. Therefore, in many cases, the policy can converge to sub-optimal due to a lack of representative data during training. Moreover, this issue can even be severe in high-dimensional environments. This paper investigates whether keeping a certain entropy threshold throughout training can help better policy learning. In particular, we propose an algorithm Robust Policy Optimization (RPO), which leverages a perturbed Gaussian distribution to encourage high-entropy actions. We evaluated our methods on various continuous control tasks from DeepMind Control, OpenAI Gym, Pybullet, and IsaacGym. We observed that in many settings, RPO increases the policy entropy early in training and then maintains a certain level of entropy throughout the training period. Eventually, our agent RPO shows consistently improved performance compared to PPO and other techniques such as data augmentation and entropy regularization. Furthermore, in several settings, our method stays robust in performance, while other baseline mechanisms fail to improve and even worsen the performance.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)