- Keywords: Regularization, Policy Optimization, Reinforcement Learning
- TL;DR: We show that conventional regularization methods (e.g., $L_2$, dropout), which have been largely ignored in RL methods, can be very effective in policy optimization.
- Abstract: Deep Reinforcement Learning (Deep RL) has been receiving increasingly more attention thanks to its encouraging performance on a variety of control tasks. Yet, conventional regularization techniques in training neural networks (e.g., $L_2$ regularization, dropout) have been largely ignored in RL methods, possibly because agents are typically trained and evaluated in the same environment. In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks. Interestingly, we find conventional regularization techniques on the policy networks can often bring large improvement on the task performance, and the improvement is typically more significant when the task is more difficult. We also compare with the widely used entropy regularization and find $L_2$ regularization is generally better. Our findings are further confirmed to be robust against the choice of training hyperparameters. We also study the effects of regularizing different components and find that only regularizing the policy network is typically enough. We hope our study provides guidance for future practices in regularizing policy optimization algorithms.
- Code: https://drive.google.com/open?id=1oy5Tyg8JUqTDZ-2jA8HQuoejchIMgnG6