Keywords: overparametrization, over-parameterization, reinforcement learning, deep reinforcement learning, generalization
TL;DR: Over-parametrization in width seems to help in deep reinforcement learning, just as it does in supervised learning.
Abstract: There is significant recent evidence in supervised learning that, in the over-parametrized setting, wider networks achieve better test error. In other words, the bias-variance tradeoff is not directly observable when increasing network width arbitrarily. We investigate whether a corresponding phenomenon is present in reinforcement learning. We experiment on four OpenAI Gym environments, increasing the width of the value and policy networks beyond their prescribed values. Our empirical results lend support to this hypothesis. However, tuning the hyperparameters of each network width separately remains as important future work in environments/algorithms where the optimal hyperparameters vary noticably across widths, confounding the results when the same hyperparameters are used for all widths.