- Keywords: Deep Reinforcement Learning, Generalization, Regularization
- Abstract: Training an agent in a single environment often yields an overfitted model that cannot generalize to changes in that environment. Because of the many variations that can occur in the real world, however, an agent must be robust to be useful. Agents trained with reinforcement learning (RL) algorithms have not exhibited this robustness. In this paper, we investigate how RL agents overfit to their training environments in visual navigation tasks. Our experiments show that deep RL agents can overfit even when trained on multiple environments simultaneously. We propose a regularization method that combines RL with supervised learning by adding a term to the RL objective that encourages the policy to be invariant to variations in the observations that ought not to affect the action taken. This method, called Invariance Regularization, improves the generalization of policies to environments not seen during training. The experiments are conducted in the VizDoom environment, which contains hundreds of textures, allowing us to investigate generalization to changes in the visual observation.
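
The idea of adding an invariance term to the RL objective can be illustrated with a minimal sketch. The abstract does not state the exact form of the term, so this example assumes a KL-divergence penalty between the action distributions the policy produces for an observation and for a visually altered variant of it (for instance, the same scene with a different texture). The function names, the `weight` parameter, and the choice of KL divergence are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw policy logits into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def regularized_loss(rl_loss, logits_original, logits_variant, weight=1.0):
    """RL loss plus an assumed invariance penalty.

    logits_original: policy logits for the original observation.
    logits_variant:  policy logits for a variant of the same observation
                     that should not change the action (hypothetical input).
    weight:          hypothetical coefficient trading off the two terms.
    """
    p = softmax(logits_original)
    q = softmax(logits_variant)
    return rl_loss + weight * kl_divergence(p, q)
```

Under these assumptions, a policy that outputs the same action distribution for both observations pays no penalty, while one whose output shifts with the visual change incurs a loss that grows with the size of the shift.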