Keywords: Generalization, Deep Reinforcement Learning, Invariant Representation
TL;DR: We propose a regularization term that, when added to the reinforcement learning objective, allows the policy to maximize the reward while learning to be invariant to irrelevant changes in the input.
Abstract: Training agents in a single environment often yields overfitted models that fail to generalize to changes in that environment. Yet because of the many variations that occur in the real world, an agent usually needs to be robust in order to be useful, and agents trained with reinforcement learning (RL) algorithms have typically not been. In this paper, we investigate the overfitting of RL agents to their training environments in visual navigation tasks. Our experiments show that deep RL agents can overfit even when trained on multiple environments simultaneously.
We propose a regularization method that combines RL with supervised learning by adding a term to the RL objective which encourages the policy to be invariant to variations in the observations that ought not to affect the action taken. The results of this method, called invariance regularization, show improved generalization of policies to environments not seen during training.
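A minimal sketch of how such an invariance term could be combined with an RL loss (PyTorch-style). The `policy`, the source of the varied observations, the KL-based penalty, and the `beta` weight are assumptions for illustration, not necessarily the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def invariance_regularized_loss(policy, obs, varied_obs, rl_loss, beta=1.0):
    """Add a penalty encouraging the policy to output the same action
    distribution for an observation and a task-irrelevant variation of it.

    policy:      network mapping observations to action logits (assumed)
    obs:         batch of training observations
    varied_obs:  the same observations with irrelevant changes
                 (e.g. different textures or colors)
    rl_loss:     the usual RL objective term (e.g. policy-gradient loss)
    beta:        weight of the invariance term (hypothetical name)
    """
    logits = policy(obs)
    varied_logits = policy(varied_obs)

    # KL divergence between the two action distributions; the paper's exact
    # distance measure may differ -- this is one reasonable choice.
    invariance_term = F.kl_div(
        F.log_softmax(varied_logits, dim=-1),
        F.softmax(logits, dim=-1).detach(),
        reduction="batchmean",
    )
    return rl_loss + beta * invariance_term
```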
Data: [VizDoom](https://paperswithcode.com/dataset/vizdoom)