Disentangling Generalization in Reinforcement Learning


Sep 29, 2021 (edited Oct 05, 2021), ICLR 2022 Conference Blind Submission
  • Keywords: Reinforcement learning, generalization
  • Abstract: Generalization in Reinforcement Learning (RL) is usually measured according to concepts from supervised learning. Unlike a supervised learning model, however, an RL agent must generalize across states, actions and observations from limited reward-based feedback. We propose to measure an RL agent's capacity to generalize by evaluating it in a contextual decision process that combines a tabular environment with observations from a supervised learning dataset. The resulting environment, while simple, necessitates function approximation for state abstraction and provides ground-truth labels for optimal policies and value functions. These ground-truth labels enable us to characterize generalization in RL along three axes: state space, observation space and action space. Putting this method to work, we combine the MNIST dataset with various gridworld environments to rigorously evaluate the generalization of DQN and QR-DQN in state, observation and action spaces, in both online and offline learning. Contrary to previous reports about common regularization methods, we find that dropout does not improve observation generalization. We find, however, that dropout improves action generalization. Our results also corroborate recent findings that QR-DQN generalizes to new observations better than DQN in the offline setting. This advantage does not extend to state generalization, where DQN generalizes better than QR-DQN. These findings demonstrate the need for careful consideration of generalization in RL, and we hope that this line of research will continue to shed light on generalization claims in the literature.
  • One-sentence Summary: We propose a protocol for rigorously evaluating generalization in reinforcement learning across states, observations and actions.
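A minimal sketch of the kind of environment described in the abstract may help make the protocol concrete. It is not the authors' code and makes several loud assumptions: the 3x3 grid, the single goal reward, the cell-to-digit mapping and the names `GridMNIST`, `make_fake_image_pools` and `policy_agreement` are all hypothetical, and random vectors stand in for MNIST images so the example runs without downloading data. Because the underlying dynamics are tabular, value iteration gives ground-truth V* and optimal actions, against which an agent's greedy choices from image observations can be scored on either the training or the held-out image split.

```python
import random


# HYPOTHETICAL stand-in for MNIST: each digit label gets a pool of random
# vectors split into "train" and "test" images. A real setup would load
# MNIST and split its images per label instead.
def make_fake_image_pools(num_labels=10, per_label=20, dim=8, seed=0):
    rng = random.Random(seed)
    pools = {}
    for label in range(num_labels):
        images = [tuple(rng.random() for _ in range(dim)) for _ in range(per_label)]
        half = per_label // 2
        pools[label] = {"train": images[:half], "test": images[half:]}
    return pools


class GridMNIST:
    """Sketch of a contextual decision process: a tabular gridworld whose agent
    observes a digit image instead of its (row, col) state. Grid size, goal,
    rewards and the cell-to-label mapping are assumptions."""

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=3, goal=(2, 2), gamma=0.9, pools=None):
        self.size, self.goal, self.gamma = size, goal, gamma
        self.pools = pools or make_fake_image_pools()
        self.states = [(r, c) for r in range(size) for c in range(size)]

    def label(self, state):
        # Hypothetical mapping from grid cell to digit label.
        r, c = state
        return (r * self.size + c) % 10

    def observe(self, state, split, rng):
        # Sample an image of the cell's digit from the requested split.
        return rng.choice(self.pools[self.label(state)][split])

    def step(self, state, action):
        if state == self.goal:
            return state, 0.0  # the goal is absorbing and yields no further reward
        dr, dc = self.ACTIONS[action]
        nxt = (min(max(state[0] + dr, 0), self.size - 1),
               min(max(state[1] + dc, 0), self.size - 1))
        return nxt, (1.0 if nxt == self.goal else 0.0)

    def q_values(self, state, V):
        # One-step lookahead: Q(s, a) = r + gamma * V(s').
        return [r + self.gamma * V[n] for n, r in (self.step(state, a) for a in range(4))]

    def value_iteration(self, tol=1e-10):
        # Ground-truth V* and a greedy optimal policy from the known tabular dynamics.
        V = {s: 0.0 for s in self.states}
        while True:
            delta = 0.0
            for s in self.states:
                best = max(self.q_values(s, V))
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:
                break
        policy = {}
        for s in self.states:
            q = self.q_values(s, V)
            policy[s] = q.index(max(q))
        return V, policy


def policy_agreement(env, q_fn, states, split, rng, tol=1e-9):
    # Fraction of states where the agent's greedy action, computed from an
    # image observation, is among the ground-truth optimal actions.
    V, _ = env.value_iteration()
    hits = 0
    for s in states:
        q_true = env.q_values(s, V)
        q_agent = q_fn(env.observe(s, split, rng))
        greedy = q_agent.index(max(q_agent))
        hits += q_true[greedy] >= max(q_true) - tol
    return hits / len(states)


if __name__ == "__main__":
    env = GridMNIST()
    rng = random.Random(1)
    always_right = lambda obs: [0.0, 0.0, 0.0, 1.0]  # a trivial "agent"
    print(policy_agreement(env, always_right, env.states, "test", rng))
```

In this toy grid, the constant "always move right" agent agrees with an optimal action in 7 of the 9 cells, which shows how ground-truth labels turn an agent's behaviour into a direct score. Scoring with `split="test"` probes observation generalization; scoring only on cells withheld during training would probe state generalization in the same way.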