Unsupervised Active Pre-Training for Reinforcement Learning

Sep 28, 2020 (edited Mar 05, 2021) · ICLR 2021 Conference Blind Submission · Readers: Everyone
  • Reviewed Version (pdf): https://openreview.net/references/pdf?id=FJMp6UM5m
  • Keywords: Reinforcement Learning, Unsupervised Learning, Entropy Maximization, Contrastive Learning, Self-supervised Learning, Exploration
  • Abstract: We introduce a new unsupervised pre-training method for reinforcement learning called $\textbf{APT}$, which stands for $\textbf{A}\text{ctive }\textbf{P}\text{re-}\textbf{T}\text{raining}$. APT learns a representation and a policy initialization by actively searching for novel states in reward-free environments. We use the contrastive learning framework to learn the representation from collected transitions. The key novel idea is to collect data during pre-training by maximizing a particle-based entropy computed in the learned latent representation space. Particle-based entropy maximization removes the need for challenging density modeling, which lets us scale the approach to image observations. APT successfully learns meaningful representations as well as policy initializations without using any reward. We empirically evaluate APT on the Atari game suite and the DMControl suite by exposing the task-specific reward to the agent after a long unsupervised pre-training phase. On Atari games, APT achieves human-level performance on $12$ games and obtains highly competitive performance compared to canonical fully supervised RL algorithms. On the DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult to learn when training from scratch. Importantly, the pre-trained models can be fine-tuned to solve different tasks as long as the environment does not change. Finally, we also pre-train multi-environment encoders on data from multiple environments and show generalization to a broad set of RL tasks.
  • One-sentence Summary: We propose APT, a reward-free pre-training approach that maximizes particle-based entropy in a contrastive representation space to learn pre-trained models that can be leveraged to solve downstream tasks efficiently.
  • Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
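
The abstract describes collecting data by maximizing a particle-based entropy in the learned latent space, without fitting a density model. The abstract does not give the estimator itself, so the following is only a minimal sketch under stated assumptions: it uses a k-nearest-neighbor particle estimate, where each latent vector is rewarded by the log of its mean distance to its k nearest neighbors in the batch. The function name `knn_entropy_rewards`, the constant `c`, the choice of `k`, and the toy two-dimensional latents are all hypothetical and chosen for illustration; in practice the latents would come from the contrastive encoder.

```python
import math


def knn_entropy_rewards(latents, k=3, c=1.0):
    """Return one intrinsic reward per latent vector (illustrative sketch).

    Assumption: reward_i = log(c + mean distance from latent i to its
    k nearest neighbors in the batch). Larger values mean the point lies
    in a sparsely visited region of the latent space.
    """
    if len(latents) <= k:
        raise ValueError("need more than k particles in the batch")
    rewards = []
    for i, z in enumerate(latents):
        # Euclidean distances from particle i to every other particle.
        dists = sorted(
            math.dist(z, other) for j, other in enumerate(latents) if j != i
        )
        # Average over the k closest particles; c keeps the log finite.
        rewards.append(math.log(c + sum(dists[:k]) / k))
    return rewards


# Toy batch: three clustered latents and one far-away (novel) latent.
batch = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
rewards = knn_entropy_rewards(batch, k=2)
```

In this toy batch the isolated point receives the largest reward, which is the behavior that would push a policy toward novel states. No density over states is estimated at any point; only pairwise distances between sampled particles are needed.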