Unsupervised Active Pre-Training for Reinforcement Learning

Hao Liu; Pieter Abbeel

Unsupervised Active Pre-Training for Reinforcement Learning

Hao Liu, Pieter Abbeel

28 Sept 2020 (modified: 05 May 2023)ICLR 2021 Conference Blind SubmissionReaders: Everyone

Keywords: Reinforcement Learning, Unsupervised Learning, Entropy Maximization, Contrastive Learning, Self-supervised Learning, Exploration

Abstract: We introduce a new unsupervised pre-training method for reinforcement learning called $\textbf{APT}$, which stands for $\textbf{A}\text{ctive}\textbf{P}\text{re-}\textbf{T}\text{raining}$. APT learns a representation and a policy initialization by actively searching for novel states in reward-free environments. We use the contrastive learning framework for learning the representation from collected transitions. The key novel idea is to collect data during pre-training by maximizing a particle based entropy computed in the learned latent representation space. By doing particle based entropy maximization, we alleviate the need for challenging density modeling and are thus able to scale our approach to image observations. APT successfully learns meaningful representations as well as policy initializations without using any reward. We empirically evaluate APT on the Atari game suite and DMControl suite by exposing task-specific reward to agent after a long unsupervised pre-training phase. On Atari games, APT achieves human-level performance on $12$ games and obtains highly competitive performance compared to canonical fully supervised RL algorithms. On DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult for training from scratch. Importantly, the pre-trained models can be fine-tuned to solve different tasks as long as the environment does not change. Finally, we also pre-train multi-environment encoders on data from multiple environments and show generalization to a broad set of RL tasks.

One-sentence Summary: We propose APT, a reward-free pre-training approach which is based on maximizing particle-based entropy in contrastive representation space for learning pre-trained models that can be leveraged for solving downstream tasks efficiently

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Reviewed Version (pdf): https://openreview.net/references/pdf?id=FJMp6UM5m

11 Replies

Loading