Unsupervised Reinforcement Learning with Contrastive Intrinsic Control

Michael Laskin; Hao Liu; Xue Bin Peng; Denis Yarats; Aravind Rajeswaran; Pieter Abbeel

Published: 31 Oct 2022, Last Modified: 22 Jan 2023, NeurIPS 2022 Accept, Readers: Everyone

Keywords: Reinforcement Learning, Unsupervised Learning

TL;DR: Contrastive Intrinsic Control (CIC) uses a novel contrastive loss between states and skills to achieve good performance on the state-based Unsupervised RL Benchmark.

Abstract: We introduce Contrastive Intrinsic Control (CIC), an unsupervised reinforcement learning (RL) algorithm that maximizes the mutual information between state-transitions and latent skill vectors. CIC uses contrastive learning between state-transitions and skill vectors to learn behaviour embeddings, and maximizes the entropy of these embeddings, which serves as an intrinsic reward that encourages behavioural diversity. We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB.
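To make the two ingredients named in the abstract more concrete, here is a minimal pure-Python sketch. It assumes an InfoNCE-style contrastive loss with in-batch negatives for the state-transition/skill term, and a k-nearest-neighbour (particle) estimate for the entropy-based intrinsic reward. These estimators, the function names `infonce_loss` and `knn_entropy_reward`, and the `temperature` and `k` parameters are illustrative assumptions rather than details stated in the abstract. The learned encoders are omitted; plain vectors stand in for their outputs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    # Unit-normalize so dot products become cosine similarities (assumption).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def infonce_loss(transition_emb: List[Vector], skill_emb: List[Vector],
                 temperature: float = 0.5) -> float:
    """Hypothetical InfoNCE loss: row i of each batch is a positive pair.

    transition_emb[i] stands for an embedding of a transition (s, s') and
    skill_emb[i] for the skill vector z that produced it; the other skills in
    the batch serve as negatives.
    """
    queries = [_normalize(q) for q in transition_emb]
    keys = [_normalize(k) for k in skill_emb]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_norm - logits[i]  # negative log-probability of the positive
    return total / len(queries)


def knn_entropy_reward(embeddings: List[Vector], k: int = 3) -> List[float]:
    """Hypothetical particle-based entropy proxy used as an intrinsic reward.

    Each embedding is rewarded by log(1 + distance to its k-th nearest
    neighbour in the batch), so embeddings in sparsely visited regions earn more.
    """
    if not 1 <= k < len(embeddings):
        raise ValueError("k must satisfy 1 <= k < batch size")
    rewards = []
    for i, e in enumerate(embeddings):
        dists = sorted(math.dist(e, o) for j, o in enumerate(embeddings) if j != i)
        rewards.append(math.log(1.0 + dists[k - 1]))
    return rewards
```

With two orthogonal transition/skill pairs and a temperature of 0.5, the correctly paired batch gives a loss of log(1 + e^-2), about 0.127, while swapping the skills raises it to about 2.127. Minimizing this loss therefore pulls each transition embedding toward its own skill. Separately, `knn_entropy_reward` gives the largest reward to the embedding farthest from its neighbours, which is one common way to turn "maximize entropy" into a per-sample reward.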

Supplementary Material: pdf
