CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

Michael Laskin; Hao Liu; Xue Bin Peng; Denis Yarats; Aravind Rajeswaran; Pieter Abbeel

12 Oct 2021 (modified: 05 May 2023), Deep RL Workshop NeurIPS 2021, Readers: Everyone

Keywords: unsupervised learning, reinforcement learning, exploration

TL;DR: We introduce Contrastive Intrinsic Control (CIC), a new unsupervised skill discovery algorithm that achieves leading performance on the Unsupervised Reinforcement Learning Benchmark.

Abstract: We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between skills and state transitions. In contrast to most prior approaches, CIC uses a decomposition of the mutual information that explicitly incentivizes diverse behaviors by maximizing state entropy. We derive a novel lower-bound estimate of the mutual information that combines a particle estimator of state entropy, which generates diverse behaviors, with contrastive learning, which distills these behaviors into distinct skills. We evaluate our algorithm on the Unsupervised Reinforcement Learning Benchmark, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that, in terms of downstream task performance, CIC improves on prior unsupervised skill discovery methods by $91\%$ and on the next-leading overall exploration algorithm by $26\%$.
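To make the two ingredients named in the abstract concrete, the sketch below shows a generic k-nearest-neighbor particle proxy for state entropy and a generic InfoNCE-style contrastive loss between transition and skill embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names `knn_entropy_reward` and `info_nce_loss`, the choice of `k`, the `log(1 + mean distance)` form, and the temperature are all hypothetical choices made here, and the embeddings are assumed to come from encoders that are not shown.

```python
import numpy as np


def knn_entropy_reward(embeddings, k=3):
    """Particle-style entropy proxy (assumed form, not taken from the paper).

    Each row of `embeddings` is one particle. Points whose k nearest
    neighbors are far away sit in sparsely visited regions, so they
    receive a larger value.
    """
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)       # pairwise distances, (N, N)
    np.fill_diagonal(dists, np.inf)              # a point is not its own neighbor
    knn = np.sort(dists, axis=1)[:, :k]          # k smallest distances per row
    return np.log1p(knn.mean(axis=1))            # shape (N,)


def info_nce_loss(transition_emb, skill_emb, temperature=0.5):
    """Generic InfoNCE loss (assumed form, not taken from the paper).

    Row i of `transition_emb` is treated as the positive match for row i
    of `skill_emb`; every other row in the batch acts as a negative.
    """
    q = transition_emb / np.linalg.norm(transition_emb, axis=1, keepdims=True)
    s = skill_emb / np.linalg.norm(skill_emb, axis=1, keepdims=True)
    logits = q @ s.T / temperature               # cosine similarities, (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on the diagonal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(32, 8))            # stand-in for encoded transitions
    skills = rng.normal(size=(32, 8))            # stand-in for encoded skills
    print(knn_entropy_reward(states).shape)
    print(info_nce_loss(states, skills))
```

In this sketch the entropy proxy would serve as an intrinsic reward that pushes the agent toward diverse states, while the contrastive loss would train the encoders so that each skill can be identified from the transitions it produces.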

Supplementary Material: zip
