Unsupervised Reinforcement Learning by Maximizing Skill Density Deviation

Jiakun Zheng; Ting Xiao; Rushuai Yang; Kang Xu; Qiaosheng Zhang; Peng Liu; Chenjia Bai

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Unsupervised Reinforcement Learning, Skill Discovery, Inter-Skill Diversity, Intra-Skill Exploration

TL;DR: We propose SD3 to encourage inter-skill diversity and intra-skill exploration in a unified framework.

Abstract: Unsupervised Reinforcement Learning (RL) aims to discover diverse behaviors that can accelerate the learning of downstream tasks. Previous methods typically focus on entropy-based exploration or empowerment-driven skill learning. However, entropy-based exploration struggles in large-scale state spaces (e.g., images), and empowerment-based methods that rely on Mutual Information (MI) estimation are limited in how well they explore the state space. To address these challenges, we propose a novel skill discovery objective that maximizes the deviation of the state density of one skill from the regions explored by other skills, encouraging inter-skill state diversity similar to the original MI objective. For state-density estimation, we construct a novel conditional autoencoder with soft modularization for different skill policies in high-dimensional space. To incentivize intra-skill exploration, we formulate an intrinsic reward based on the learned autoencoder that resembles count-based exploration in a compact latent space. Through extensive experiments on challenging state-based and image-based tasks, we find that our method learns meaningful skills and achieves superior performance in various downstream tasks.
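The two reward signals described in the abstract can be illustrated with a toy sketch. This is not the SD3 implementation: the abstract states that densities come from a conditional autoencoder, whereas the sketch below assumes a simple visit-count table over discretized low-dimensional states as a stand-in. The names `SkillVisitTable`, `deviation_reward`, and `exploration_bonus`, the bin width, the smoothing constant, and the exact reward formulas are all hypothetical choices made for illustration.

```python
import math


class SkillVisitTable:
    """Toy per-skill density estimate from visit counts (assumed stand-in
    for the learned conditional autoencoder mentioned in the abstract)."""

    def __init__(self, num_skills, bin_width=0.5):
        if num_skills < 2:
            raise ValueError("need at least two skills to compare densities")
        self.num_skills = num_skills
        self.bin_width = bin_width
        self.counts = [dict() for _ in range(num_skills)]
        self.totals = [0] * num_skills

    def _cell(self, state):
        # Discretize a continuous state into a grid cell.
        return tuple(math.floor(x / self.bin_width) for x in state)

    def update(self, skill, state):
        cell = self._cell(state)
        self.counts[skill][cell] = self.counts[skill].get(cell, 0) + 1
        self.totals[skill] += 1

    def count(self, skill, state):
        return self.counts[skill].get(self._cell(state), 0)

    def density(self, skill, state, eps=1e-3):
        # Smoothed empirical visitation frequency; always strictly positive.
        return (self.count(skill, state) + eps) / (self.totals[skill] + eps)


def deviation_reward(table, skill, state):
    """Assumed inter-skill term: log-ratio of this skill's density to the
    mean density of all other skills at the same state."""
    own = table.density(skill, state)
    others = [table.density(k, state)
              for k in range(table.num_skills) if k != skill]
    return math.log(own) - math.log(sum(others) / len(others))


def exploration_bonus(table, skill, state):
    """Assumed intra-skill term: classic count-based bonus 1 / sqrt(N + 1)."""
    return 1.0 / math.sqrt(table.count(skill, state) + 1)
```

Under these assumptions, a state visited mostly by the current skill yields a positive `deviation_reward`, a state dominated by other skills yields a negative one, and `exploration_bonus` decays as the skill revisits the same cell.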

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 8761
