Reward-Free Exploration by Conditional Divergence Maximization

Hongming Li; Shujian Yu; Vincent Francois-Lavet; Jose C Principe

19 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Reward-free exploration, Cauchy-Schwarz divergence, intrinsic reward

TL;DR: We define intrinsic reward as the divergence between the agent's estimates of the transition probability in two adjacent trajectories, and estimate it with the Cauchy-Schwarz divergence.

Abstract: We propose maximum conditional divergence (MaxCondDiv), a new curiosity-driven exploration strategy that encourages the agent to learn in the absence of external rewards, effectively separating exploration from exploitation. Our central idea is to define curiosity as the divergence between the agent's estimates of the transition probability of the next state given the current state-action pair (i.e., $\mathbb{P}(\mathbf{s}_{t+1}|\mathbf{s}_t,\mathbf{a}_t)$) in two adjacent trajectory fractions. Unlike other recent intrinsically motivated exploration approaches, which usually require complex models in their learning procedures, our exploration is model-free and explicitly estimates this divergence from possibly multivariate continuous observations, thanks to the favorable properties of the Cauchy-Schwarz divergence. MaxCondDiv is therefore computationally less complex and reduces internal model selection bias. We establish a connection between MaxCondDiv and the famed maximum entropy (MaxEnt) exploration, and observe that MaxCondDiv achieves a wider exploration range and faster convergence. Our exploration also encourages the agent to acquire intricate skills in a fully reward-free environment.
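
To make the divergence idea concrete, here is a minimal sketch of a kernel plug-in estimate of the Cauchy-Schwarz divergence, $D_{CS}(p;q) = -2\log\int pq + \log\int p^2 + \log\int q^2$, computed directly from two sample sets. It is an illustration under stated assumptions, not the authors' estimator: it computes the unconditional divergence between two sets of vectors, whereas the submission uses a conditional variant. The function name `cs_divergence`, the Gaussian kernel, and the bandwidth `sigma` are hypothetical choices made for this example.

```python
import numpy as np


def cs_divergence(x, y, sigma=1.0):
    """Plug-in Cauchy-Schwarz divergence between samples x (M, d) and y (N, d).

    Assumption: both densities are approximated by Gaussian kernel density
    estimates with a shared bandwidth sigma. This is the unconditional
    divergence, used here only as an illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def mean_kernel(a, b):
        # Pairwise squared Euclidean distances, shape (len(a), len(b)).
        sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        # Integrating the product of two width-sigma Gaussian KDEs gives a
        # Gaussian of width sqrt(2) * sigma. Its normalisation constant
        # cancels across the three log terms, so it is omitted.
        return np.mean(np.exp(-sq_dist / (4.0 * sigma ** 2)))

    cross = mean_kernel(x, y)   # estimates the integral of p * q
    self_x = mean_kernel(x, x)  # estimates the integral of p ** 2
    self_y = mean_kernel(y, y)  # estimates the integral of q ** 2
    return -2.0 * np.log(cross) + np.log(self_x) + np.log(self_y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for samples collected in two adjacent
    # trajectory fractions (rows could be concatenated (s_t, a_t, s_{t+1})).
    earlier = rng.normal(size=(64, 3))
    later = rng.normal(size=(64, 3)) + 1.5
    print("divergence between fractions:", cs_divergence(earlier, later))
```

Because the estimate needs only pairwise kernel evaluations, no density model or transition model has to be fitted, which is the sense in which a sample-based divergence can be called model-free. The value is non-negative and symmetric in its arguments; a larger value for a new trajectory fraction would translate into a larger intrinsic reward under the idea described in the abstract.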

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 1795