Reward-Free Exploration by Conditional Divergence Maximization

19 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Reward-free exploration, Cauchy-Schwarz divergence, intrinsic reward
TL;DR: We define intrinsic reward as the divergence between the agent's estimates of the transition probability in two adjacent trajectory fractions, and estimate it with the Cauchy-Schwarz divergence.
Abstract: We propose maximum conditional divergence (MaxCondDiv), a new curiosity-driven exploration strategy that encourages the agent to learn in the absence of external rewards, effectively separating exploration from exploitation. Our central idea is to define curiosity as the divergence between the agent's estimates of the transition probability of the next state given the current state-action pair (i.e., $\mathbb{P}(\mathbf{s}_{t+1}|\mathbf{s}_t,\mathbf{a}_t)$) in two adjacent trajectory fractions. Unlike other recent intrinsically motivated exploration approaches, which usually rely on complex models in their learning procedures, our exploration is model-free and explicitly estimates this divergence from possibly multivariate continuous observations, thanks to the favorable properties of the Cauchy-Schwarz divergence. MaxCondDiv is therefore computationally less complex and reduces internal model selection bias. We establish a connection between MaxCondDiv and the well-known maximum entropy (MaxEnt) exploration, and observe that MaxCondDiv achieves a wider exploration range and faster convergence. Our exploration also encourages the agent to acquire intricate skills in a fully reward-free environment.
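Illustrative sketch (not from the submission): the abstract does not state the estimator, so the snippet below only shows the standard kernel plug-in estimate of the plain (unconditional) Cauchy-Schwarz divergence, $D_{CS}(p;q) = -\log\frac{(\int pq)^2}{\int p^2 \int q^2}$, between two sample sets. Treating each sample as a concatenated $(\mathbf{s}_t, \mathbf{a}_t, \mathbf{s}_{t+1})$ tuple from one trajectory fraction is a simplifying assumption; the paper's conditional divergence may be estimated differently. The function names and the bandwidth `sigma` are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel between two equal-length tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def cs_divergence(xs, ys, sigma=1.0):
    """Plug-in estimate of the Cauchy-Schwarz divergence between two samples.

    Assumption: xs and ys are lists of (s_t, a_t, s_{t+1}) tuples flattened
    to floats, one list per trajectory fraction. This is the unconditional
    divergence on joint samples, used here only as a stand-in.
    Note: if the two samples are extremely far apart relative to sigma,
    the cross term can underflow to 0 and math.log will raise ValueError.
    """
    pp = mean_kernel(xs, xs, sigma)  # estimates the integral of p^2 (up to a constant)
    qq = mean_kernel(ys, ys, sigma)  # estimates the integral of q^2 (up to a constant)
    pq = mean_kernel(xs, ys, sigma)  # estimates the integral of p*q (up to a constant)
    # Normalising constants cancel in the ratio, so they are omitted.
    return math.log(pp) + math.log(qq) - 2.0 * math.log(pq)
```

Under these assumptions, an intrinsic reward could be taken as `cs_divergence(previous_window, current_window)`: it is zero when the two windows contain the same samples and grows as the recent transitions depart from the earlier ones.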
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1795