DEXR: A Unified Approach Towards Environment Agnostic Exploration

Yiran Wang; Yunfan Li; Sanae Amani; Lin Yang

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Reinforcement Learning, Exploration, Intrinsic Rewards

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We propose DEXR, a novel framework that enhances exploration algorithms across different types of environments by preventing over-exploration and optimization instability.

Abstract: The exploration-exploitation dilemma poses pivotal challenges in reinforcement learning (RL). While recent advances in curiosity-driven techniques have demonstrated capabilities in sparse reward scenarios, they necessitate extensive hyperparameter tuning on different types of environments and often fall short in dense reward settings. In response to these challenges, we introduce the novel Delayed EXploration Reinforcement Learning (DEXR) framework. DEXR curbs the over-exploration and optimization instability issues of curiosity-driven methods, and adapts efficiently to both dense and sparse reward environments with minimal hyperparameter tuning. This is facilitated by an auxiliary exploitation-only policy that streamlines data collection, guiding the exploration policy towards high-value regions and minimizing unnecessary exploration. Additionally, this exploration policy yields diverse, in-distribution data and bolsters the robustness of training with neural networks. We verify the efficacy of DEXR with both theoretical analysis and comprehensive empirical evaluations, demonstrating its superiority in a broad range of environments.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: zip

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 8228
