KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies in Curiosity-driven Exploration

ICLR 2025 Conference Submission 11687 Authors

27 Sept 2024 (modified: 28 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning, Curiosity-based Exploration, Sparse Reward, Soft Actor-Critic
TL;DR: KEA improves exploration in sparse reward environments by proactively coordinating exploration strategies when combining SAC with curiosity-based methods, maintaining the exploration-exploitation balance and substantially improving learning efficiency.
Abstract: In continuous control tasks, Soft Actor-Critic (SAC) has achieved notable success by balancing exploration and exploitation. However, SAC struggles in sparse reward environments, where infrequent rewards hinder efficient exploration. While novelty-based exploration methods help address this issue by encouraging the agent to visit novel states, they introduce challenges such as the difficulty of setting an optimal reward scale and of managing the interaction between novelty-based exploration and SAC's stochastic policy. These complexities often lead to inefficient exploration or premature convergence, making it difficult to balance exploration and exploitation. In this paper, we propose KEA (Keeping Exploration Alive) to tackle these inefficiencies when combining SAC with novelty-based methods. KEA introduces an additional co-behavior agent that works alongside SAC, together with a switching mechanism that proactively coordinates the exploration strategies of the co-behavior agent and the SAC agent with novelty-based exploration. This coordination allows the agent to maintain stochasticity in high-novelty regions, preventing premature convergence and enhancing exploration efficiency. We first analyze the difficulty of balancing exploration and exploitation when combining SAC with novelty-based methods in a 2D grid environment. We then evaluate KEA on sparse reward control tasks from the DeepMind Control Suite and compare it against two state-of-the-art novelty-based exploration baselines, Random Network Distillation (RND) and NovelD. KEA improves episodic rewards by up to 119% over RND and 28% over NovelD, substantially improving learning efficiency and robustness in sparse reward environments.
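As a rough illustration of the coordination idea described in the abstract, the sketch below shows a toy novelty-based switching rule between two behavior policies: a SAC agent trained with a novelty bonus and a co-behavior agent. The class name `NoveltySwitch`, the threshold rule, and the running-mean normalization are illustrative assumptions, not the authors' exact mechanism; the paper's switching criterion is not specified here.

```python
import numpy as np

class NoveltySwitch:
    """Toy switching rule: choose which behavior policy acts, based on novelty.

    Purely illustrative -- the threshold rule and normalization are assumptions,
    not the mechanism proposed in the paper.
    """

    def __init__(self, threshold: float = 1.5, momentum: float = 0.99):
        self.threshold = threshold   # assumed switch point on normalized novelty
        self.momentum = momentum     # running-mean momentum used for normalization
        self.running_mean = 0.0

    def select(self, novelty: float) -> str:
        # Track a running mean of the novelty bonus so the threshold is roughly scale-free.
        self.running_mean = self.momentum * self.running_mean + (1.0 - self.momentum) * novelty
        normalized = novelty / (self.running_mean + 1e-8)
        # Assumed rule: in high-novelty regions, let the co-behavior agent act so the
        # policy stays stochastic; otherwise act with the SAC agent trained on the
        # novelty-augmented reward.
        return "co_behavior" if normalized > self.threshold else "sac_with_novelty"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    switch = NoveltySwitch(threshold=1.5)
    for step in range(5):
        novelty = float(rng.exponential(scale=1.0))  # stand-in for an RND/NovelD bonus
        print(step, round(novelty, 3), switch.select(novelty))
```

In this toy version the novelty signal would come from the curiosity module (e.g., an RND prediction error), and the returned label would pick which agent supplies the next action.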
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11687