KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: KEA improves exploration efficiency in sparse reward environments by proactively coordinating exploration strategies between SAC’s stochastic policy and novelty-based exploration methods.
Abstract: Soft Actor-Critic (SAC) has achieved notable success in continuous control tasks but struggles in sparse reward settings, where infrequent rewards make efficient exploration challenging. While novelty-based exploration methods address this issue by encouraging the agent to explore novel states, they are not trivial to apply to SAC. In particular, managing the interaction between novelty-based exploration and SAC’s stochastic policy can lead to inefficient exploration and redundant sample collection. In this paper, we propose KEA (Keeping Exploration Alive), which tackles the inefficiencies in balancing exploration strategies when combining SAC with novelty-based exploration. KEA integrates a novelty-augmented SAC with a standard SAC agent, proactively coordinated via a switching mechanism. This coordination allows the agent to maintain stochasticity in high-novelty regions, enhancing exploration efficiency and reducing repeated sample collection. We first analyze this issue in a 2D navigation task, and then evaluate KEA on the DeepSea hard-exploration benchmark as well as sparse reward control tasks from the DeepMind Control Suite. Compared to state-of-the-art novelty-based exploration baselines, our experiments show that KEA significantly improves learning efficiency and robustness in sparse reward setups.
Lay Summary: Many AI agents learn through trial and error, using rewards to guide their behavior. But when rewards are rare, they often wander aimlessly and learn slowly. A popular fix is to reward the agent for experiencing new situations, but simply adding a "novelty bonus" can lead to repetitive behavior and wasted effort. Enter **KEA**, a simple yet powerful conductor of exploration. For **familiar experiences**, it reinforces effective behaviors to encourage the agent to **revisit and refine promising actions**. In **unfamiliar zones**, it deploys a stochastic explorer to **make bold, random moves**. KEA’s switching mechanism coordinates between these two agents: the novelty-based learner polishes “fresh” yet already-seen experiences, then shifts control to the standard stochastic learner to execute bold new actions whenever true novelty arises. Tested on challenging exploration tasks, KEA learns faster and performs better than state-of-the-art novelty methods. By keeping exploration alive, KEA could accelerate progress in robotics, autonomous vehicles, and **any system that must make sense of complex environments with few rewards**.
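To make the switching idea concrete, below is a minimal, hypothetical sketch of how such a coordinator could be wired up in Python. The class and method names (`KEASwitcher`, `sample_action`, `novelty_fn`, `threshold`) are illustrative assumptions, not the authors' API; see the linked repository for the actual implementation.

```python
class KEASwitcher:
    """Illustrative sketch of a KEA-style switching mechanism (hypothetical names).

    Coordinates two agents: a standard SAC agent (stochastic policy) and a SAC
    agent trained with a novelty bonus. In high-novelty regions the standard
    stochastic policy acts, keeping exploration alive; in familiar regions the
    novelty-augmented agent acts.
    """

    def __init__(self, sac_agent, novelty_sac_agent, novelty_fn, threshold=0.5):
        self.sac_agent = sac_agent                  # standard SAC (stochastic policy)
        self.novelty_sac_agent = novelty_sac_agent  # SAC trained with a novelty bonus
        self.novelty_fn = novelty_fn                # state -> novelty score (e.g., RND-style)
        self.threshold = threshold                  # switching threshold (hyperparameter)

    def select_action(self, state):
        # Proactive coordination: pick which agent acts based on current novelty.
        if self.novelty_fn(state) > self.threshold:
            return self.sac_agent.sample_action(state)
        return self.novelty_sac_agent.sample_action(state)
```

In this sketch the novelty estimate and threshold are placeholders; the paper's actual coordination rule and hyperparameters are described in the full text and code.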
Link To Code: https://github.com/shihminyang/KEA
Primary Area: Reinforcement Learning
Keywords: Reinforcement Learning, Novelty-based Exploration, Soft Actor-Critic, Sparse reward
Submission Number: 11052