CAE: Repurposing the Critic as an Explorer in Deep Reinforcement Learning

TMLR Paper 6255 Authors

20 Oct 2025 (modified: 06 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: Exploration remains a fundamental challenge in reinforcement learning: many existing methods either lack theoretical guarantees or fall short in practical effectiveness. In this paper, we propose CAE (Critic as an Explorer), a lightweight approach that repurposes the value networks of standard deep RL algorithms to drive exploration, without introducing additional parameters. CAE combines multi-armed bandit techniques with a tailored scaling strategy, enabling efficient exploration with provable sub-linear regret bounds and strong empirical stability. It is also simple to implement, requiring only about 10 lines of code. For complex tasks where learning reliable value networks is difficult, we introduce CAE+, an extension of CAE that incorporates an auxiliary network. CAE+ increases the parameter count by less than 1% while preserving implementation simplicity, adding roughly 10 more lines of code. Extensive experiments on MuJoCo, MiniHack, and Habitat validate the effectiveness of CAE and CAE+, demonstrating that they unify theoretical rigor with practical efficiency.
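As a loose illustration of the idea the abstract sketches (an exploration signal derived from critics the agent already trains, scaled in a bandit-style, UCB-like fashion), here is a minimal hypothetical sketch. Every name, the twin-critic disagreement bonus, and the candidate-perturbation scheme are assumptions chosen for illustration; this is not the paper's CAE algorithm or code.

```python
# Hypothetical sketch: score candidate actions with the two critics a
# TD3/SAC-style agent already maintains, using their mean as the bandit
# value estimate and their disagreement as a crude uncertainty bonus.
import torch
import torch.nn as nn

def ucb_action(actor, critic1, critic2, obs, beta=0.5, n_candidates=16, noise=0.1):
    """Choose among perturbed actor proposals via an optimistic critic score."""
    with torch.no_grad():
        a = actor(obs)                                     # greedy proposal, (1, act_dim)
        cand = a + noise * torch.randn(n_candidates, a.shape[-1])
        obs_rep = obs.expand(n_candidates, -1)             # repeat obs per candidate
        q1 = critic1(torch.cat([obs_rep, cand], dim=-1)).squeeze(-1)
        q2 = critic2(torch.cat([obs_rep, cand], dim=-1)).squeeze(-1)
        # mean Q = value estimate; |Q1 - Q2| = uncertainty proxy; beta = scaling knob
        score = 0.5 * (q1 + q2) + beta * (q1 - q2).abs()
        return cand[score.argmax()]

# Toy usage with throwaway networks (illustrative shapes only).
obs_dim, act_dim = 8, 2
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, act_dim), nn.Tanh())
make_critic = lambda: nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1))
action = ucb_action(actor, make_critic(), make_critic(), torch.randn(1, obs_dim))
```

Note that the fixed `beta` above stands in, in spirit only, for the paper's "tailored scaling strategy"; the actual schedule carrying the sub-linear regret guarantee is specified in the paper itself.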
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Marcello_Restelli1
Submission Number: 6255