Goal Achievement Guided Exploitation: Rethinking Maximum Entropy Reinforcement Learning

TMLR Paper 6676 Authors

27 Nov 2025 (modified: 02 Dec 2025) · Under review for TMLR · CC BY 4.0
Abstract: Reinforcement learning (RL) algorithms often rely on entropy maximization to prevent premature convergence, yet this practice introduces fundamental drawbacks: it alters the optimization objective and cannot guarantee sufficient exploration in tasks with local optima. We propose Goal Achievement Guided Exploitation (GAGE), a principled alternative that adaptively regulates exploration based on the agent's performance relative to the optimal goal. Instead of maximizing entropy, GAGE enforces hard lower bounds on policy flatness, represented by the standard deviation for continuous actions and the logit range for discrete ones, providing interpretable and controllable exploration without modifying the reward function. This mechanism guarantees lower bounds on action probabilities and naturally reduces stochasticity as learning progresses. Across a suite of challenging robotic control tasks with severe local optima, GAGE consistently improves stability, robustness, and final performance over entropy-based baselines for both on-policy and off-policy algorithms by a clear margin. Our results suggest that performance-guided exploration offers a scalable and interpretable direction beyond the maximum-entropy paradigm in reinforcement learning.
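
To make the described mechanism concrete, below is a minimal sketch (not taken from the paper) of how a performance-dependent hard lower bound on the policy standard deviation might be enforced for continuous actions. The function names, the linear schedule, and the `achievement_ratio` input are illustrative assumptions; the paper's exact formulation may differ.

```python
import torch


def gage_std_floor(achievement_ratio: float,
                   sigma_max: float = 1.0,
                   sigma_min: float = 0.05) -> float:
    """Hypothetical schedule: map a goal-achievement ratio in [0, 1]
    to a lower bound on the policy standard deviation.
    Low achievement -> large floor (forced exploration);
    high achievement -> small floor (allow exploitation)."""
    r = min(max(achievement_ratio, 0.0), 1.0)
    return sigma_max - r * (sigma_max - sigma_min)


def clamp_policy_log_std(log_std: torch.Tensor,
                         achievement_ratio: float) -> torch.Tensor:
    """Enforce the hard lower bound on the continuous-action std
    (a sketch under the assumptions above, not the authors' code)."""
    floor = gage_std_floor(achievement_ratio)
    return torch.clamp(log_std, min=float(torch.log(torch.tensor(floor))))


# Usage: an agent achieving 30% of the optimal goal keeps a wide policy,
# while one at 90% is allowed to sharpen toward near-deterministic actions.
raw_log_std = torch.tensor([-3.0, -1.0, 0.5])
print(clamp_policy_log_std(raw_log_std, achievement_ratio=0.3))
print(clamp_policy_log_std(raw_log_std, achievement_ratio=0.9))
```

In this reading, the clamp replaces the entropy bonus: exploration is not rewarded but guaranteed by construction, and the guarantee relaxes only as measured performance approaches the goal.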
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Oleg_Arenz1
Submission Number: 6676