Goal Achievement Guided Exploitation: A Principled Performance-Based Scheduling Framework for Reinforcement Learning

Published: 16 Mar 2026, Last Modified: 16 Mar 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: In dense-reward tasks, reinforcement learning (RL) algorithms often employ soft entropy regularization to promote exploration. By integrating an entropy term into the objective function, they regulate exploration by tuning its coefficient. However, the entropy coefficient only indirectly influences the action distribution through gradient updates, making it difficult to control exploration precisely, and it requires careful scheduling to balance exploration and exploitation throughout training. As a solution, we propose Goal Achievement Guided Exploitation (GAGE), a performance-based scheduling framework that adaptively regulates exploration by linking policy stochasticity directly to the agent's performance relative to a target value. Unlike soft entropy regularizers, GAGE enforces hard, performance-dependent constraints on the action distribution's standard deviation for continuous actions and on its logit range for discrete actions. Consequently, GAGE guarantees a lower bound on action probabilities that naturally decays as the agent approaches optimal performance. Across a suite of challenging robotic control tasks, GAGE improves learning efficiency and stability over various strong baselines, achieving competitive or superior final performance. By providing a more interpretable and robust alternative to entropy-based exploration heuristics, GAGE offers a scalable path toward solving complex dense-reward tasks with pronounced local optima.
Submission Type: Regular submission (no more than 12 pages of main content)
Supplementary Material: zip
Assigned Action Editor: ~Oleg_Arenz1
Submission Number: 6676