Keywords: Reinforcement Learning, Continuous Control, Soft Actor-Critic, Distributional Soft Actor-Critic, Entropy Regularization
TL;DR: Distributional SAC (DSAC) with variance-adaptive entropy regularization
Abstract: Soft Actor-Critic (SAC) and its distributional extensions achieve strong performance by combining entropy regularization with off-policy learning. However, existing automatic temperature-tuning mechanisms rely on a fixed target entropy and ignore the uncertainty information captured by distributional critics. In this paper, we propose a variance-adaptive entropy regularization framework for Distributional SAC (DSAC). Our approach adjusts the entropy temperature dynamically as a function of the variance of the predicted return distribution. Through linear and exponential adaptation schemes, we directly couple exploration strength to the uncertainty estimated by the distributional critic. Evaluated on continuous control tasks from the MuJoCo suite, our method demonstrates improved stability and generalization compared to standard SAC and DSAC-T. This variance-adaptive strategy mitigates overestimation and offers a more efficient approach to the exploration-exploitation trade-off in continuous-control reinforcement learning.
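The abstract does not state the functional form of the linear and exponential adaptation schemes, so the following is only an illustrative sketch of how a temperature could be coupled to the critic's predicted return variance. The function name `adaptive_temperature`, the sensitivity coefficient `k`, the multiplicative forms, and the clipping bounds are all assumptions made for illustration and are not taken from the paper.

```python
import math


def adaptive_temperature(base_alpha, return_variance, scheme="linear",
                         k=0.1, alpha_min=1e-4, alpha_max=10.0):
    """Scale a base entropy temperature by the critic's return variance.

    Hypothetical sketch: the forms below are assumed, not the paper's.
      linear:      alpha = base_alpha * (1 + k * variance)
      exponential: alpha = base_alpha * exp(k * variance)
    The result is clipped to [alpha_min, alpha_max].
    """
    if return_variance < 0:
        raise ValueError("return_variance must be non-negative")

    if scheme == "linear":
        alpha = base_alpha * (1.0 + k * return_variance)
    elif scheme == "exponential":
        # Cap the exponent so very large variances cannot overflow math.exp.
        alpha = base_alpha * math.exp(min(k * return_variance, 50.0))
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")

    # Clipping keeps the entropy bonus in a bounded range (assumed bounds).
    return min(max(alpha, alpha_min), alpha_max)


if __name__ == "__main__":
    # Compare the two assumed schemes across a few variance levels.
    for variance in (0.0, 1.0, 5.0, 20.0):
        lin = adaptive_temperature(0.2, variance, scheme="linear")
        exp = adaptive_temperature(0.2, variance, scheme="exponential")
        print(f"variance={variance:5.1f}  linear={lin:.4f}  exponential={exp:.4f}")
```

Under these assumed forms, both schemes return the base temperature when the variance is zero and increase it monotonically as the critic becomes more uncertain, with the exponential variant reacting more sharply to high variance.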
Journal Edition Interest: Yes
Submission Number: 48