Distributional Soft Actor-Critic with Adaptive Entropy Regularization

Meysam Fozi; ZAHRA GHORRATI; Ahmad Esmaeili; Mohammad Mehdi Ebadzadeh


Published: 03 Jun 2026, Last Modified: 03 Jun 2026, Venue: ALA 2026, Readers: Everyone, License: CC BY 4.0

Keywords: Reinforcement Learning, Continuous Control, Soft Actor-Critic, Distributional Soft Actor-Critic, Entropy Regularization

TL;DR: DSAC with Adaptive Entropy Regularization

Abstract: Soft Actor-Critic (SAC) and its distributional extensions achieve strong performance by combining entropy regularization with off-policy learning. However, existing automatic temperature-tuning mechanisms rely on a fixed target entropy and ignore the uncertainty information captured by distributional critics. In this paper, we propose a variance-adaptive entropy regularization framework for Distributional SAC (DSAC) that dynamically adjusts the entropy temperature as a function of the variance of the predicted return distribution. Through linear and exponential adaptation schemes, we directly couple exploration strength to the uncertainty estimated by the distributional critic. On continuous control tasks from the MuJoCo suite, our method shows improved stability and generalization compared with standard SAC and DSAC-T. This variance-adaptive strategy mitigates overestimation and offers a more efficient approach to the exploration-exploitation dilemma in continuous reinforcement learning.
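The abstract does not state the adaptation formulas, so the sketch below is only a rough illustration of what a variance-adaptive temperature could look like. It assumes alpha = alpha_base * (1 + k * var) for the linear scheme and alpha = alpha_base * exp(k * var) for the exponential scheme, where var is the batch-averaged return variance from a critic with a Gaussian output head (mean and standard deviation). The names `alpha_base`, `k`, `alpha_min`, `alpha_max`, `batch_return_variance`, and the clipping range are hypothetical choices, not the authors' method. For contrast, standard SAC tunes alpha by gradient descent toward a fixed target entropy, commonly -dim(A).

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VarianceAdaptiveTemperature:
    """Illustrative variance-adaptive entropy temperature.

    ASSUMPTION: the functional forms, parameter names, and clipping below are
    hypothetical; they are not the paper's exact formulation.
    """
    alpha_base: float = 0.2   # temperature when the predicted return variance is 0
    k: float = 0.5            # hypothetical sensitivity; its sign sets the direction
    scheme: str = "linear"    # "linear" or "exponential"
    alpha_min: float = 1e-4   # lower clip keeps alpha strictly positive
    alpha_max: float = 1.0    # upper clip keeps the entropy bonus bounded

    def __call__(self, variance: float) -> float:
        if variance < 0.0:
            raise ValueError("variance must be non-negative")
        if self.scheme == "linear":
            # alpha grows (k > 0) or shrinks (k < 0) in proportion to the variance
            alpha = self.alpha_base * (1.0 + self.k * variance)
        elif self.scheme == "exponential":
            # cap the exponent to avoid OverflowError; the result is clipped anyway
            alpha = self.alpha_base * math.exp(min(self.k * variance, 50.0))
        else:
            raise ValueError(f"unknown scheme: {self.scheme!r}")
        return min(max(alpha, self.alpha_min), self.alpha_max)


def batch_return_variance(stds: Sequence[float]) -> float:
    """Average predicted return variance over a batch of Gaussian critic stds."""
    if not stds:
        raise ValueError("empty batch")
    return sum(s * s for s in stds) / len(stds)


if __name__ == "__main__":
    stds = [0.5, 1.0, 1.5]                 # hypothetical critic standard deviations
    var = batch_return_variance(stds)      # (0.25 + 1.0 + 2.25) / 3
    for scheme in ("linear", "exponential"):
        temp = VarianceAdaptiveTemperature(scheme=scheme)
        print(scheme, round(temp(var), 4))
```

Using a single batch-level variance keeps alpha a scalar, as in standard SAC; a per-state temperature would instead weight each sample's entropy term separately. The sign of `k` encodes whether higher predicted uncertainty increases or reduces exploration, and is left as an assumption here.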

Journal Edition Interest: Yes

Submission Number: 48
