Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Exploration in Reinforcement Learning towards Higher Moment Regularisations

Published: 12 Jun 2025 · Last Modified: 30 Jun 2025 · EXAIT@ICML 2025 Poster · CC BY 4.0
Track: Theory
Keywords: distributional reinforcement learning, uncertainty, exploration, regularization
TL;DR: We analyze the benefits of using categorical distributional loss in distributional RL via distribution decomposition, providing an explanation from the perspective of exploration.
Abstract: The remarkable empirical performance of distributional reinforcement learning (RL) has drawn increasing attention to understanding its theoretical advantages over classical RL. By decomposing the categorical distributional loss commonly employed in distributional RL, we find that the potential superiority of distributional RL can be attributed to a derived distribution-matching entropy regularization that captures higher-moment knowledge. This less-studied entropy regularization captures knowledge of the return distribution beyond its expectation alone, contributing an augmented reward signal in policy optimization. In contrast to the vanilla entropy regularization in MaxEnt RL, which explicitly encourages exploration by promoting diverse actions, the entropy regularization derived from the categorical distributional loss implicitly updates the policy to align it with the (estimated) environmental uncertainty. Finally, extensive experiments verify the significance of this uncertainty-aware regularization for the empirical benefits of distributional RL over classical RL. Our study offers a novel exploration perspective for explaining the intrinsic benefits of distributional learning in RL.
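For readers unfamiliar with the loss the abstract refers to, the sketch below illustrates a standard C51-style categorical distributional loss: a target return distribution is projected onto a fixed support of atoms and compared to the predicted categorical distribution via cross-entropy. The support range, atom count, and helper names (`project_onto_support`, `categorical_loss`) are illustrative assumptions, not details taken from this paper, and the sketch does not reproduce the paper's decomposition or derived regularization.

```python
# Minimal sketch of a C51-style categorical distributional loss (NumPy).
# Hyperparameters (51 atoms on [-10, 10]) are common defaults, assumed for illustration.

import numpy as np

def project_onto_support(target_atoms, target_probs, support):
    """Project a categorical target distribution onto a fixed support (Bellman projection)."""
    v_min, v_max = support[0], support[-1]
    delta_z = support[1] - support[0]
    projected = np.zeros_like(support)
    clipped = np.clip(target_atoms, v_min, v_max)
    b = (clipped - v_min) / delta_z                      # fractional index of each target atom
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
    for p, bi, lo, hi in zip(target_probs, b, lower, upper):
        if lo == hi:                                     # atom lands exactly on a support point
            projected[lo] += p
        else:                                            # split the mass between the two neighbours
            projected[lo] += p * (hi - bi)
            projected[hi] += p * (bi - lo)
    return projected

def categorical_loss(pred_logits, projected_target):
    """Cross-entropy between the projected target distribution and the predicted categorical distribution."""
    z = pred_logits - pred_logits.max()                  # stabilised log-softmax
    log_pred = z - np.log(np.sum(np.exp(z)))
    return -np.sum(projected_target * log_pred)

# Toy example: two equally weighted target return samples.
support = np.linspace(-10.0, 10.0, 51)
target = project_onto_support(np.array([1.3, 4.7]), np.array([0.5, 0.5]), support)
print(categorical_loss(np.zeros(51), target))
```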
Serve As Reviewer: ~Ke_Sun6, ~Yingnan_Zhao1
Submission Number: 28