The Benefits of Being Categorical Distributional: Uncertainty-aware Regularized Exploration in Reinforcement Learning
Keywords: distributional learning, reinforcement learning, exploration
Abstract: Despite the remarkable empirical performance of distributional reinforcement learning (RL), its theoretical advantages over classical RL are not fully understood. Starting with Categorical Distributional RL (CDRL), we propose that the potential superiority of distributional RL can be attributed to a distribution-matching regularization, derived by applying a return density function decomposition technique. This regularization, which has received little attention in the distributional RL context, captures knowledge of the return distribution beyond its expectation alone, contributing to an augmented reward signal in policy optimization. In contrast to the standard entropy regularization in MaxEnt RL, which explicitly encourages exploration by promoting diverse actions, the regularization derived from CDRL implicitly updates the policy to align it with environmental uncertainty. Finally, extensive experiments substantiate the importance of this uncertainty-aware regularization in accounting for the empirical benefits of distributional RL over classical RL. Our study offers a new exploration-based perspective on the benefits of adopting distributional learning in RL.
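To make the CDRL setting referenced in the abstract concrete, below is a minimal, illustrative sketch (not the authors' implementation) of the standard categorical distributional update in the style of C51: the Bellman-shifted return distribution is projected onto a fixed categorical support and the learned distribution is fit by cross-entropy. This is the loss whose decomposition, according to the abstract, yields an expectation term plus a distribution-matching regularization; the decomposition itself is the paper's contribution and is not reproduced here. All function and variable names are hypothetical.

```python
# Illustrative C51-style categorical projection and cross-entropy loss.
# CDRL fits the full return distribution rather than only its expectation.
import numpy as np

def categorical_projection(rewards, next_probs, gamma, atoms):
    """Project the Bellman target r + gamma * Z onto the fixed support `atoms`."""
    v_min, v_max = atoms[0], atoms[-1]
    delta = atoms[1] - atoms[0]
    # Shift and clip the support under the Bellman operator.
    tz = np.clip(rewards[:, None] + gamma * atoms[None, :], v_min, v_max)
    b = (tz - v_min) / delta                       # fractional atom index
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
    # Guard against b landing exactly on an atom (lower == upper),
    # which would otherwise drop its probability mass.
    lower[(upper > 0) & (lower == upper)] -= 1
    upper[(lower < len(atoms) - 1) & (lower == upper)] += 1
    target = np.zeros_like(next_probs)
    batch = np.arange(rewards.shape[0])[:, None]
    # Split each atom's mass between its two neighbouring support points.
    np.add.at(target, (batch, lower), next_probs * (upper - b))
    np.add.at(target, (batch, upper), next_probs * (b - lower))
    return target

def cross_entropy_loss(pred_probs, target_probs, eps=1e-8):
    """Cross-entropy between predicted and projected target return distributions."""
    return -np.sum(target_probs * np.log(pred_probs + eps), axis=-1).mean()

# Tiny usage example with hypothetical numbers.
atoms = np.linspace(-10.0, 10.0, 51)                  # fixed categorical support
next_probs = np.full((2, 51), 1.0 / 51)               # uniform next-state distribution
target = categorical_projection(np.array([1.0, -0.5]), next_probs, 0.99, atoms)
pred = np.full((2, 51), 1.0 / 51)
print(cross_entropy_loss(pred, target))
```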
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6025