An analysis of distributional reinforcement learning with Gaussian mixtures

TMLR Paper 5040 Authors

05 Jun 2025 (modified: 08 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: Distributional Reinforcement Learning (DRL) aims to optimize a risk measure of the return by representing its full distribution. Finding such a representation is challenging, however, as it requires a tractable estimate of the risk measure, a tractable loss, and a representation with sufficient approximation power. Although Gaussian mixtures (GM) are powerful statistical models that can address these challenges, only a few papers have investigated this approach, and most of them use the L$_2$ norm as a tractable metric between GMs. In this paper, we provide new theoretical results on previously unstudied metrics. We show that the L$_2$ metric is not suitable and propose two alternatives: a mixture-specific optimal transport distance (MW) and a maximum mean discrepancy (MMD) distance. Focusing on temporal difference (TD) learning, we prove a convergence result for a related dynamic programming algorithm under the MW metric. Leveraging natural multivariate GM representations, we also highlight the potential of MW for multi-objective RL. Our approach is illustrated on several environments of the Arcade Learning Environment benchmark and shows promising empirical results.
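
To make the mixture-specific optimal transport idea concrete, here is a minimal sketch (not the authors' code, and the exact MW formulation used in the paper may differ) of a mixture-Wasserstein-style distance between two one-dimensional Gaussian mixtures: components are coupled by a discrete optimal transport plan whose ground cost is the closed-form squared 2-Wasserstein distance between Gaussians. The function names and the use of a generic LP solver are illustrative assumptions.

```python
# Sketch of a mixture-Wasserstein (MW)-style distance between 1-D Gaussian mixtures.
import numpy as np
from scipy.optimize import linprog

def w2_sq_gauss_1d(m0, s0, m1, s1):
    # Closed-form squared 2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2).
    return (m0 - m1) ** 2 + (s0 - s1) ** 2

def mw2_sq(pi0, mu0, sig0, pi1, mu1, sig1):
    """Squared MW-style distance between two 1-D Gaussian mixtures.

    pi*, mu*, sig*: component weights, means, and standard deviations (1-D arrays).
    Solves a discrete optimal transport problem over component couplings with an LP.
    """
    K, L = len(pi0), len(pi1)
    # Ground cost: pairwise W2^2 between mixture components.
    C = np.array([[w2_sq_gauss_1d(mu0[k], sig0[k], mu1[l], sig1[l])
                   for l in range(L)] for k in range(K)])
    # Marginal constraints: coupling rows sum to pi0, columns sum to pi1.
    A_eq = np.zeros((K + L, K * L))
    for k in range(K):
        A_eq[k, k * L:(k + 1) * L] = 1.0
    for l in range(L):
        A_eq[K + l, l::L] = 1.0
    b_eq = np.concatenate([pi0, pi1])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Example: distance between two small mixtures standing in for return distributions.
d = mw2_sq(np.array([0.5, 0.5]), np.array([0.0, 2.0]), np.array([1.0, 0.5]),
           np.array([0.3, 0.7]), np.array([0.5, 1.5]), np.array([0.8, 0.6]))
print(np.sqrt(d))
```

Because the coupling acts on mixture components rather than on the underlying continuous densities, the resulting loss stays tractable for GM-parameterized return distributions, which is the practical appeal highlighted in the abstract.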
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Dileep_Kalathil1
Submission Number: 5040