An analysis of distributional reinforcement learning with Gaussian mixtures

TMLR Paper 5040 Authors

05 Jun 2025 (modified: 08 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: Distributional Reinforcement Learning (DRL) aims to optimize a risk measure of the return by representing its full distribution. Finding such a representation is challenging, however, as it requires a tractable estimate of the risk measure, a tractable loss, and a representation with sufficient approximation power. Although Gaussian mixtures (GM) are powerful statistical models that can address these challenges, only a few papers have investigated this approach, and most of them use the L$_2$ norm as a tractable metric between GMs. In this paper, we provide new theoretical results on previously unstudied metrics. We show that the L$_2$ metric is not suitable and propose two alternatives: a mixture-specific optimal transport distance (MW) and a maximum mean discrepancy (MMD) distance. Focusing on temporal difference (TD) learning, we prove a convergence result for a related dynamic programming algorithm under the MW metric. Leveraging natural multivariate GM representations, we also highlight the potential of MW for multi-objective RL. Our approach is illustrated on several environments of the Arcade Learning Environment benchmark and shows promising empirical results.
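
To make the mixture-specific optimal transport idea concrete, here is a minimal sketch (not the authors' code, and the exact MW formulation used in the paper may differ) of a mixture-Wasserstein-style distance between two one-dimensional Gaussian mixtures: components are coupled by a discrete optimal transport plan whose ground cost is the closed-form squared 2-Wasserstein distance between Gaussians. The function names and the use of a generic LP solver are illustrative assumptions.

```python
# Sketch of a mixture-Wasserstein (MW)-style distance between 1-D Gaussian mixtures.
import numpy as np
from scipy.optimize import linprog

def w2_sq_gauss_1d(m0, s0, m1, s1):
    # Closed-form squared 2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2).
    return (m0 - m1) ** 2 + (s0 - s1) ** 2

def mw2_sq(pi0, mu0, sig0, pi1, mu1, sig1):
    """Squared MW-style distance between two 1-D Gaussian mixtures.

    pi*, mu*, sig*: component weights, means, and standard deviations (1-D arrays).
    Solves a discrete optimal transport problem over component couplings with an LP.
    """
    K, L = len(pi0), len(pi1)
    # Ground cost: pairwise W2^2 between mixture components.
    C = np.array([[w2_sq_gauss_1d(mu0[k], sig0[k], mu1[l], sig1[l])
                   for l in range(L)] for k in range(K)])
    # Marginal constraints: coupling rows sum to pi0, columns sum to pi1.
    A_eq = np.zeros((K + L, K * L))
    for k in range(K):
        A_eq[k, k * L:(k + 1) * L] = 1.0
    for l in range(L):
        A_eq[K + l, l::L] = 1.0
    b_eq = np.concatenate([pi0, pi1])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Example: distance between two small mixtures standing in for return distributions.
d = mw2_sq(np.array([0.5, 0.5]), np.array([0.0, 2.0]), np.array([1.0, 0.5]),
           np.array([0.3, 0.7]), np.array([0.5, 1.5]), np.array([0.8, 0.6]))
print(np.sqrt(d))
```

Because the coupling acts on mixture components rather than on the underlying continuous densities, the resulting loss stays tractable for GM-parameterized return distributions, which is the practical appeal highlighted in the abstract.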
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Dileep_Kalathil1
Submission Number: 5040