Abstract: Distributional Reinforcement Learning (DRL) aims to optimize a risk measure of the return by representing its distribution. However, finding a representation of this distribution is challenging, as it requires a tractable estimation of the risk measure, a tractable loss, and a representation with enough approximation power. Although Gaussian mixtures (GM) are powerful statistical models well suited to these challenges, only a few papers have investigated this approach, and most use the L$_2$ norm as a tractable metric between GMs. In this paper, we provide new theoretical results on previously unstudied metrics. We show that the L$_2$ metric is not suitable and propose alternative metrics: a mixture-specific optimal transport (MW) distance and a maximum mean discrepancy distance. Focusing on temporal difference (TD) learning, we prove a convergence result for a related dynamic programming algorithm for the MW metric. Leveraging natural multivariate GM representations, we also highlight the potential of MW in multi-objective RL. Our approach is illustrated on several environments of the Atari Learning Environment benchmark and shows promising empirical results.
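As a rough illustration of the mixture-specific optimal transport distance the abstract refers to, the sketch below computes a squared MW distance between two one-dimensional Gaussian mixtures as a discrete optimal-transport problem over mixture weights, with pairwise squared 2-Wasserstein costs between Gaussian components. All function names are illustrative, not the paper's implementation.

```python
# Hedged sketch: squared MW distance between two 1D Gaussian mixtures,
# assuming the standard construction (optimal transport between mixture
# weights, with W2^2 costs between Gaussian components).
import numpy as np
from scipy.optimize import linprog

def w2_gauss_1d(m0, s0, m1, s1):
    """Squared 2-Wasserstein distance between two 1D Gaussians."""
    return (m0 - m1) ** 2 + (s0 - s1) ** 2

def mw2(pi0, means0, stds0, pi1, means1, stds1):
    """Squared MW distance: discrete OT between components, W2^2 costs."""
    K, L = len(pi0), len(pi1)
    cost = np.array([[w2_gauss_1d(means0[k], stds0[k], means1[l], stds1[l])
                      for l in range(L)] for k in range(K)])
    # Transport plan w >= 0 with marginals pi0 and pi1 (linear program).
    A_eq = np.zeros((K + L, K * L))
    for k in range(K):
        A_eq[k, k * L:(k + 1) * L] = 1.0  # row sums equal pi0
    for l in range(L):
        A_eq[K + l, l::L] = 1.0           # column sums equal pi1
    b_eq = np.concatenate([pi0, pi1])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# Identical mixtures are at MW distance zero.
d = mw2(np.array([0.5, 0.5]), [0.0, 2.0], [1.0, 1.0],
        np.array([0.5, 0.5]), [0.0, 2.0], [1.0, 1.0])
```

With single-component mixtures, this reduces to the closed-form squared W2 distance between the two Gaussians, which gives a quick sanity check.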
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have prepared a revised manuscript with all requested modifications: improved presentation (e.g., a clearer abstract, added details in several places, and graphical illustrations in Figures 1 and 2), additional experiments, and a link to the code. We have expanded our experimental evaluation using the representative subset of Atari games introduced by Aitchison et al. (2023). Along with the performance of the distributional RL algorithms (Table 2, Figure 3), we report the computational overhead (Tables 5 and 6) and conduct an ablation study on the number of mixture components K (Table 4 and Figure 5).
Code: https://gitlab.inria.fr/mantonet/gm-drl
Supplementary Material: zip
Assigned Action Editor: ~Dileep_Kalathil1
Submission Number: 5040