A Cramér Distance perspective on Non-crossing Quantile Regression in Distributional Reinforcement Learning

21 May 2021 (modified: 08 Sept 2024), NeurIPS 2021 Submitted, Readers: Everyone
Keywords: Distributional Reinforcement Learning, Cramér distance, Quantile regression, Wasserstein distance, Non-crossing quantiles, Neural networks, Atari benchmark
TL;DR: We show connections between the Cramér distance, the 1-Wasserstein distance, and the quantile regression loss in the setting of fixed quantile levels under non-crossing constraints, and we propose a novel neural architecture that guarantees these constraints.
Abstract: Distributional reinforcement learning (DRL) extends the value-based approach by estimating the full distribution of future returns rather than only its mean, providing a richer signal that leads to improved performance. Quantile-based methods such as QR-DQN project arbitrary distributions onto a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, because the gradients of this distance are biased, the quantile regression loss is used for training instead: it has the same minimizer and yields unbiased gradients. Recently, monotonicity constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. This work considers the setting of fixed quantile levels and makes two contributions. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein projection and that, under monotonicity constraints, the squared Cramér and quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a novel non-crossing neural architecture that trains well with the Cramér distance, yielding significant improvements over QR-DQN in a number of games of the standard Atari 2600 benchmark.
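The quantities named in the abstract can be made concrete with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the midpoint levels tau_i = (2i + 1) / (2N) follow the usual QR-DQN convention, and the softplus cumulative sum is one generic way to enforce non-crossing quantiles that may differ from the architecture proposed in the paper.

```python
import math


def midpoint_levels(n):
    """Fixed quantile levels tau_i = (2i + 1) / (2n) for i = 0..n-1 (assumed convention)."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def pinball_loss(theta, samples):
    """Quantile regression loss of estimates `theta` against target samples."""
    total = 0.0
    for tau, q in zip(midpoint_levels(len(theta)), theta):
        for z in samples:
            u = z - q
            total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(samples)


def staircase_cdf(atoms, x):
    """CDF at x of the uniform mixture of Diracs located at `atoms`."""
    return sum(1 for a in atoms if a <= x) / len(atoms)


def _cdf_gap_integral(a, b, power):
    """Integrate |F_a - F_b|**power; both CDFs are constant between breakpoints."""
    pts = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(pts, pts[1:]):
        gap = abs(staircase_cdf(a, left) - staircase_cdf(b, left))
        total += (gap ** power) * (right - left)
    return total


def wasserstein1(a, b):
    """1-Wasserstein distance: area between the two staircase CDFs."""
    return _cdf_gap_integral(a, b, 1)


def squared_cramer(a, b):
    """Squared Cramér distance: integral of the squared CDF difference."""
    return _cdf_gap_integral(a, b, 2)


def softplus(x):
    """Numerically stable log(1 + exp(x)); always strictly positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def non_crossing_quantiles(raw):
    """Map unconstrained outputs to increasing quantiles (hypothetical head).

    The first output is the lowest quantile; every later output becomes a
    positive increment, so the resulting quantiles cannot cross.
    """
    quantiles = [raw[0]]
    for r in raw[1:]:
        quantiles.append(quantiles[-1] + softplus(r))
    return quantiles
```

For example, `non_crossing_quantiles([0.3, -2.0, 1.5, 0.0])` returns an increasing list whatever the signs of the raw outputs, and `wasserstein1` and `squared_cramer` can then be compared on such quantile sets against a target set of atoms.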
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: pdf
Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/a-cramer-distance-perspective-on-non-crossing/code)