Distributional Reinforcement Learning via Sinkhorn Iterations

Published: 01 Feb 2023, Last Modified: 12 Mar 2024
Submitted to ICLR 2023
Keywords: distributional reinforcement learning, sinkhorn divergence
TL;DR: We design a new class of distributional RL algorithms based on the Sinkhorn divergence.
Abstract: Distributional reinforcement learning~(RL) is a class of state-of-the-art algorithms that estimate the entire distribution of the total return rather than only its expectation. The empirical success of distributional RL is determined by the representation of return distributions and the choice of distribution divergence. In this paper, we propose a new class of \textit{Sinkhorn distributional RL~(SinkhornDRL)} algorithms that learn a finite set of statistics, i.e., deterministic samples, from each return distribution and then use Sinkhorn iterations to evaluate the Sinkhorn distance between the current and target Bellman distributions. The Sinkhorn divergence interpolates between the Wasserstein distance and Maximum Mean Discrepancy~(MMD). SinkhornDRL thus finds a sweet spot, combining the geometry of optimal-transport-based distances with the unbiased gradient estimates of MMD. Finally, we demonstrate SinkhornDRL's competitive performance against state-of-the-art algorithms on a suite of 55 Atari games.
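The abstract's core computation, evaluating a Sinkhorn distance between the deterministic samples of the current and target return distributions via Sinkhorn iterations, can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's implementation: the function names, the squared-Euclidean cost, the regularization strength `eps`, and the fixed iteration count are all assumptions made here for illustration.

```python
import numpy as np

def entropic_ot(x, y, eps=1.0, n_iters=100):
    """Entropy-regularized OT cost between two 1-D sample sets
    via Sinkhorn-Knopp iterations (illustrative sketch)."""
    x, y = np.asarray(x), np.asarray(y)
    n, m = len(x), len(y)
    # Cost matrix on scalar returns; squared Euclidean is one common choice.
    C = (x[:, None] - y[None, :]) ** 2
    K = np.exp(-C / eps)                              # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)   # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                          # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                   # entropic transport plan
    return float((P * C).sum())                       # transport cost under the plan

def sinkhorn_divergence(x, y, eps=1.0, n_iters=100):
    """Debiased Sinkhorn divergence: removes the entropic bias so that
    the divergence of a distribution with itself is (near) zero."""
    return (entropic_ot(x, y, eps, n_iters)
            - 0.5 * entropic_ot(x, x, eps, n_iters)
            - 0.5 * entropic_ot(y, y, eps, n_iters))

# Hypothetical usage: samples standing in for current vs. target Bellman
# return distributions at one state-action pair.
x = np.random.randn(32)          # current return samples
y = np.random.randn(32) + 1.0    # target return samples
print(sinkhorn_divergence(x, y, eps=1.0))
```

The interpolation the abstract mentions shows up in `eps`: as `eps → 0` the entropic cost approaches the optimal-transport (Wasserstein-type) cost, while as `eps → ∞` the debiased divergence approaches an MMD-like quantity induced by the cost. In a practical training loop the samples would come from network outputs, and the iterations are typically run in log space to avoid numerical underflow at small `eps`.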
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)
Supplementary Material: zip
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/arxiv:2202.00769/code)