Sinkhorn Distributional Reinforcement Learning

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: distributional reinforcement learning, Sinkhorn divergence
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We design a new family of distributional RL algorithms based on Sinkhorn divergence.
Abstract: The empirical success of distributional reinforcement learning~(RL) depends heavily on how return distributions are represented and on the choice of distribution divergence. In this paper, we propose the \textit{Sinkhorn distributional RL~(SinkhornDRL)} algorithm, which learns unrestricted statistics, i.e., deterministic samples, from each return distribution and then leverages Sinkhorn divergence to minimize the difference between the current and target Bellman return distributions. Theoretically, we prove the convergence properties of SinkhornDRL in the tabular setting, consistent with the fact that Sinkhorn divergence interpolates between the Wasserstein distance and Maximum Mean Discrepancy~(MMD). We also establish a new equivalence between Sinkhorn divergence and a regularized MMD, going beyond the optimal transport literature, which helps explain the advantages of SinkhornDRL over existing distributional RL methods. Empirically, we show that SinkhornDRL performs consistently better than, or comparably to, existing algorithms on the suite of 55 Atari games.
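To make the abstract's description concrete, below is a minimal sketch of the core quantity involved: the debiased Sinkhorn divergence between two sets of return samples (e.g., samples from the current return distribution and from the Bellman target). It is not the authors' implementation; the squared-distance cost, the regularization strength `eps`, the iteration count `n_iters`, and the uniform sample weights are illustrative assumptions.

```python
import numpy as np

def sinkhorn_cost(x, y, eps=1.0, n_iters=50):
    """Entropy-regularized OT cost between two equal-weight sample sets.

    x, y : 1-D arrays of return samples (uniform weights assumed).
    eps  : entropic regularization strength (illustrative default).
    """
    n, m = len(x), len(y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)   # uniform marginals
    C = (x[:, None] - y[None, :]) ** 2                # squared-distance cost matrix (assumed)
    K = np.exp(-C / eps)                              # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                          # Sinkhorn fixed-point scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                   # regularized transport plan
    return np.sum(P * C)

def sinkhorn_divergence(x, y, eps=1.0, n_iters=50):
    """Debiased Sinkhorn divergence:
    S_eps(x, y) = OT_eps(x, y) - 0.5 * OT_eps(x, x) - 0.5 * OT_eps(y, y).
    """
    return (sinkhorn_cost(x, y, eps, n_iters)
            - 0.5 * sinkhorn_cost(x, x, eps, n_iters)
            - 0.5 * sinkhorn_cost(y, y, eps, n_iters))

# Hypothetical usage: compare samples from the current return distribution
# with samples from a (reward + discounted next-state) Bellman target.
current_samples = np.random.randn(32)
target_samples = 0.1 + 0.99 * np.random.randn(32)
loss = sinkhorn_divergence(current_samples, target_samples, eps=1.0)
```

In a distributional RL loop, a quantity of this form would be minimized with respect to the parameters producing `current_samples`; as `eps` grows the divergence behaves more like an MMD, and as it shrinks it approaches the Wasserstein cost, matching the interpolation behavior the abstract refers to.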
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6372