Distributional Reinforcement Learning via Sinkhorn Iterations

16 May 2022 (modified: 12 Mar 2024) · NeurIPS 2022 Submission
Keywords: distributional reinforcement learning, sinkhorn divergence
TL;DR: We design a new class of distributional RL algorithms based on the Sinkhorn divergence.
Abstract: Distributional reinforcement learning (RL) is a class of state-of-the-art algorithms that estimate the whole distribution of the total return rather than only its expectation. How each return distribution is represented and which distribution divergence is chosen are pivotal to the empirical success of distributional RL. In this paper, we propose a new class of Sinkhorn distributional RL (SinkhornDRL) algorithms that learn a finite set of statistics, i.e., deterministic samples, from each return distribution and then leverage Sinkhorn iterations to evaluate the Sinkhorn divergence between the current and target Bellman distributions. Remarkably, the Sinkhorn divergence interpolates between the Wasserstein distance and Maximum Mean Discrepancy (MMD), which allows the proposed SinkhornDRL algorithm to find a sweet spot: it benefits from both the geometry of an optimal-transport-based distance and the unbiased gradient estimates of MMD. Finally, experiments on the suite of 55 Atari games show that SinkhornDRL performs competitively against existing state-of-the-art algorithms.
Supplementary Material: pdf
Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/arxiv:2202.00769/code)
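
As a concrete illustration of the mechanism the abstract describes, below is a minimal NumPy sketch of computing a debiased Sinkhorn divergence between two sets of deterministic return samples via Sinkhorn iterations. This is not the paper's implementation: the function names, the squared-distance cost, the regularization strength `eps`, the iteration count, and the toy samples are all illustrative assumptions.

```python
# Illustrative sketch only, not the paper's method or hyperparameters.
# Computes the entropy-regularized OT cost between two uniform empirical
# measures via Sinkhorn iterations, then debiases it to a Sinkhorn divergence.
import numpy as np

def entropic_ot(x, y, eps=0.1, n_iters=100):
    """Entropy-regularized OT cost between uniform empirical measures
    supported on 1-D sample sets x and y (assumed settings)."""
    C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost matrix
    K = np.exp(-C / eps)                     # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x))        # uniform source weights
    b = np.full(len(y), 1.0 / len(y))        # uniform target weights
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):                 # alternate marginal scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]          # resulting transport plan
    return float(np.sum(P * C))

def sinkhorn_divergence(x, y, eps=0.1, n_iters=100):
    """Debiased Sinkhorn divergence: Wasserstein-like as eps -> 0,
    MMD-like as eps -> infinity (the interpolation the abstract cites)."""
    return (entropic_ot(x, y, eps, n_iters)
            - 0.5 * entropic_ot(x, x, eps, n_iters)
            - 0.5 * entropic_ot(y, y, eps, n_iters))

# Toy usage: samples standing in for the current return distribution Z(s, a)
# and a hypothetical Bellman target r + gamma * Z(s', a').
rng = np.random.default_rng(0)
current = rng.normal(size=32)
target = 1.0 + 0.99 * rng.normal(size=32)
print(sinkhorn_divergence(current, target))
```

In practice one would likely run the iterations in log-space for numerical stability and implement them in an autodiff framework so the divergence can be backpropagated through when training the critic; the plain NumPy form above is kept only to expose the alternating-scaling structure of the Sinkhorn loop.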