Keywords: Reinforcement learning, Continuous control, Global convergence
TL;DR: We derive a log-density-free Wasserstein policy gradient method for continuous control and establish its linear convergence.
Abstract: We revisit Wasserstein Proximal Policy Gradient (WPPG) for continuous control in infinite-horizon discounted reinforcement learning. By projecting the Wasserstein proximal gradient iterate onto a parametric policy family with respect to the Wasserstein distance, we derive a new WPPG update that eliminates the need for policy densities or score functions. This makes our method directly applicable to implicit stochastic policies. We prove a linear convergence rate for the WPPG iterate under entropy regularization and a log-Sobolev condition on the policy class, for both exact and approximate value function estimates. Empirically, our algorithm is simple to implement and achieves competitive performance on standard benchmarks.
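The abstract does not spell out the update rule, so the following is only a rough illustrative sketch of what a log-density-free, projection-based policy update for an implicit stochastic policy could look like: sampled actions are pushed along the critic's action gradient, and the parametric policy is then fit back onto the transported samples by squared-error regression (a particle-level stand-in for the Wasserstein projection). All names here (`Critic`, `ImplicitPolicy`, `wppg_style_update`, `ETA`) are hypothetical, entropy regularization and value-function estimation are omitted, and this should not be read as the paper's actual WPPG algorithm.

```python
# Hypothetical sketch, not the paper's method: one log-density-free,
# projection-style policy update for an implicit Gaussian-noise policy.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, NOISE_DIM, ETA = 8, 2, 4, 0.1  # ETA: assumed step size


class Critic(nn.Module):
    """Q(s, a) estimate; assumed given (e.g., trained separately by TD learning)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


class ImplicitPolicy(nn.Module):
    """Implicit stochastic policy: a = f_theta(s, z), z ~ N(0, I); no density or score needed."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + NOISE_DIM, 64),
                                 nn.ReLU(), nn.Linear(64, ACTION_DIM))

    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))


def wppg_style_update(policy, critic, states, policy_opt):
    """One illustrative update: transport sampled actions along grad_a Q,
    then project back onto the parametric family by regressing the policy
    output onto the transported particles (a W2-style projection surrogate)."""
    z = torch.randn(states.shape[0], NOISE_DIM)
    with torch.no_grad():
        a = policy(states, z)                      # current action particles
    a_var = a.clone().requires_grad_(True)
    q = critic(states, a_var).sum()
    (grad_a,) = torch.autograd.grad(q, a_var)      # ascent direction in action space
    a_target = (a + ETA * grad_a).detach()         # transported particles

    # Projection step: fit theta so f_theta(s, z) matches the transported particles.
    loss = ((policy(states, z) - a_target) ** 2).mean()
    policy_opt.zero_grad()
    loss.backward()
    policy_opt.step()
    return loss.item()


if __name__ == "__main__":
    policy, critic = ImplicitPolicy(), Critic()
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    states = torch.randn(32, STATE_DIM)
    print("projection loss:", wppg_style_update(policy, critic, states, opt))
```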
Primary Area: reinforcement learning
Submission Number: 15478