Keywords: Reinforcement learning, Continuous control, Global convergence
TL;DR: We derive a log-density-free Wasserstein policy gradient method for continuous control and establish its linear convergence.
Abstract: We revisit Wasserstein Proximal Policy Gradient (WPPG) for continuous control in infinite-horizon discounted reinforcement learning. By projecting the Wasserstein proximal gradient iterate onto a parametric policy family with respect to the Wasserstein distance, we derive a new WPPG update that eliminates the need for policy densities or score functions. This makes our method directly applicable to implicit stochastic policies. We prove a linear convergence rate for the WPPG iterate under entropy regularization and a log-Sobolev condition on the policy class, for both exact and approximate value function estimates. Empirically, our algorithm is simple to implement and achieves competitive performance on standard benchmarks.
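The abstract does not spell out the update rule, so the following is only a rough illustrative sketch of what a log-density-free, projection-based policy update for an implicit stochastic policy could look like: sampled actions are pushed along the critic's action gradient, and the parametric policy is then fit back onto the transported samples by squared-error regression (a particle-level stand-in for the Wasserstein projection). All names here (`Critic`, `ImplicitPolicy`, `wppg_style_update`, `ETA`) are hypothetical, entropy regularization and value-function estimation are omitted, and this should not be read as the paper's actual WPPG algorithm.

```python
# Hypothetical sketch, not the paper's method: one log-density-free,
# projection-style policy update for an implicit Gaussian-noise policy.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, NOISE_DIM, ETA = 8, 2, 4, 0.1  # ETA: assumed step size


class Critic(nn.Module):
    """Q(s, a) estimate; assumed given (e.g., trained separately by TD learning)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


class ImplicitPolicy(nn.Module):
    """Implicit stochastic policy: a = f_theta(s, z), z ~ N(0, I); no density or score needed."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + NOISE_DIM, 64),
                                 nn.ReLU(), nn.Linear(64, ACTION_DIM))

    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))


def wppg_style_update(policy, critic, states, policy_opt):
    """One illustrative update: transport sampled actions along grad_a Q,
    then project back onto the parametric family by regressing the policy
    output onto the transported particles (a W2-style projection surrogate)."""
    z = torch.randn(states.shape[0], NOISE_DIM)
    with torch.no_grad():
        a = policy(states, z)                      # current action particles
    a_var = a.clone().requires_grad_(True)
    q = critic(states, a_var).sum()
    (grad_a,) = torch.autograd.grad(q, a_var)      # ascent direction in action space
    a_target = (a + ETA * grad_a).detach()         # transported particles

    # Projection step: fit theta so f_theta(s, z) matches the transported particles.
    loss = ((policy(states, z) - a_target) ** 2).mean()
    policy_opt.zero_grad()
    loss.backward()
    policy_opt.step()
    return loss.item()


if __name__ == "__main__":
    policy, critic = ImplicitPolicy(), Critic()
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    states = torch.randn(32, STATE_DIM)
    print("projection loss:", wppg_style_update(policy, critic, states, opt))
```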
Primary Area: reinforcement learning
Submission Number: 15478