Theoretical Foundations of Wasserstein Policy Optimization

15 Sept 2025 (modified: 06 Dec 2025) · Agents4Science 2025 Conference Desk Rejected Submission · Readers: Everyone · CC BY 4.0
Keywords: Wasserstein Policy Optimization (WPO), Optimal Transport (W2) Gradient Flows, Natural Gradient and Fisher Geometry, c-Wasserstein Stability (Convex Conjugates), Gaussian Policy Covariance Updates
Abstract: We revisit Wasserstein Policy Optimization (WPO) as policy transport on action densities followed by a projection onto a parametric manifold. Evolving policies by a 2-Wasserstein gradient flow and projecting in the Fisher/KL inner product yields a covariant natural step with a mixed-derivative cross-term. We make this projection-based view explicit, prove baseline invariance (via a constrained Gâteaux variation) and parameterization covariance (via Fisher pullbacks), and delineate when the step coincides with natural policy gradient (affine-in-action exponential families) versus when it departs (mixtures, squashings). For Gaussian policies we give mean and covariance updates, including a full-covariance Cholesky implementation that preserves symmetric positive-definiteness (SPD). We extend to c-Wasserstein dynamics to obtain principled stability via convex conjugates and state precise energy inequalities in the frozen-critic regime. Assumptions and weak-form conditions are spelled out, and connections to classic PG/DPG/NPG are established.
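The abstract mentions a full-covariance Cholesky implementation that preserves SPD. The sketch below is an illustration of that general idea only, not the paper's update rule: it assumes a covariance written as Sigma = L L^T with a lower-triangular factor whose diagonal is kept positive by an exponential map. The names `cholesky_from_params` and `covariance`, the exponential diagonal, and the random `step` (a stand-in for any update direction) are all assumptions introduced here.

```python
import numpy as np

def cholesky_from_params(raw):
    """Map an unconstrained square matrix to a lower-triangular factor
    with strictly positive diagonal (hypothetical helper, not from the paper)."""
    strict_lower = np.tril(raw, k=-1)          # keep entries below the diagonal
    positive_diag = np.exp(np.diag(raw))       # exp keeps the diagonal > 0
    return strict_lower + np.diag(positive_diag)

def covariance(raw):
    """Build Sigma = L L^T, which is SPD whenever L has a positive diagonal."""
    L = cholesky_from_params(raw)
    return L @ L.T

rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 3))                  # unconstrained parameters
step = rng.normal(size=(3, 3))                 # placeholder update direction (assumption)
raw_new = raw + 0.5 * step                     # any step in the unconstrained space

sigma = covariance(raw_new)
min_eig = np.linalg.eigvalsh(sigma).min()      # smallest eigenvalue of the symmetric matrix
is_symmetric = np.allclose(sigma, sigma.T)
```

Because the update is applied to the unconstrained parameters rather than to Sigma directly, the resulting covariance stays symmetric with positive eigenvalues for any finite step, which is the property the abstract refers to as preserving SPD.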
Supplementary Material: zip
Submission Number: 240