When Distance Matters: Wasserstein Trust Regions for Multi-Agent Coordination

Published: 17 Dec 2025, Last Modified: 17 Dec 2025 · WoMAPF Oral · CC BY 4.0
Keywords: Multi-agent reinforcement learning, Trust region optimization, Wasserstein distance, Optimal transport
TL;DR: This paper proposes W-MATRPO, which replaces the KL-divergence trust-region constraint with a Wasserstein-1 constraint in multi-agent reinforcement learning and solves the resulting problem via a tractable dual formulation.
Abstract: This paper presents a new trust-region optimization approach for cooperative multi-agent reinforcement learning that incorporates optimal transport. We replace the traditional KL-divergence constraint with the Wasserstein-1 distance to define trust regions, using a dual formulation that transforms the constrained optimization problem into a tractable one over a single non-negative dual variable per agent. We also introduce a coordination-aware adaptive trust-region (CAATR) mechanism that adjusts each agent's trust-region radius in inverse proportion to teammate policy drift. The resulting Wasserstein multi-agent trust-region policy optimization (W-MATRPO) algorithm provides surrogate-objective bounds through sequential optimization. Theoretical analysis establishes performance bounds for the multi-agent setting, and experiments demonstrate improved exploration in an environment with local-optima traps.
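The dual formulation the abstract refers to can be sketched via the standard Lagrangian relaxation of a Wasserstein trust region; the notation below (L_i for agent i's surrogate objective, delta_i for its radius) is illustrative, not the paper's own:

    % Per-agent constrained step (illustrative notation):
    %   \max_{\theta_i} L_i(\theta_i) \quad \text{s.t.} \quad W_1(\pi_{\theta_i}, \pi_{\theta_i^{old}}) \le \delta_i
    % Lagrangian dual over a single scalar \lambda_i \ge 0 per agent:
    \min_{\lambda_i \ge 0} \; \max_{\theta_i} \; L_i(\theta_i) - \lambda_i \left( W_1(\pi_{\theta_i}, \pi_{\theta_i^{old}}) - \delta_i \right)
    % where W_1 itself admits the Kantorovich--Rubinstein dual:
    W_1(\mu, \nu) = \sup_{\|f\|_{Lip} \le 1} \; \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{x \sim \nu}[f(x)]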
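The CAATR radius rule can likewise be illustrated with a minimal sketch; the function name, the drift measure, and the damping constant are assumptions for illustration, not the paper's specification:

    # Illustrative CAATR sketch (names and constants assumed, not from the paper):
    # an agent's trust-region radius shrinks as its teammates' policies drift more.
    def caatr_radius(base_radius: float, teammate_drifts: list[float]) -> float:
        # teammate_drifts: e.g., Wasserstein-1 distances between each teammate's
        # current and previous policy over the most recent update
        total_drift = sum(teammate_drifts)
        # inverse proportionality, with +1 so the radius never exceeds base_radius
        return base_radius / (1.0 + total_drift)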
Submission Number: 9