Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean Field Games

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Finite sample complexity analysis of the novel MF-TRPO algorithm for ergodic Mean Field Games in finite state-action spaces
Abstract: We introduce Mean Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean Field Games (MFGs) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting, we extend its methodology to the MFG framework, leveraging its stability and robustness in policy optimization. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. Our results cover both the exact formulation of the algorithm and its sample-based counterpart, where we derive high-probability guarantees and finite sample complexity. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent problems.
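Illustrative sketch: to make the high-level description above concrete, the snippet below shows the general shape of a mean-field fixed-point loop with a trust-region-style (KL-controlled) policy improvement step for a finite state-action ergodic MFG. Everything here is an assumption for illustration, including the placeholder transition model, the crowding-penalty reward, and the exponentiated-gradient update standing in for the paper's exact MF-TRPO update; it is not the authors' algorithm or code.

```python
import numpy as np

# Sketch only: generic mean-field fixed-point iteration with a KL-controlled
# policy step. The model, reward, and update rule are illustrative assumptions.

S, A = 5, 3                                   # finite state and action spaces
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a] = next-state distribution

def reward(s, a, mu):
    # Placeholder mean-field reward: agents are penalised for crowding state s.
    return -float(mu[s]) + 0.1 * a

def stationary_mean_field(pi, iters=200):
    # Mean field induced by policy pi: stationary distribution of the controlled chain.
    mu = np.full(S, 1.0 / S)
    trans = np.einsum("sa,saz->sz", pi, P)    # state transition matrix under pi
    for _ in range(iters):
        mu = mu @ trans
    return mu

def q_values(pi, mu, gamma=0.9, iters=200):
    # Evaluate Q^{pi, mu} by fixed-point iteration (policy evaluation, frozen mu).
    Q = np.zeros((S, A))
    R = np.array([[reward(s, a, mu) for a in range(A)] for s in range(S)])
    for _ in range(iters):
        V = (pi * Q).sum(axis=1)
        Q = R + gamma * P @ V
    return Q

def kl_controlled_improvement(pi, Q, eta=0.5):
    # Trust-region-flavoured step: exponentiated-gradient update that keeps the
    # new policy close in KL to the old one (an assumption, not the paper's rule).
    new_pi = pi * np.exp(eta * Q)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

pi = np.full((S, A), 1.0 / A)
for k in range(50):
    mu = stationary_mean_field(pi)            # 1. freeze the population distribution
    Q = q_values(pi, mu)                      # 2. evaluate the representative agent
    pi = kl_controlled_improvement(pi, Q)     # 3. small, KL-controlled policy step

print("approximate equilibrium policy:\n", np.round(pi, 3))
print("induced mean field:", np.round(stationary_mean_field(pi), 3))
```

The three-step structure (fix the mean field, evaluate the representative agent, take a small policy step) is the standard template such methods follow; the paper's analysis concerns how many samples this kind of loop needs when the evaluation step is replaced by estimates from data.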
Lay Summary: We present a new algorithm called Mean-Field Trust Region Policy Optimization (MF-TRPO) that helps very large groups of decision-makers learn stable strategies when each agent's outcome depends on how the whole population behaves. Inspired by tools from reinforcement learning, the method is designed to remain reliable even when agents must learn from data. We also show that the method works in theory and in practice, giving clear guidelines on how much data is needed.
Link To Code: https://anonymous.4open.science/r/TRPO-MFG-0468/rebuttal/experiments.md
Primary Area: Reinforcement Learning->Multi-agent
Keywords: Reinforcement learning, mean-field games, trust region policy optimization, Nash equilibrium, finite sample complexity
Submission Number: 9928