Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

Kaiqing Zhang, Zhuoran Yang, Tamer Basar

06 Sept 2019 (modified: 05 May 2023) · NeurIPS 2019 · Readers: Everyone
Abstract: In this paper, we study the global convergence of policy optimization for solving zero-sum linear quadratic (LQ) games. In particular, we first investigate the landscape of LQ games, viewing them as nonconvex-nonconcave saddle-point problems in the policy space. We show that, despite the nonconvexity and nonconcavity, zero-sum LQ games have the property that the stationary point of the objective with respect to the feedback control policies constitutes the Nash equilibrium (NE) of the game. Building upon this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game, at a globally sublinear and locally linear rate. Simulation results are then provided to validate the proposed algorithms. To the best of our knowledge, our work appears to be the first to investigate the optimization landscape of LQ games and to provably show the convergence of policy optimization methods to Nash equilibria. We believe the results lay theoretical foundations for developing model-free, policy-based reinforcement learning algorithms for zero-sum LQ games.
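
The abstract describes the nested-gradient scheme only at a high level. Below is a minimal illustrative sketch (not the authors' exact algorithm) of one such iteration: an inner gradient-ascent loop for the maximizing player's gain L nested inside an outer projected gradient step for the minimizing player's gain K. It assumes a discrete-time LQ game with dynamics x_{t+1} = A x_t + B u_t + C w_t, linear feedback u_t = -K x_t and w_t = -L x_t, and standard LQ policy-gradient expressions; the system matrices, step sizes, iteration counts, and the Frobenius-ball projection are all illustrative assumptions rather than the paper's exact choices.

```python
# Hypothetical sketch of a projected nested-gradient iteration for a zero-sum LQ game.
# All matrices, step sizes, and the projection radius below are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Assumed system: x_{t+1} = A x_t + B u_t + C w_t
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[1.0], [0.5]])
C = np.array([[0.2], [1.0]])
Q = np.eye(2)
Ru = np.eye(1)          # control penalty for the minimizing player
Rw = 5.0 * np.eye(1)    # penalty for the maximizing (disturbance) player
Sigma0 = np.eye(2)      # covariance of the initial state

def closed_loop(K, L):
    return A - B @ K - C @ L

def value_and_cov(K, L):
    """Solve the two Lyapunov equations defining the cost C(K, L) = tr(P @ Sigma0)."""
    Acl = closed_loop(K, L)
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return None, None  # closed loop unstable; cost is unbounded
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ Ru @ K - L.T @ Rw @ L)
    Sigma = solve_discrete_lyapunov(Acl, Sigma0)
    return P, Sigma

def grad_K(K, L, P, Sigma):
    # Policy gradient of the cost with respect to the minimizer's gain K.
    return 2.0 * (Ru @ K - B.T @ P @ closed_loop(K, L)) @ Sigma

def grad_L(K, L, P, Sigma):
    # Policy gradient of the cost with respect to the maximizer's gain L.
    return 2.0 * (-Rw @ L - C.T @ P @ closed_loop(K, L)) @ Sigma

def project(M, radius=10.0):
    # Simplified projection onto a Frobenius-norm ball (an assumption, not the paper's set).
    nrm = np.linalg.norm(M)
    return M if nrm <= radius else M * (radius / nrm)

K = np.zeros((1, 2))
L = np.zeros((1, 2))
eta_outer, eta_inner = 1e-3, 1e-3

for outer in range(200):
    # Inner loop: approximately solve the maximizing player's problem for the current K.
    for inner in range(200):
        P, Sigma = value_and_cov(K, L)
        if P is None:
            break
        L = project(L + eta_inner * grad_L(K, L, P, Sigma))
    # Outer step: projected gradient descent for the minimizing player.
    P, Sigma = value_and_cov(K, L)
    if P is None:
        break
    K = project(K - eta_outer * grad_K(K, L, P, Sigma))

P, _ = value_and_cov(K, L)
print("approximate NE cost:", np.trace(P @ Sigma0))
```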