Abstract: In this paper, we study the global convergence of policy optimization for solving zero-sum linear quadratic (LQ) games. In particular, we first investigate the landscape of LQ games, viewing them as nonconvex-nonconcave saddle-point problems in the policy space. We show that, despite this nonconvexity and nonconcavity, zero-sum LQ games have the property that any stationary point of the objective with respect to the feedback control policies constitutes a Nash equilibrium (NE) of the game. Building upon this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game, with a globally sublinear rate and a locally linear rate. Simulation results are then provided to validate the proposed algorithms. To the best of our knowledge, our work appears to be the first to investigate the optimization landscape of LQ games and to provably show the convergence of policy optimization methods to the Nash equilibria. We believe these results lay the theoretical foundation for developing model-free, policy-based reinforcement learning algorithms for zero-sum LQ games.
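To make the setting concrete, below is a minimal numerical sketch of the nested-gradient idea for a zero-sum LQ game, with the inner maximization over the adversary's policy approximately solved before each update of the minimizer's policy. This is not the paper's exact projected nested-gradient method: the projection step is omitted, and the system matrices, step sizes, and iteration counts are placeholder choices made only for illustration. The policy gradients are computed exactly from discrete Lyapunov equations under the standard linear state-feedback parameterization.

```python
# Illustrative sketch (not the paper's exact algorithm): nested policy-gradient
# updates for a zero-sum LQ game
#   x_{t+1} = A x_t + B u_t + C w_t,  u_t = -K x_t,  w_t = -L x_t,
# with cost E[ sum_t x'Qx + u'Ru u - w'Rw w ].  All matrices below are
# placeholder values chosen so that the closed loop stays stable.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.5, 0.1], [0.0, 0.4]])
B = np.array([[1.0], [0.5]])
C = np.array([[0.2], [0.1]])
Q = np.eye(2)
Ru = np.eye(1)          # control weight of the minimizing player
Rw = 5.0 * np.eye(1)    # penalty on the maximizing (disturbance) player
Sigma0 = np.eye(2)      # covariance of the initial state

def cost_and_grads(K, L):
    """Exact cost J(K, L) and policy gradients via discrete Lyapunov equations."""
    Acl = A - B @ K - C @ L
    Qcl = Q + K.T @ Ru @ K - L.T @ Rw @ L
    P = solve_discrete_lyapunov(Acl.T, Qcl)       # P = Acl' P Acl + Qcl
    Sigma = solve_discrete_lyapunov(Acl, Sigma0)  # Sigma = Acl Sigma Acl' + Sigma0
    J = np.trace(P @ Sigma0)
    grad_K = 2.0 * (Ru @ K - B.T @ P @ Acl) @ Sigma
    grad_L = 2.0 * (-Rw @ L - C.T @ P @ Acl) @ Sigma
    return J, grad_K, grad_L

K = np.zeros((1, 2))  # minimizer's feedback gain
L = np.zeros((1, 2))  # maximizer's feedback gain
for outer in range(200):
    # Inner loop: approximately solve the maximization over L for the current K.
    for _ in range(50):
        _, _, gL = cost_and_grads(K, L)
        L = L + 0.01 * gL          # gradient ascent for the maximizer
    # Outer step: gradient descent for the minimizer against the inner solution.
    J, gK, _ = cost_and_grads(K, L)
    K = K - 0.01 * gK

print("approximate NE gains:\nK =", K, "\nL =", L, "\ncost:", J)
```

The nested structure (solving the inner maximization before each outer descent step) is what distinguishes this scheme from simultaneous gradient descent-ascent, which need not converge for nonconvex-nonconcave saddle-point problems.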
CMT Num: 6209