On Optimization Formulations of Finite Horizon MDPs

Published: 26 Oct 2023, Last Modified: 13 Dec 2023. NeurIPS 2023 Workshop Poster.
Keywords: finite horizon mdp, optimization, KKT conditions, non-convex, quasi-newton, policy gradient, backward induction
TL;DR: We study equivalences between optimization formulations for finite horizon MDPs and show sufficiency of KKT conditions for the non-convex policy optimization formulation.
Abstract: In this paper, we extend the connection between linear programming formulations of MDPs and policy gradient methods, established for infinite horizon MDPs in (Ying & Zhu, 2020), to finite horizon MDPs. The main tool for this extension is a reduction from optimization formulations of finite horizon MDPs to infinite horizon MDPs. Additionally, we show via a reparameterization argument that the KKT conditions of the non-convex policy optimization problem in the finite horizon setting are sufficient for global optimality. Further, we use the reduction to extend the Quasi-Newton policy gradient algorithm of (Li et al., 2021) to the finite horizon case and achieve performance competitive with value iteration by exploiting backward induction for policy evaluation. To our knowledge, this is the first policy gradient-based method for finite horizon MDPs that is competitive with value iteration-based approaches.
Submission Number: 23
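The backward induction mentioned in the abstract can be illustrated with a minimal sketch for a tabular finite horizon MDP. This is not the authors' implementation; the tensor shapes for `P` and `R` and the function name are assumptions made for the example.

```python
import numpy as np

def backward_induction(P, R, H):
    """Dynamic programming from the last stage of a finite horizon MDP.

    P: transition tensor of shape (A, S, S), P[a, s, s'] = Pr(s' | s, a)
    R: reward matrix of shape (S, A), r(s, a)
    H: horizon length

    Returns the stage-0 optimal value function V_0 (shape (S,)) and a
    greedy, stage-dependent policy (shape (H, S)).
    """
    S, A = R.shape
    V = np.zeros(S)                        # terminal value V_H is zero
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        # Q[s, a] = r(s, a) + E_{s' ~ P(.|s,a)}[V_{h+1}(s')]
        Q = R + np.einsum('ast,t->sa', P, V)
        policy[h] = Q.argmax(axis=1)       # greedy action at stage h
        V = Q.max(axis=1)                  # Bellman backup: V_h
    return V, policy
```

Each of the H backups is an exact Bellman update, which is why backward induction gives exact policy evaluation in the finite horizon setting and serves as the baseline the policy gradient method is compared against.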