Online Bellman Residue Minimization via Saddle Point Optimization

Zhuoran Yang; Cheng Zhou; Tong Zhang; Han Liu

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference Withdrawn Submission · Readers: Everyone

Abstract: We study Bellman residual minimization with general nonlinear function approximation. Starting from a nonconvex saddle point formulation of Bellman residual minimization obtained via Fenchel duality, we propose an online first-order algorithm with two-timescale learning rates. Using tools from stochastic approximation, we establish the convergence of our algorithm by approximating the dynamics of its iterates with two ordinary differential equations. Moreover, as a byproduct, we establish a finite-time convergence result under the assumption that the dual problem can be solved up to some error. Finally, we provide numerical experiments that support our theory.
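
To make the saddle point idea concrete, here is a minimal sketch under simplifying assumptions; it is not the authors' algorithm. It relies on the Fenchel identity (1/2)x^2 = max_y (xy - y^2/2), which turns the squared Bellman residual (1/2)E_s[(E[delta | s])^2] into max_h E[delta * h(s) - h(s)^2 / 2]. This removes the double-sampling problem because delta appears linearly. Assumptions, all ours: policy evaluation on a small Markov chain, tabular (one-hot) primal and dual functions instead of nonlinear approximators, states sampled i.i.d. uniformly rather than along a trajectory, and hand-picked step-size schedules with alpha_t / beta_t -> 0. The function name `two_timescale_brm` is hypothetical.

```python
import numpy as np


def two_timescale_brm(P, r, gamma, num_steps=50_000, seed=0):
    """Hypothetical sketch: online primal-dual Bellman residual minimization.

    Saddle objective (Fenchel dual of the squared Bellman residual):
        L(theta, omega) = E[delta * omega[s] - 0.5 * omega[s] ** 2],
        delta = r[s] + gamma * theta[s_next] - theta[s].
    theta (value estimate) takes gradient-descent steps on the slow timescale;
    omega (dual variable) takes gradient-ascent steps on the fast timescale.
    Tabular features and i.i.d. uniform state sampling are assumptions.
    """
    n = len(r)
    rng = np.random.default_rng(seed)
    theta = np.zeros(n)  # primal: V_theta(s) = theta[s]
    omega = np.zeros(n)  # dual: h_omega(s) = omega[s], tracks E[delta | s]

    # Pre-sample transitions: s ~ Uniform({0..n-1}), s_next ~ P[s, :].
    states = rng.integers(0, n, size=num_steps)
    u = rng.random(num_steps)
    cum_p = np.cumsum(P, axis=1)
    next_states = np.minimum((u[:, None] > cum_p[states]).sum(axis=1), n - 1)

    for t in range(num_steps):
        s, s_next = states[t], next_states[t]
        alpha = 0.05 / (1.0 + t / 2000.0)       # slow (primal) step size
        beta = 0.5 / (1.0 + t / 2000.0) ** 0.6  # fast (dual) step size
        delta = r[s] + gamma * theta[s_next] - theta[s]  # sampled TD error

        omega_s = omega[s]  # use the old dual value for a simultaneous update
        # Dual ascent: dL/domega[s] = delta - omega[s]
        omega[s] += beta * (delta - omega_s)
        # Primal descent: dL/dtheta = omega[s] * (gamma * e_{s_next} - e_s)
        theta[s] += alpha * omega_s
        theta[s_next] -= alpha * gamma * omega_s
    return theta


if __name__ == "__main__":
    P = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
    r = np.array([1.0, 0.0, 2.0])
    theta = two_timescale_brm(P, r, gamma=0.5)
    v_true = np.linalg.solve(np.eye(3) - 0.5 * P, r)  # exact values for comparison
    print(np.round(theta, 3), np.round(v_true, 3))
```

With tabular features the inner maximization is exact (omega can match E[delta | s] at every state), so the saddle point coincides with the true value function (I - gamma P)^{-1} r. The faster dual step size is what lets omega track the residual while theta moves slowly, which mirrors the two-timescale structure described in the abstract.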
