Online Bellman Residue Minimization via Saddle Point Optimization

27 Sep 2018 (modified: 05 Dec 2018), ICLR 2019 Conference Withdrawn Submission, Readers: Everyone
  • Abstract: We study Bellman residual minimization with general nonlinear function approximation. Based on a nonconvex saddle point formulation of Bellman residual minimization obtained via Fenchel duality, we propose an online first-order algorithm with two-timescale learning rates. Using tools from stochastic approximation, we establish the convergence of our algorithm by approximating the dynamics of the iterates with two ordinary differential equations. Moreover, as a byproduct, we establish a finite-time convergence result under the assumption that the dual problem can be solved up to some error. Finally, we provide numerical experiments to support our theory.
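
The abstract does not state the saddle point objective or the update rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the standard Fenchel identity x²/2 = max_y (x·y − y²/2), which turns the squared Bellman residual into a min-max problem over a value function V and a dual variable nu. It also assumes a tabular value function on a hypothetical two-state Markov chain (the paper targets nonlinear function approximation), and the transition matrix, rewards, and step-size schedules are illustrative choices. The dual variable uses the larger (fast) step size and the value function the smaller (slow) one.

```python
import random

# Hypothetical two-state Markov reward process (illustrative values only).
P = [[0.9, 0.1], [0.2, 0.8]]   # P[s][s'] = transition probability
R = [1.0, 0.0]                 # reward received in state s


def two_timescale_brm(num_steps=100_000, gamma=0.5, seed=0):
    """Online gradient descent-ascent on the assumed saddle point objective
    E[ delta * nu(s) - 0.5 * nu(s)**2 ],  delta = r + gamma*V(s') - V(s)."""
    rng = random.Random(seed)
    V = [0.0, 0.0]    # primal variable (slow timescale)
    nu = [0.0, 0.0]   # dual variable (fast timescale)
    s = 0
    for t in range(num_steps):
        # Sample one transition along a single trajectory.
        s_next = 0 if rng.random() < P[s][0] else 1
        delta = R[s] + gamma * V[s_next] - V[s]

        # Two-timescale step sizes: alpha / beta -> 0 as t grows.
        beta = 0.5 / (1.0 + t / 1000.0) ** 0.6
        alpha = 0.1 / (1.0 + t / 1000.0) ** 0.9

        nu_s = nu[s]
        # Dual ascent: gradient of the objective w.r.t. nu(s) is delta - nu(s).
        nu[s] += beta * (delta - nu_s)
        # Primal descent: gradient w.r.t. V is nu(s) * (gamma*e_{s'} - e_s).
        V[s] += alpha * nu_s
        V[s_next] -= alpha * gamma * nu_s

        s = s_next
    return V, nu


def exact_values(gamma=0.5):
    """Solve (I - gamma*P) V = R for the 2x2 case by Cramer's rule."""
    a, b = 1.0 - gamma * P[0][0], -gamma * P[0][1]
    c, d = -gamma * P[1][0], 1.0 - gamma * P[1][1]
    det = a * d - b * c
    return [(d * R[0] - b * R[1]) / det, (a * R[1] - c * R[0]) / det]


if __name__ == "__main__":
    V, nu = two_timescale_brm()
    print("estimated V:", V)
    print("exact V:    ", exact_values())
    print("dual nu:    ", nu)
```

In this tabular setting the inner maximization drives nu(s) toward the conditional expectation of the temporal-difference error, so at the saddle point nu should be close to zero and V close to the solution of the Bellman equation. With nonlinear function approximation the objective is nonconvex in the primal parameters, which is the regime the abstract addresses.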