Abstract: We study the problem of Bellman residual minimization with general nonlinear function approximation.
Based on a nonconvex saddle point formulation of Bellman residual minimization via Fenchel duality, we propose an online first-order algorithm with two-timescale learning rates. Using tools from stochastic approximation, we establish the convergence of our algorithm by approximating the dynamics of the iterates with two ordinary differential equations. Moreover, as a byproduct, we establish a finite-time convergence result under the assumption that the dual problem can be solved up to some error. Finally, numerical experiments are provided to support our theory.
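For context, saddle-point reformulations of this kind typically rest on the Fenchel conjugate of the square function, $\tfrac{1}{2}x^2 = \max_{y}\{xy - \tfrac{1}{2}y^2\}$. A minimal sketch of how this yields a minimax objective, with illustrative notation not taken from the abstract ($V_\theta$ a parameterized value function, $\nu$ a dual function, $\gamma$ the discount factor):

$$
\min_{\theta}\;\max_{\nu}\;
\mathbb{E}_{(s,a,s')}\!\Big[
\big(r(s,a) + \gamma V_\theta(s') - V_\theta(s)\big)\,\nu(s,a)
\;-\; \tfrac{1}{2}\,\nu(s,a)^2
\Big].
$$

In such a scheme, a two-timescale method would update the dual variable $\nu$ with a faster step size than the primal parameter $\theta$, so that $\nu$ approximately tracks its optimum for the current primal iterate; whether this is the exact formulation used in the paper is an assumption here.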