Convergent Reinforcement Learning with Function Approximation: A Bilevel Optimization Perspective

27 Sept 2018 (modified: 05 May 2023) | ICLR 2019 Conference Blind Submission | Readers: Everyone
Abstract: We study reinforcement learning algorithms with nonlinear function approximation in the online setting. By formulating both the problems of value function estimation and policy learning as bilevel optimization problems, we propose online Q-learning and actor-critic algorithms for these two problems respectively. Our algorithms are gradient-based methods and thus are computationally efficient. Moreover, by approximating the iterates using differential equations, we establish convergence guarantees for the proposed algorithms. Thorough numerical experiments are conducted to back up our theory.
Keywords: reinforcement learning, Deep Q-networks, actor-critic algorithm, ODE approximation