Accelerated Value Iteration via Anderson Mixing

Yujun Li; Chengzhuo Ni; Guangzeng Xie; Wenhao Yang; Shuchang Zhou; Zhihua Zhang

27 Sept 2018 (modified: 05 May 2023); ICLR 2019 Conference Blind Submission; Readers: Everyone

Abstract: Accelerating reinforcement learning methods is an important and challenging problem. We introduce the Anderson acceleration technique into value iteration, yielding an accelerated algorithm that we call Anderson Accelerated Value Iteration (A2VI). We further apply the method to the Deep Q-learning algorithm, resulting in the Deep Anderson Accelerated Q-learning (DA2Q) algorithm. Our approach can be viewed as approximating policy evaluation by interpolating over historical data. A2VI is more efficient than modified policy iteration, a classical approximate method for policy evaluation. We give a theoretical analysis of the algorithm and conduct experiments on both toy problems and Atari games. Both the theoretical and the empirical results show the effectiveness of the algorithm.

Keywords: Reinforcement Learning
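
To make the idea in the abstract concrete, the following is a minimal sketch of generic Anderson mixing applied to tabular value iteration. It is not the authors' A2VI implementation, whose details are not given on this page. All names (`bellman_optimality`, `anderson_weights`, `anderson_value_iteration`, `value_iteration`), the array layout of `P` and `R`, the window size `m`, the regularization `reg`, and the fallback to a plain value-iteration step are assumptions made for illustration.

```python
import numpy as np


def bellman_optimality(P, R, gamma, v):
    """Apply the Bellman optimality operator T to v.

    Assumed layout: P has shape (A, S, S) with P[a, s, s2] a transition
    probability, and R has shape (A, S) with R[a, s] the expected reward.
    """
    return (R + gamma * (P @ v)).max(axis=0)


def anderson_weights(residuals, reg=1e-10):
    """Solve min ||sum_i alpha_i f_i||_2 subject to sum_i alpha_i = 1."""
    F = np.stack(residuals, axis=1)                  # shape (S, m)
    G = F.T @ F                                      # Gram matrix of residuals
    G = G + reg * np.trace(G) * np.eye(G.shape[0])   # small ridge for stability
    w = np.linalg.solve(G, np.ones(G.shape[0]))
    return w / w.sum()                               # closed-form constrained minimizer


def anderson_value_iteration(P, R, gamma, m=5, tol=1e-8, max_iter=1000):
    """Value iteration with Anderson mixing over the last m iterates."""
    v = np.zeros(P.shape[1])
    Tv = bellman_optimality(P, R, gamma, v)
    mapped, residuals = [], []                       # history of T(v_i) and T(v_i) - v_i
    for k in range(max_iter):
        res = Tv - v
        res_norm = np.abs(res).max()
        if res_norm < tol:
            return v, k
        mapped.append(Tv)
        residuals.append(res)
        mapped, residuals = mapped[-m:], residuals[-m:]
        alpha = anderson_weights(residuals)
        cand = np.stack(mapped, axis=1) @ alpha      # mixed iterate
        T_cand = bellman_optimality(P, R, gamma, cand)
        # Assumed safeguard: keep the mixed iterate only if its Bellman residual
        # shrinks at least as fast as the contraction rate gamma guarantees.
        if np.abs(T_cand - cand).max() <= gamma * res_norm:
            v, Tv = cand, T_cand
        else:
            v, Tv = Tv, bellman_optimality(P, R, gamma, Tv)  # plain VI step
    return v, max_iter


def value_iteration(P, R, gamma, tol=1e-8, max_iter=10000):
    """Plain value iteration, included only as a baseline for comparison."""
    v = np.zeros(P.shape[1])
    for k in range(max_iter):
        Tv = bellman_optimality(P, R, gamma, v)
        if np.abs(Tv - v).max() < tol:
            return v, k
        v = Tv
    return v, max_iter
```

Calling `anderson_value_iteration(P, R, 0.9)` and `value_iteration(P, R, 0.9)` on the same randomly generated MDP returns two value vectors and iteration counts that can be compared directly. The safeguard costs one extra Bellman backup whenever a mixed iterate is rejected, in exchange for keeping the contraction guarantee of plain value iteration.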
