CAN ALTQ LEARN FASTER: EXPERIMENTS AND THEORY

Bowen Weng; Huaqing Xiong; Yingbin Liang; Wei Zhang

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: Reinforcement Learning, Q-Learning, Adam, Restart, Convergence Analysis

TL;DR: New Experiments and Theory for Adam-Based Q-Learning

Abstract: Unlike the popular Deep Q-Network (DQN) approach, Alternating Q-learning (AltQ) does not fully fit a target Q-function at each iteration, and it is generally known to be unstable and inefficient. The limited applications of AltQ mostly rely on substantially altering the algorithm architecture to improve its performance. Although Adam appears to be a natural solution, its performance in AltQ has rarely been studied. In this paper, we first provide a thorough exploration of how well AltQ performs with Adam. We then take a further step to improve the implementation by adopting the technique of parameter restart. Specifically, the proposed algorithms are tested on a batch of Atari 2600 games and outperform the DQN learning method. The convergence rate of a slightly modified version of the proposed algorithms is characterized under linear function approximation. To the best of our knowledge, this is the first theoretical study of Adam-type algorithms in Q-learning.
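Illustrative sketch (not from the paper): the abstract names three ingredients, namely a Q-learning update that takes a single step toward the current target instead of fully fitting it, Adam as the optimizer, and a parameter restart. The toy Python example below shows one way these could fit together under linear function approximation with one-hot features. Everything specific here is an assumption made for illustration: the two-state chain environment, the function name `altq_adam_restart`, the hyperparameters, and the reading of "parameter restart" as periodically resetting Adam's moment estimates and step counter. It should not be taken as the authors' algorithm or experimental setup.

```python
import random


def altq_adam_restart(steps=20000, lr=0.01, gamma=0.5, restart_every=1000,
                      beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Toy Q-learning with one-hot linear features, Adam steps, and periodic restart.

    Hypothetical illustration only: environment, names, and hyperparameters
    are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    n_states, n_actions, terminal = 2, 2, 2  # states 0 and 1; state 2 is terminal
    dim = n_states * n_actions
    theta = [0.0] * dim  # Q(s, a) = theta[s * n_actions + a]
    m = [0.0] * dim      # Adam first-moment estimate
    v = [0.0] * dim      # Adam second-moment estimate
    t = 0                # Adam step counter since the last restart

    def q(state, action):
        return theta[state * n_actions + action]

    def env_step(state, action):
        # Action 1 moves right, action 0 moves left (state 0 is a wall).
        nxt = state + 1 if action == 1 else max(state - 1, 0)
        reward = 1.0 if nxt == terminal else 0.0
        return nxt, reward

    s = 0
    for k in range(steps):
        # Assumed form of "parameter restart": reset Adam's internal state.
        if k > 0 and k % restart_every == 0:
            m, v, t = [0.0] * dim, [0.0] * dim, 0

        a = rng.randrange(n_actions)  # uniform random behavior policy
        nxt, r = env_step(s, a)

        # The target is recomputed from the current parameters at every step;
        # no separate target network is fitted.
        if nxt == terminal:
            target = r
        else:
            target = r + gamma * max(q(nxt, b) for b in range(n_actions))

        # Semi-gradient of 0.5 * (target - Q(s, a))^2 with respect to theta.
        grad = [0.0] * dim
        grad[s * n_actions + a] = -(target - q(s, a))

        # One Adam step with bias correction.
        t += 1
        for i in range(dim):
            m[i] = beta1 * m[i] + (1.0 - beta1) * grad[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * grad[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            theta[i] -= lr * m_hat / (v_hat ** 0.5 + eps)

        s = 0 if nxt == terminal else nxt  # start a new episode at state 0

    return [[q(st, ac) for ac in range(n_actions)] for st in range(n_states)]


q_values = altq_adam_restart()
```

In this toy chain the optimal action is to move right in both states, so `q_values[s][1]` should end up larger than `q_values[s][0]` for each state `s`. Setting `restart_every` larger than `steps` disables the restart, which gives a simple way to compare the two variants on this example.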
