Keywords: distributional reinforcement learning, double Q-learning, adaptive learning, DQN, Actor-Critic
Abstract: Bias in the estimation of maxima of random variables is a well-known obstacle that drastically slows down $Q$-learning algorithms. We propose to use additional insight gained from distributional reinforcement learning to deal with overestimation in a locally adaptive way. This makes it possible to balance the strengths and weaknesses of the different $Q$-learning variants in a unified framework. Our framework, ADDQ, is simple to implement: existing RL algorithms can be improved with a few lines of additional code. We provide experimental results in tabular, Atari, and MuJoCo environments for discrete and continuous control problems, comparisons with state-of-the-art methods, and a proof of convergence.
Submission Number: 68
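To make the abstract's idea concrete, below is a minimal, self-contained sketch of the general mechanism it describes: blending the standard $Q$-learning target with the double $Q$-learning target using a locally adaptive weight derived from a distributional spread estimate. All specifics (the quantile representation, the `beta_from_spread` weighting, and the update rule) are illustrative assumptions for this sketch, not the paper's exact ADDQ algorithm.

```python
# Tabular sketch: two distributional (quantile) estimators, double-Q-style
# updates, and a per-state-action blend between the standard target
# (selection and evaluation by the same estimator) and the double target
# (evaluation by the other estimator). The blend weight comes from the
# estimated return spread -- a hypothetical choice for illustration.
import numpy as np

n_states, n_actions, n_quantiles = 5, 3, 8
alpha, gamma = 0.1, 0.99
rng = np.random.default_rng(0)

# Two independent quantile tables, one per estimator.
Z = [np.zeros((n_states, n_actions, n_quantiles)) for _ in range(2)]

def q_values(z):
    # Mean over quantiles recovers the usual Q-value estimate.
    return z.mean(axis=-1)

def beta_from_spread(z, s, a, scale=1.0):
    # Hypothetical local weight: the larger the estimated return spread,
    # the more weight goes to the (less overestimation-prone) double target.
    spread = z[s, a].std()
    return 1.0 / (1.0 + scale * spread)  # in (0, 1]

def adaptive_double_q_update(s, a, r, s_next, done):
    i = rng.integers(2)            # estimator to update (double Q-learning style)
    j = 1 - i                      # the other estimator evaluates
    a_star = q_values(Z[i])[s_next].argmax()

    standard_eval = Z[i][s_next, a_star].mean()   # same estimator evaluates
    double_eval = Z[j][s_next, a_star].mean()     # other estimator evaluates

    beta = beta_from_spread(Z[i], s_next, a_star)
    blended_eval = beta * standard_eval + (1.0 - beta) * double_eval

    target = r + (0.0 if done else gamma * blended_eval)
    # Crude quantile update: move all quantiles toward the blended target.
    Z[i][s, a] += alpha * (target - Z[i][s, a])

# Example transition update.
adaptive_double_q_update(s=0, a=1, r=1.0, s_next=2, done=False)
print(q_values(Z[0])[0, 1])
```

The key design point illustrated here is that the correction is locally adaptive: states and actions with a wide estimated return distribution lean toward the double target, while narrow distributions keep the standard target, rather than applying one global correction everywhere.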