Iterated Deep Q-Network: Efficient Learning of Bellman Iterations for Deep Reinforcement Learning

10 May 2023 (modified: 12 Dec 2023) · Submitted to NeurIPS 2023
Keywords: deep reinforcement learning, Bellman operator, approximate value iteration, Atari games
TL;DR: A new value-based method built on top of DQN that outperforms DQN and other baselines by incorporating future Bellman iterations into the loss.
Abstract: Value-based reinforcement learning (RL) methods strive to obtain accurate approximations of optimal action-value functions. Notoriously, these methods heavily rely on the application of the optimal Bellman operator, which needs to be approximated from samples. Most approaches consider only a single Bellman iteration, which limits their effectiveness. In this paper, we introduce iterated Deep Q-Network (iDQN), a new DQN-based algorithm that incorporates several consecutive Bellman iterations into the training loss. iDQN leverages the online network of DQN to build a target for a second online network, which in turn serves as a target for a third online network, and so on, thereby taking future Bellman iterations into account. While using the same number of gradient steps, iDQN allows for better learning of the Bellman iterations compared to DQN. We evaluate iDQN against relevant baselines on 54 Atari 2600 games to showcase its benefit in terms of approximation error and performance. iDQN greatly outperforms its closest baselines, DQN and Random Ensemble Mixture, while being orthogonal to more advanced DQN-based approaches.
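To make the chained-target idea in the abstract concrete, the following is a minimal PyTorch sketch of an iDQN-style loss over K consecutive Bellman iterations. It assumes small fully connected Q-networks and illustrative hyperparameters (K, GAMMA, network sizes, `idqn_loss`); it is a hedged illustration of the general idea, not the authors' implementation.

```python
# Hypothetical sketch: K online networks, where the frozen copy of network k-1
# provides the bootstrap target for online network k (network 0 bootstraps on
# its own frozen copy, as in vanilla DQN). All names here are illustrative.
import torch
import torch.nn as nn

K = 3            # number of consecutive Bellman iterations in the loss
GAMMA = 0.99     # discount factor
N_ACTIONS = 4
OBS_DIM = 8

def make_q_net():
    return nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

# One online network per Bellman iteration, plus frozen target copies.
online_nets = [make_q_net() for _ in range(K)]
target_nets = [make_q_net() for _ in range(K)]
for online, target in zip(online_nets, target_nets):
    target.load_state_dict(online.state_dict())

def idqn_loss(batch):
    """Sum of TD losses across the K chained Bellman iterations."""
    obs, actions, rewards, next_obs, dones = batch
    loss = 0.0
    for k in range(K):
        # Q-value of the taken action under online network k.
        q_sa = online_nets[k](obs).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Target built from the previous network in the chain.
            prev = target_nets[k - 1] if k > 0 else target_nets[0]
            next_q = prev(next_obs).max(dim=1).values
            target = rewards + GAMMA * (1.0 - dones) * next_q
        loss = loss + nn.functional.mse_loss(q_sa, target)
    return loss

# Example usage with a random batch of transitions.
batch = (
    torch.randn(32, OBS_DIM),              # observations
    torch.randint(0, N_ACTIONS, (32,)),    # actions
    torch.randn(32),                       # rewards
    torch.randn(32, OBS_DIM),              # next observations
    torch.zeros(32),                       # done flags
)
print(idqn_loss(batch))
```

In this sketch, every network is trained with the same batch and the same number of gradient steps as a single DQN update, which is the sense in which several Bellman iterations are learned at once.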
Supplementary Material: zip
Submission Number: 6837