Iterated Deep $Q$-Network: Efficient Learning of Bellman Iterations for Deep Reinforcement Learning

18 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: deep reinforcement learning, bellman operator, approximate value iteration, atari games
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A new value-based method built on top of DQN, outperforming DQN and other baselines by considering several consecutive TD-errors in the loss instead of only one.
Abstract: Value-based Reinforcement Learning (RL) methods hinge on the application of the Bellman operator, which needs to be approximated from samples. Most approaches consist of an iterative scheme that alternates between applying a Bellman iteration and projecting the result onto the considered function space. In this paper, we propose a new perspective by introducing iterated Deep $Q$-Network (iDQN), a novel DQN-based algorithm that aims to approximate several consecutive Bellman iterations at once. To this end, iDQN uses the online network of DQN to build a target for a second online network, which in turn serves as a target for a third online network, and so forth, thereby taking future Bellman iterations into account. As a result, iDQN learns the Bellman iterations better than DQN while using the same number of gradient steps. We theoretically prove the benefit of iDQN in terms of error propagation through the lens of approximate value iteration. We then evaluate iDQN against relevant baselines on 54 Atari 2600 games, showing that iDQN outperforms DQN and is orthogonal to more advanced DQN-based approaches.
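The chained-target idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes linear $Q$-functions over a hypothetical feature vector, and the names `idqn_loss`, `params`, and `batch` are invented for illustration. The sketch keeps K+1 $Q$-functions and sums K consecutive squared TD-errors, where the $k$-th function regresses toward a target built from the $(k-1)$-th. With K = 1 it reduces to a single DQN-style TD loss.

```python
import numpy as np


def idqn_loss(params, batch, gamma=0.99):
    """Sum of K consecutive squared TD-errors over a chain of K+1 linear Q-functions.

    Assumed (hypothetical) layout:
      params: array of shape (K + 1, n_actions, n_features); params[k] defines Q_k.
      batch:  dict with 'phi' (B, n_features), 'a' (B,) integer actions, 'r' (B,),
              'phi_next' (B, n_features), 'done' (B,) with 1.0 for terminal steps.
    """
    phi, a, r = batch["phi"], batch["a"], batch["r"]
    phi_next, done = batch["phi_next"], batch["done"]
    rows = np.arange(len(a))

    total = 0.0
    for k in range(1, params.shape[0]):
        # Target for Q_k comes from the previous function in the chain, Q_{k-1}.
        # In an autodiff framework this target would be held constant (no gradient).
        q_next = phi_next @ params[k - 1].T                   # (B, n_actions)
        target = r + gamma * (1.0 - done) * q_next.max(axis=1)
        # Value predicted by Q_k for the action actually taken.
        q_pred = (phi @ params[k].T)[rows, a]                 # (B,)
        total += np.mean((target - q_pred) ** 2)
    return total
```

Summing the terms is what lets one gradient step update every $Q$-function in the chain at once. How targets are frozen and refreshed in practice is a design choice this sketch does not cover.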
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1352