Iterated Deep Q-Network: Efficient Learning of Bellman Iterations for Deep Reinforcement Learning

Published: 20 Jul 2023, Last Modified: 31 Aug 2023, Venue: EWRL16
Keywords: deep reinforcement learning, Bellman operator, approximate value iteration, Atari games
TL;DR: A new value-based method built on top of DQN that outperforms DQN and other baselines by incorporating future Bellman iterations into the training loss.
Abstract: Value-based reinforcement learning methods strive to obtain accurate approximations of optimal action-value functions. Notoriously, these methods heavily rely on the application of the optimal Bellman operator, which needs to be approximated from samples. Most approaches consider only a single Bellman iteration at a time, which limits their power. In this paper, we introduce iterated Deep Q-Network (iDQN), a new DQN-based algorithm that incorporates several consecutive Bellman iterations into the training loss. iDQN leverages the online network of DQN to build a target for a second online network, which in turn serves as a target for a third online network, and so forth, thereby taking future Bellman iterations into account. With the same number of gradient steps, iDQN learns the Bellman iterations more effectively than DQN. After providing theoretical guarantees, we evaluate iDQN against relevant baselines on $54$ Atari $2600$ games to showcase its benefit in terms of approximation error and performance. iDQN outperforms DQN while being orthogonal to more advanced DQN-based approaches.
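
To make the chained-target idea in the abstract concrete, here is a minimal PyTorch sketch of how such a loss could be assembled. The names (idqn_loss, q_nets, target_net, batch) are hypothetical and not taken from the paper, and the actual iDQN implementation may differ in details such as how targets are frozen and refreshed; this only illustrates summing K DQN-style TD losses in which each online network is trained toward a target built from the preceding network in the chain.

import torch
import torch.nn as nn

def idqn_loss(q_nets, target_net, batch, gamma=0.99):
    # q_nets: list of K online Q-networks Q_1, ..., Q_K, chained as in the abstract.
    # target_net: frozen copy serving as the target for the first network, as in DQN.
    # batch: (obs, actions, rewards, next_obs, dones) sampled from a replay buffer.
    obs, actions, rewards, next_obs, dones = batch
    total_loss = 0.0
    for k, q_net in enumerate(q_nets):
        # The target for the k-th network is one optimal Bellman iteration
        # applied to the previous network in the chain (or to the frozen
        # DQN-style target for the first network).
        prev = target_net if k == 0 else q_nets[k - 1]
        with torch.no_grad():  # stop gradients from flowing through the target
            next_q = prev(next_obs).max(dim=1).values
            target = rewards + gamma * (1.0 - dones) * next_q
        q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
        total_loss = total_loss + nn.functional.smooth_l1_loss(q_sa, target)
    return total_loss

With a single online network (K = 1), this sketch reduces to the standard DQN loss, which matches the abstract's description of iDQN as being built on top of DQN.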