ConQUR: Mitigating Delusional Bias in Deep Q-Learning

DiJia-Andy Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

25 Sept 2019 (modified: 22 Jun 2025), ICLR 2020 Conference Blind Submission, Readers: Everyone

Keywords: reinforcement learning, q-learning, deep reinforcement learning, Atari

TL;DR: We develop a search framework and a consistency penalty to mitigate delusional bias in deep Q-learning.

Abstract: Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
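The abstract does not give the form of the penalization scheme. As a rough illustration only, the sketch below assumes a hinge-style penalty: each next state in a batch carries the action that was used to produce its Q-label, and the penalty grows whenever the current approximator rates some other action higher at that state. The function names, the hinge form, and the weight `lam` are assumptions made for this example, not the authors' definitions.

```python
from typing import Sequence


def consistency_penalty(
    q_next: Sequence[Sequence[float]],
    labeled_actions: Sequence[int],
) -> float:
    """Hypothetical hinge penalty for one batch.

    q_next[i][a] is the current approximator's value for action a at the
    i-th next state; labeled_actions[i] is the action whose value was used
    as the Q-label there. The penalty is zero exactly when every labeled
    action is greedy under the current approximator.
    """
    total = 0.0
    for q_row, a_label in zip(q_next, labeled_actions):
        q_label = q_row[a_label]
        # Add up how much each competing action exceeds the labeled one.
        total += sum(max(0.0, q - q_label) for q in q_row)
    return total


def penalized_loss(
    q_pred: Sequence[float],
    targets: Sequence[float],
    q_next: Sequence[Sequence[float]],
    labeled_actions: Sequence[int],
    lam: float = 0.5,
) -> float:
    """Squared TD error plus lam times the (assumed) consistency penalty."""
    td_error = sum((t - p) ** 2 for p, t in zip(q_pred, targets))
    return td_error + lam * consistency_penalty(q_next, labeled_actions)


if __name__ == "__main__":
    q_next = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
    labels = [0, 2]  # the first label is not greedy, the second is
    print(consistency_penalty(q_next, labels))
    print(penalized_loss([1.0, 2.0], [1.5, 2.0], q_next, labels))
```

In this toy batch only the first state contributes, because its labeled action is not the greedy one under the current values. A soft penalty of this kind discourages inconsistent labels without enforcing consistency as a hard constraint.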

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/conqur-mitigating-delusional-bias-in-deep-q/code)
