Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Deep reinforcement learning, ensemble learning, Q-learning
Abstract: Off-policy deep reinforcement learning (RL) has been successful in a range of challenging domains. However, standard off-policy RL algorithms can suffer from a low signal-to-noise ratio and even instability in Q-learning because target values are derived from current Q-estimates, which are often noisy. To mitigate this issue, we propose ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble. We empirically observe that the proposed method stabilizes and improves learning on both continuous and discrete control benchmarks. We also investigate the signal-to-noise aspect directly by studying environments with noisy rewards, and find that weighted Bellman backups significantly outperform standard Bellman backups. Furthermore, since our weighted Bellman backups rely on maintaining an ensemble, we investigate how they interact with UCB exploration. By enforcing diversity between ensemble members via bootstrapping, we show that these ideas are largely orthogonal and can be fruitfully integrated, further improving the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, on both low-dimensional and high-dimensional continuous and discrete control tasks.
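
To make the mechanism concrete, below is a minimal PyTorch sketch of how an uncertainty-weighted Bellman backup can be implemented. This is not the authors' code: the confidence weight w = sigmoid(-std * T) + 0.5, the temperature T, and all function and parameter names (weighted_bellman_loss, q_nets, target_nets, policy) are illustrative assumptions; the abstract only states that target Q-values are re-weighted by uncertainty estimates from a Q-ensemble.

    import torch

    def weighted_bellman_loss(q_nets, target_nets, policy, batch,
                              gamma=0.99, temperature=10.0):
        """Critic loss in which uncertain Bellman targets are down-weighted.

        q_nets / target_nets: lists of K >= 2 critics mapping (s, a) -> (B, 1).
        policy: maps next states to next actions (e.g., a SAC actor).
        batch: (state, action, reward, next_state, done) tensors, with
            rewards and done flags shaped (B, 1).
        """
        s, a, r, s2, done = batch
        with torch.no_grad():
            a2 = policy(s2)
            # Per-member target values at the next state-action pair: (K, B, 1).
            next_qs = torch.stack([t(s2, a2) for t in target_nets])
            # Ensemble disagreement serves as the uncertainty estimate.
            std = next_qs.std(dim=0)
            # Confidence weight in (0.5, 1.0]: small when members disagree
            # (an assumed weighting scheme, chosen here for illustration).
            w = torch.sigmoid(-std * temperature) + 0.5
            # Bellman targets, one per ensemble member (broadcasts over K).
            targets = r + gamma * (1.0 - done) * next_qs

        loss = torch.zeros((), device=s.device)
        for k, q_net in enumerate(q_nets):
            td_error = q_net(s, a) - targets[k]
            # Weighted MSE: transitions with noisy targets contribute less.
            loss = loss + (w * td_error.pow(2)).mean()
        return loss

The key design point is that the weight is computed from the ensemble's disagreement at the *target* state-action pair, so updates driven by unreliable targets are attenuated rather than discarded.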
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We propose ensemble-based weighted Bellman backups to prevent error propagation in Q-learning, and introduce a simple unified ensemble method that addresses both noisy Q-targets and exploration in off-policy RL algorithms.
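
The UCB exploration component mentioned in the abstract can be sketched from the same Q-ensemble. The bonus coefficient lam, the function name ucb_action, and the discrete-action setting below are assumptions for illustration, not details taken from the submission.

    import torch

    def ucb_action(q_nets, state, lam=1.0):
        """Select the action maximizing mean + lam * std over the Q-ensemble."""
        with torch.no_grad():
            qs = torch.stack([q(state) for q in q_nets])  # (K, B, num_actions)
            mean, std = qs.mean(dim=0), qs.std(dim=0)
            # Optimism in the face of uncertainty: bonus for ensemble disagreement.
            return (mean + lam * std).argmax(dim=-1)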
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=gBQ-CQK7lu