Preventing Value Function Collapse in Ensemble Q-Learning by Maximizing Representation Diversity

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Ensemble Q-Learning, Representation Diversity, Reinforcement Learning
Abstract: The first deep RL algorithm, DQN, was limited by the overestimation bias of the learned Q-function. Subsequent algorithms proposed techniques to reduce this problem, without fully eliminating it. Recently, the Maxmin and Ensemble Q-learning algorithms used the different estimates provided by ensembles of learners to reduce the bias. Unfortunately, these learners can converge to the same point in the parametric or representation space, effectively collapsing back to the classic single-network DQN. In this paper, we describe a regularization technique to maximize diversity in the representation space of these algorithms. We propose and compare five regularization functions inspired by economic theory and consensus optimization. We show that the resulting approach significantly outperforms the Maxmin and Ensemble Q-learning algorithms as well as non-ensemble baselines.
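
The abstract describes adding a representation-diversity regularizer to an ensemble Q-learning objective. The sketch below is only an illustration of that idea, not the paper's method: it uses a generic pairwise-distance penalty on penultimate-layer features rather than the five regularizers from economic theory and consensus optimization proposed in the paper, and all names (`QNetwork`, `diversity_penalty`, `ensemble_td_loss`, the weight `beta`) are hypothetical.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact regularizers):
# a Maxmin-style ensemble TD loss plus a pairwise-distance diversity penalty on
# each member's penultimate-layer representation, to discourage the ensemble
# from collapsing onto a single point in representation space.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNetwork(nn.Module):
    """Small MLP Q-network that also exposes its penultimate-layer features."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs):
        phi = self.body(obs)          # representation phi(s)
        return self.head(phi), phi    # Q-values and features


def diversity_penalty(features):
    """Negative mean pairwise squared distance between ensemble members' features.

    Adding this term to the loss (with a positive weight) pushes the members'
    representations apart; this is one simple choice of diversity regularizer.
    """
    n = len(features)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total = total + ((features[i] - features[j]) ** 2).sum(dim=-1).mean()
            count += 1
    return -total / count


def ensemble_td_loss(ensemble, target_ensemble, batch, gamma=0.99, beta=0.1):
    """TD loss for a Maxmin-style ensemble plus a diversity regularizer (weight beta)."""
    obs, actions, rewards, next_obs, dones = batch

    # Maxmin target: minimum over target networks, then max over actions.
    with torch.no_grad():
        next_qs = torch.stack([net(next_obs)[0] for net in target_ensemble])  # (N, B, A)
        min_q = next_qs.min(dim=0).values.max(dim=-1).values                  # (B,)
        target = rewards + gamma * (1.0 - dones) * min_q

    td_loss = 0.0
    feats = []
    for net in ensemble:
        q, phi = net(obs)
        q_a = q.gather(1, actions.unsqueeze(1)).squeeze(1)
        td_loss = td_loss + F.mse_loss(q_a, target)
        feats.append(phi)

    return td_loss + beta * diversity_penalty(feats)
```

The regularizer weight `beta` trades off TD accuracy against representation diversity; the specific functional forms compared in the paper are described in the PDF linked below.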
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: A regularization technique to maximize representation diversity in ensemble-based Q-learning methods.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=tzRHgm8Sc6