Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance

Published: 28 Jan 2022, Last Modified: 04 May 2025 · ICLR 2022 Submitted · Readers: Everyone
Keywords: deep reinforcement learning, off-policy, model-free, sample efficiency, ensembles
Abstract: Recently, Truncated Quantile Critics (TQC), using a distributional representation of critics, was shown to provide state-of-the-art asymptotic training performance on all environments from the MuJoCo continuous control benchmark suite. Also recently, Randomized Ensemble Double Q-Learning (REDQ), using a high update-to-data ratio and target randomization, was shown to achieve sample efficiency competitive with state-of-the-art model-based methods. In this paper, we propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves upon the sample efficiency of REDQ and the asymptotic performance of TQC, thereby providing overall state-of-the-art performance during all stages of training. Moreover, AQE is very simple, requiring neither a distributional representation of critics nor target randomization.
One-sentence Summary: We propose a simple model-free algorithm with ensembles that achieves both high sample efficiency and state-of-the-art asymptotic performance.
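
The abstract names the ingredients (an ensemble of plain critics, a high update-to-data ratio, no distributional representation or target randomization) without spelling out the target computation. Below is a minimal sketch of one way such an ensemble target can be formed: keep the lowest few predictions across the ensemble and average them, which penalizes overestimation. All names and hyperparameters here (`QNet`, `num_critics`, `keep_k`, `utd_ratio`) are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a pessimistic ensemble Q-target: N standard (non-distributional)
# critics, with overestimation controlled by averaging only the keep_k lowest
# target-critic predictions per transition. Illustrative, not the paper's code.
import torch
import torch.nn as nn

class QNet(nn.Module):
    """A plain MLP critic: Q(s, a) -> scalar."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def ensemble_target(target_critics, next_obs, next_act, reward, done,
                    gamma=0.99, keep_k=5):
    """Average the keep_k smallest target-critic predictions per transition."""
    with torch.no_grad():
        q_all = torch.stack([q(next_obs, next_act)
                             for q in target_critics])          # (N, batch)
        q_kept, _ = torch.topk(q_all, keep_k, dim=0,
                               largest=False)                   # (keep_k, batch)
        q_next = q_kept.mean(dim=0)                             # (batch,)
        return reward + gamma * (1.0 - done) * q_next
```

In a training loop of this kind, every critic in the ensemble would be regressed toward this shared target several times per environment step (a high update-to-data ratio, as in REDQ); how AQE actually sets the ensemble size and the number of kept values is specified in the paper itself.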
Supplementary Material: zip
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/aggressive-q-learning-with-ensembles/code)