Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submission
Keywords: deep reinforcement learning, off-policy, model-free, sample efficiency, ensembles
Abstract: Recently, Truncated Quantile Critics (TQC), using a distributional representation of critics, was shown to provide state-of-the-art asymptotic training performance on all environments from the MuJoCo continuous control benchmark suite. Also recently, Randomized Ensemble Double Q-Learning (REDQ), using a high update-to-data ratio and target randomization, was shown to achieve sample efficiency competitive with state-of-the-art model-based methods. In this paper, we propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves upon the sample efficiency of REDQ and the asymptotic performance of TQC, thereby providing state-of-the-art performance at all stages of training. Moreover, AQE is very simple, requiring neither a distributional representation of critics nor target randomization.
One-sentence Summary: We propose a simple model-free algorithm with ensembles that achieves both high sample efficiency and state-of-the-art asymptotic performance.
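
To make the ensemble idea in the abstract concrete, below is a minimal sketch of one standard way an ensemble of critics can form a pessimistic bootstrap target: keep the K lowest of N target-critic estimates for each state and average them, discarding the largest values to curb overestimation bias. This is an illustrative assumption on our part (the function name `ensemble_target`, the exact keep rule, and the parameter values are ours), not the authors' released implementation.

```python
# Hedged sketch: pessimistic ensemble Q-target via "keep the K lowest of N".
# All names and values here are illustrative assumptions, not the paper's code.
import torch

def ensemble_target(q_values: torch.Tensor, keep_k: int) -> torch.Tensor:
    """q_values: (N, batch) tensor of Q(s', a') from N target critics.

    Returns a (batch,) target that averages the keep_k smallest estimates
    per state, dropping the largest values to reduce overestimation bias.
    """
    lowest_k, _ = torch.topk(q_values, keep_k, dim=0, largest=False)
    return lowest_k.mean(dim=0)

# Example: 10 target critics, keep the 2 most pessimistic estimates per state.
q = torch.randn(10, 256)          # N=10 critics, batch of 256 next-state values
target_q = ensemble_target(q, 2)  # shape: (256,)
```

The pessimism level is controlled by a single integer (how many estimates to keep), which is one way such a method can avoid both a distributional critic representation and target randomization.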