Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

Xinyue Chen; Che Wang; Zijian Zhou; Keith W. Ross

Published: 12 Jan 2021, Last Modified: 22 Jun 2025

ICLR 2021 Poster

Readers: Everyone

Keywords: Artificial Intelligence, Machine Learning, Deep Reinforcement Learning

Abstract: Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved much higher sample efficiency than previous model-free methods for continuous-action DRL benchmarks. In this paper, we introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ), and show that its performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark. Moreover, REDQ can achieve this performance using fewer parameters than the model-based method, and with less wall-clock run time. REDQ has three carefully integrated ingredients which allow it to achieve its high performance: (i) a UTD ratio $\gg 1$; (ii) an ensemble of Q functions; (iii) in-target minimization across a random subset of Q functions from the ensemble. Through carefully designed experiments, we provide a detailed analysis of REDQ and related model-free algorithms. To our knowledge, REDQ is the first successful model-free DRL algorithm for continuous-action spaces using a UTD ratio $\gg 1$.
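As a rough illustration of ingredient (iii), the sketch below computes a bootstrapped target from the minimum over a random subset of an ensemble's Q estimates. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `redq_target`, its parameters, and the numeric values are hypothetical, the Q estimates are plain floats rather than network outputs, and any entropy term used by the underlying actor-critic method is omitted.

```python
import random


def redq_target(reward, next_q_values, subset_size,
                discount=0.99, done=False, rng=random):
    """Return a target built from the min over a random subset of Q estimates.

    next_q_values: one estimate per ensemble member for the next
    state-action pair (hypothetical plain floats in this sketch).
    subset_size: how many ensemble members to sample for the minimum.
    """
    n = len(next_q_values)
    if not 1 <= subset_size <= n:
        raise ValueError("subset_size must be between 1 and the ensemble size")
    # Draw a fresh random subset of ensemble indices, without replacement.
    subset = rng.sample(range(n), subset_size)
    # In-target minimization: use only the sampled members' estimates.
    min_q = min(next_q_values[i] for i in subset)
    # No bootstrapping from terminal transitions.
    return reward + discount * (0.0 if done else min_q)


if __name__ == "__main__":
    rng = random.Random(0)
    ensemble_estimates = [1.2, 0.9, 1.5, 1.1, 1.0]  # illustrative values only
    # A UTD ratio above 1 means several updates per environment step;
    # here each update draws its own random subset.
    for update in range(3):
        y = redq_target(0.5, ensemble_estimates, subset_size=2, rng=rng)
        print(f"update {update}: target = {y:.3f}")
```

In a full agent, every Q function in the ensemble would be regressed toward this shared target, and the loop over updates would run many times per environment interaction to reach a UTD ratio $\gg 1$.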

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

One-sentence Summary: We propose and analyze a novel model-free algorithm that achieves strong performance with a high update-to-data ratio.

Supplementary Material: zip

Code: [watchernyu/REDQ](https://github.com/watchernyu/REDQ) + [5 community implementations](https://paperswithcode.com/paper/?openreview=AY8zfZm0tDd)

Data: [MuJoCo](https://paperswithcode.com/dataset/mujoco)

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/randomized-ensembled-double-q-learning/code)
