Keywords: reinforcement learning, rl, offline rl, continuous control, atari, sample efficiency
Abstract: Sample efficiency and performance in the offline setting have emerged as two of the main
challenges of deep reinforcement learning. We introduce Q-Value Weighted Regression (QWR),
a simple RL algorithm that excels in both respects.
QWR is an extension of Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm
that performs very well on continuous control tasks, including in the offline setting, but struggles
on tasks with discrete actions and suffers from limited sample efficiency. We perform a theoretical analysis
of AWR that explains its shortcomings and use the insights to motivate QWR theoretically.
We show experimentally that QWR matches state-of-the-art algorithms on tasks with both
continuous and discrete actions. We study the main hyperparameters of QWR
and find that it is stable across a wide range of hyperparameter choices and across different tasks.
In particular, QWR yields results on par with SAC on the MuJoCo suite and, with
the same set of hyperparameters, results on par with a highly tuned Rainbow
implementation on a set of Atari games. We also verify that QWR performs well in the
offline RL setting, making it a compelling choice for reinforcement learning in domains
with limited data.
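For context, the sketch below illustrates the weighted-regression actor update that AWR-style methods build on. It is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes, purely from the method names, that QWR replaces AWR's exponentiated-advantage weight with a weight derived from a learned Q-value estimate, and the network sizes, temperature `beta`, and weight clip are illustrative placeholders.

```python
# Minimal, self-contained sketch (PyTorch) of a weighted-regression actor update
# in the style of AWR. Assumption not stated in the abstract: QWR swaps the
# advantage-based weight exp(A(s, a) / beta) for a weight based on a learned
# Q-value estimate. All architectures and hyperparameters here are illustrative.
import torch
import torch.nn as nn

obs_dim, act_dim, beta, weight_clip = 8, 2, 1.0, 20.0

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
v_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
log_std = nn.Parameter(torch.zeros(act_dim))
optimizer = torch.optim.Adam(list(policy.parameters()) + [log_std], lr=3e-4)

def actor_loss(states, actions):
    """Regress the policy onto replayed actions, weighted by an exponentiated
    score (an advantage estimate here; a Q-value estimate in the QWR reading)."""
    with torch.no_grad():
        score = q_net(torch.cat([states, actions], dim=-1)) - v_net(states)
        weights = torch.clamp(torch.exp(score / beta), max=weight_clip)
    dist = torch.distributions.Normal(policy(states), log_std.exp())
    log_prob = dist.log_prob(actions).sum(-1, keepdim=True)
    return -(weights * log_prob).mean()

# One illustrative gradient step on a random "replay" batch.
states, actions = torch.randn(32, obs_dim), torch.randn(32, act_dim)
loss = actor_loss(states, actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Clamping the exponentiated weights is a common stabilization choice in weighted-regression methods; the actual update rule and critic training used by QWR are described in the reviewed version linked below.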
One-sentence Summary: We analyze the sample efficiency of actor-critic RL algorithms and introduce a new algorithm that achieves superior sample efficiency while maintaining competitive final performance on the MuJoCo task suite and on Atari games.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2102.06782/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=oJjJOoFGhu