Q-Value Weighted Regression: Reinforcement Learning with Limited Data

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: reinforcement learning, rl, offline rl, continuous control, atari, sample efficiency
Abstract: Sample efficiency and performance in the offline setting have emerged as two of the main challenges of deep reinforcement learning. We introduce Q-Value Weighted Regression (QWR), a simple RL algorithm that excels in these aspects. QWR is an extension of Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, including in the offline setting, but struggles on tasks with discrete actions and in sample efficiency. We perform a theoretical analysis of AWR that explains its shortcomings and use the insights to motivate QWR theoretically. We show experimentally that QWR matches state-of-the-art algorithms on tasks with both continuous and discrete actions. We study the main hyperparameters of QWR and find that it is stable across a wide range of their values and across different tasks. In particular, QWR yields results on par with SAC on the MuJoCo suite and, with the same set of hyperparameters, yields results on par with a highly tuned Rainbow implementation on a set of Atari games. We also verify that QWR performs well in the offline RL setting, making it a compelling choice for reinforcement learning in domains with limited data.
One-sentence Summary: We analyze the sample efficiency of actor-critic RL algorithms and introduce a new algorithm that achieves superior sample efficiency while maintaining competitive final performance on the MuJoCo task suite and on Atari games.
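
For readers unfamiliar with the weighted-regression family, below is a minimal sketch of the exponentiated-advantage policy update used by AWR, which the abstract describes QWR as extending toward Q-value-based weights. This is not the authors' implementation; the function name, temperature `beta`, and clipping constant are illustrative assumptions.

```python
# Sketch of an AWR-style weighted-regression policy loss (illustrative only).
# The policy is fit by supervised regression on replayed actions, with each
# action weighted by the exponentiated advantage estimated by the critic.
import torch

def weighted_regression_loss(log_probs, q_values, values, beta=1.0):
    """log_probs: log pi(a|s) for a batch of replayed (s, a) pairs.
    q_values, values: critic estimates Q(s, a) and V(s) for the same pairs.
    beta: temperature controlling how sharply high-advantage actions are favored."""
    advantages = q_values - values
    # Exponentiated advantages as regression weights; clipped for numerical stability.
    weights = torch.exp(advantages / beta).clamp(max=20.0)
    # Weights are treated as constants: gradients flow only through the policy's log-probs.
    return -(weights.detach() * log_probs).mean()

# Usage on a dummy batch (random tensors stand in for policy and critic outputs).
log_probs = torch.randn(32, requires_grad=True)
q_values = torch.randn(32)
values = torch.randn(32)
loss = weighted_regression_loss(log_probs, q_values, values, beta=0.5)
loss.backward()
```
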
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2102.06782/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=oJjJOoFGhu