Q-Value Weighted Regression: Reinforcement Learning with Limited Data

Piotr Kozakowski, Lukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, Katarzyna Kanska

Published: 2022, Last Modified: 05 May 2023, IJCNN 2022, Readers: Everyone

Abstract: Sample efficiency has emerged as a significant challenge in deep reinforcement learning. We introduce Q-Value Weighted Regression (QWR), a simple RL algorithm that excels in this aspect. QWR builds upon Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, but has low sample efficiency and struggles with high-dimensional observation spaces. We perform both theoretical and empirical analyses of AWR that explain its shortcomings, and we use these insights to motivate QWR. We show experimentally that QWR matches or outperforms state-of-the-art algorithms on tasks with both continuous and discrete actions. In particular, QWR yields results on par with SAC on the MuJoCo suite and, with the same set of hyperparameters, outperforms a highly tuned implementation of Rainbow on a set of Atari games. At the same time, QWR is a much simpler algorithm than both SAC and Rainbow.
