Keywords: bandits, reinforcement learning
TL;DR: While learning value functions through stochastic-approximation methods such as Q-learning, one can use model estimates learned from the environment to drive value-iteration-style updates that improve Q-learning.
Abstract: In reinforcement learning, model-free methods such as Q-learning and policy gradient are extremely popular due to their simplicity, but they require a huge amount of data for training. Model-based methods, on the other hand, have proven to be sample efficient in various environments but are unfortunately computationally expensive. It is therefore prudent to investigate and design algorithms that combine the best features of both classes. In this work, we propose MAQL, a model-assisted Q-learning algorithm that is not only computationally inexpensive but also offers low sample complexity. We illustrate its superior performance over vanilla Q-learning in various RL environments and, in particular, demonstrate its utility in learning the Gittins/Whittle index in rested/restless bandits, respectively. We aim to spur discussion on how model assists can help boost the performance of existing RL algorithms.
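The TL;DR describes driving value-iteration-style backups from model estimates learned online. The paper's exact MAQL update is not given here, so the following is only a minimal tabular sketch in the spirit of that idea (close to Dyna-Q): alongside the usual Q-learning update, the agent maintains empirical transition counts and mean rewards, and replays value-iteration-style backups from that learned model. The `env` interface (`reset`, `actions`, `step`) and all hyperparameters are illustrative assumptions, not the authors' API.

```python
import random
from collections import defaultdict

def model_assisted_q_learning(env, episodes=200, alpha=0.1, gamma=0.95,
                              eps=0.1, planning_steps=5, seed=0):
    """Sketch: Q-learning plus value-iteration-style backups from an
    empirical model (hypothetical interface; assumes Q=0 at terminal states)."""
    rng = random.Random(seed)
    Q = defaultdict(float)        # Q[(s, a)]
    counts = defaultdict(int)     # N[(s, a, s')] empirical transition counts
    rewards = defaultdict(float)  # running mean reward for (s, a)
    visits = defaultdict(int)     # N[(s, a)]
    seen = []                     # (s, a) pairs observed so far

    def greedy(s, acts):
        return max(acts, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            a = rng.choice(acts) if rng.random() < eps else greedy(s, acts)
            s2, r, done = env.step(s, a)

            # Standard Q-learning update from the real transition.
            target = r if done else r + gamma * max(Q[(s2, b)] for b in env.actions(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # Update the empirical model.
            if visits[(s, a)] == 0:
                seen.append((s, a))
            visits[(s, a)] += 1
            counts[(s, a, s2)] += 1
            rewards[(s, a)] += (r - rewards[(s, a)]) / visits[(s, a)]

            # Value-iteration-style backups using the learned model:
            # Q(s,a) <- r_hat(s,a) + gamma * E_hat[ max_b Q(s', b) ].
            for _ in range(planning_steps):
                ps, pa = rng.choice(seen)
                exp_next = sum(
                    (n / visits[(ps, pa)]) * max(Q[(ns, b)] for b in env.actions(ns))
                    for (qs, qa, ns), n in counts.items() if (qs, qa) == (ps, pa)
                )
                Q[(ps, pa)] = rewards[(ps, pa)] + gamma * exp_next
            s = s2
    return Q
```

The intended contrast with vanilla Q-learning is that each real transition is amortized over several cheap model-based backups, which is one plausible route to the claimed low sample complexity at modest computational cost.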
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6887