Optimizing Q-Learning Using Expectile Regression: A Dual Approach to Handle In-Sample and Out-of-Sample Data

Caroline Chen; Yuwei Fu

Optimizing Q-Learning Using Expectile Regression: A Dual Approach to Handle In-Sample and Out-of-Sample Data

Caroline Chen, Yuwei Fu

27 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: reinforcement learning

Abstract: Offline Reinforcement Learning (RL) presents unique challenges, primarily due to the constraint of learning from a static dataset without additional environmental interaction. Traditional methods often face limitations in effectively exploiting the available data, particularly when navigating the exploration-exploitation trade-off inherent in RL. This paper introduces a novel algorithm inspired by Implicit Q-Learning, designed to extend the utility of the Bellman update to actions not explicitly present in the dataset. Our approach, termed Extended Implicit Q-Learning (EIQL), strategically incorporates actions beyond the dataset constraints by allowing selection actions with maximum Q. By doing so, it leverages the maximization capability of the Bellman update, while simultaneously mitigating error extrapolation risks. We demonstrate the efficacy of EIQL through a series of experiments that show its improved performance over traditional offline RL algorithms, particularly in environments characterized by sparse rewards or those containing suboptimal and incomplete trajectories. Our results suggest that EIQL enhances the potential of offline RL by utilizing a broader action spectrum.

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 10189

Loading