Abstract: Traditional offline reinforcement learning (RL) methods predominantly operate in a batch-constrained setting. This confines the algorithms to the specific state-action distribution present in the dataset, reducing the effects of distributional shift but restricting the policy to seen actions. In this paper, we alleviate this limitation by introducing state-constrained offline RL, a novel framework that focuses solely on the dataset's state distribution. This approach allows the policy to take high-quality out-of-distribution actions that lead to in-distribution states, significantly enhancing learning potential. The proposed setting not only broadens the learning horizon but also improves the ability to combine different trajectories from the dataset effectively, a desirable property for offline RL. Our research is underpinned by theoretical findings that pave the way for subsequent advancements in this area. Additionally, we introduce StaCQ, a deep learning algorithm that achieves state-of-the-art performance on the D4RL benchmark datasets and aligns with our theoretical propositions. StaCQ establishes a strong baseline for forthcoming explorations in this domain.
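To make the distinction concrete, the sketch below contrasts a batch-constrained regularizer (penalizing actions unlikely under the dataset) with a state-constrained one (penalizing only predicted next states that leave the dataset's state distribution). This is a minimal illustration under assumed components (a learned forward dynamics model and dataset density estimators); all names and signatures are placeholders, not the StaCQ implementation described in the paper.

```python
from typing import Callable, Sequence

# Illustrative sketch only: `policy`, `dynamics_model`, and the density
# estimators are hypothetical components, not the paper's actual method.

def batch_constrained_penalty(
    states: Sequence,
    policy: Callable,               # pi(s) -> a
    action_log_density: Callable,   # log p_D(a | s) estimated from the dataset
) -> float:
    """Batch-constrained regularizer: pushes the policy toward actions
    that the dataset itself contains for each state."""
    penalties = [-action_log_density(s, policy(s)) for s in states]
    return sum(penalties) / len(penalties)

def state_constrained_penalty(
    states: Sequence,
    policy: Callable,               # pi(s) -> a
    dynamics_model: Callable,       # f(s, a) -> predicted next state
    state_log_density: Callable,    # log p_D(s') estimated from the dataset
) -> float:
    """State-constrained regularizer: any action is admissible, even an
    out-of-distribution one, provided the predicted next state remains
    within the dataset's state distribution."""
    penalties = [
        -state_log_density(dynamics_model(s, policy(s))) for s in states
    ]
    return sum(penalties) / len(penalties)
```

Under this assumed formulation, the state-constrained penalty leaves the action space unconstrained, which is what permits stitching together out-of-distribution actions that connect in-distribution states from different trajectories.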
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Steven_Stenberg_Hansen1
Submission Number: 3423