Keywords: Reinforcement Learning, Offline Reinforcement Learning, Safe Reinforcement Learning, OOD
Abstract: In offline reinforcement learning (RL), addressing out-of-distribution (OOD) actions is essential for safe policy learning, as such actions often lead to overestimated values and risky behaviors. Existing methods primarily tackle this issue through regularization or counterfactual reasoning, but often lack a principled way to guarantee safe exploration within dataset constraints. This paper presents a novel approach that incorporates safe RL theory into offline RL by introducing the Dataset Feasibility Function (DFF), enabling policy learning that respects dataset boundaries while managing OOD risks. Our proposed Dataset-Constrained Reinforcement Learning (DCRL) framework employs two mechanisms: Dataset Feasibility Guidance (DFG), which serves as a regularization term that keeps the policy aligned with the dataset distribution, and Dataset Feasibility Indication (DFI), which acts as an OOD detection tool. DFI enables safe out-of-distribution exploration by restricting model rollouts to feasible zones identified under a larger tolerance threshold. This approach uniquely blends safety constraints with both regularization and counterfactual reasoning to improve performance and robustness in offline RL. Empirical evaluations on benchmark datasets show that DCRL outperforms existing methods, achieving superior safety and efficacy in constrained offline tasks.
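The abstract does not give the concrete formulas, so the following is only a minimal Python sketch of how the two stated mechanisms could plausibly be wired together: a feasibility score used as a penalty (a reading of DFG) and the same score thresholded with a looser tolerance to gate model rollouts (a reading of DFI). The feasibility scores, the coefficient lambda_dfg, and the threshold tau_dfi are all hypothetical placeholders, not the paper's actual definitions.

```python
import numpy as np

# Hypothetical feasibility scores for a batch of (state, action) pairs.
# In the paper these would come from a learned Dataset Feasibility Function (DFF);
# here we draw random values so the sketch stays self-contained.
rng = np.random.default_rng(0)
dff_scores = rng.uniform(0.0, 1.0, size=8)   # higher = closer to the dataset distribution
q_values = rng.normal(0.0, 1.0, size=8)      # critic estimates for the same pairs

# --- Dataset Feasibility Guidance (DFG), sketched as a penalty term ---
# One reading of "regularization that keeps the policy aligned with the dataset
# distribution": subtract a penalty proportional to infeasibility.
lambda_dfg = 1.0                             # hypothetical trade-off coefficient
dfg_objective = q_values - lambda_dfg * (1.0 - dff_scores)

# --- Dataset Feasibility Indication (DFI), sketched as thresholded OOD detection ---
# A larger (looser) tolerance threshold admits a wider "feasible zone" in which
# model rollouts are allowed; pairs below the threshold are flagged as OOD.
tau_dfi = 0.3                                # hypothetical tolerance threshold
in_feasible_zone = dff_scores >= tau_dfi

print("DFG-adjusted objective:", np.round(dfg_objective, 3))
print("Feasible for rollouts: ", in_feasible_zone)
```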
Submission Number: 29