Keywords: reinforcement learning, decision transformer, causality
Abstract: Decision Transformer (DT) plays a crucial role in modern reinforcement learning, leveraging offline datasets to achieve impressive results across various domains. However, DT requires high-quality, comprehensive data to perform optimally. In real-world applications, such ideal data is often lacking, with the underrepresentation of optimal behaviours posing a significant challenge. This limitation highlights the difficulty of relying on offline datasets for training, as suboptimal data can hinder performance. To address this, we propose the Counterfactual Reasoning Decision Transformer (CRDT), a novel framework inspired by counterfactual reasoning. CRDT enhances DT’s ability to reason beyond known data by generating and utilizing counterfactual experiences, enabling improved decision-making in out-of-distribution scenarios. Extensive experiments across continuous and discrete action spaces, including environments with limited data, demonstrate that CRDT consistently outperforms conventional DT approaches. Additionally, reasoning counterfactually allows the DT agent to obtain stitching ability, allowing it to combine suboptimal trajectories. These results highlight the potential of counterfactual reasoning to enhance RL agents' performance and generalization capabilities.
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6788
Loading