Learning Generalizable Environment Models via Discovering Superposed Causal Relationships

21 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0
Keywords: Offline Reinforcement Learning, Dynamics Model Learning
Abstract: In reinforcement learning, a generalizable world model to mimic the environment is crucial for the assessment of various policy values in downstream tasks such as offline policy optimization and off-policy evaluation. Recently, studies have shown that learning a world model with sparse connections identified by causal discovery techniques can improve generalizability. So far, these studies focus on discovering a single and global causal structure. In this paper, we discuss a more practical setting in which the agent is deployed in an environment mixed with different causal mechanisms, called superposed causal relationships in this article. In this case, global causal discovery techniques will derive a degraded dense causal relationship, which will fail to improve the generalizability of the learned model. To solve the problem, we propose \textbf{S}uperposed c\textbf{A}usal \textbf{M}odel (SAC) learning. SAM learning is an end-to-end framework that learns a transformer-based model which can recognize the causal relationships that the agent is encountering on the fly and then adapts its predictions. The experiments are conducted in two simulated environments, where SAM shows powerful identify abilities in environments with superposed causal relationships. Both the dynamics model and the policies learned by the SAM~generalize well to unseen states.
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2354
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview