Learning Scalable Causal Discovery Policies with Adversarial Reinforcement Learning

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: causal reasoning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Reinforcement learning, causal discovery, adversarial training
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Learning the structure of causal graphs from observational data is a fundamental but challenging problem. Existing works focus on designing search-based methods for finding optimal causal graphs. However, search-based methods are inherently inefficient, since they must evaluate costly decision criteria at every search step, and consequently they scale poorly to larger tasks. This paper proposes a novel framework called AGCORL that learns reusable causal discovery policies, which generalize zero-shot to related tasks of much larger size. Specifically, AGCORL employs an Ordering Learning (OL) agent that infers an ordering of the variables directly from the observational data. To further improve the generalizability of the OL agent, an ADversarial (AD) agent actively mines tasks on which the OL agent fails to find high-quality solutions. We theoretically prove that the AD agent significantly reduces the number of training tasks required for the OL agent to generalize. Extensive empirical evaluations demonstrate that our method outperforms state-of-the-art baselines in both runtime and solution quality.
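The abstract leaves the training scheme implicit, so the following is a minimal, hypothetical Python sketch of the adversarial ordering-learning loop it describes: an OL agent proposes a variable ordering, a BIC-like criterion scores it, and an AD agent supplies the training tasks. All names, the linear-Gaussian task generator, and the scoring choice are illustrative assumptions rather than the paper's actual implementation; the OL policy and AD generator are stubbed with random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def bic_order_score(order, data):
    """BIC-like proxy score for a variable ordering: regress each variable
    on its predecessors in the ordering, penalizing residual variance and
    model size. Higher is better."""
    n, _ = data.shape
    total = 0.0
    for i, v in enumerate(order):
        parents = list(order[:i])
        if parents:
            X = data[:, parents]
            beta, *_ = np.linalg.lstsq(X, data[:, v], rcond=None)
            resid = data[:, v] - X @ beta
        else:
            resid = data[:, v]
        total += n * np.log(resid.var() + 1e-8) + len(parents) * np.log(n)
    return -total

def sample_linear_sem(d, n):
    """Sample n observations from a random linear-Gaussian SEM over d
    variables (a stand-in for the AD agent's task generator)."""
    W = np.tril(rng.normal(size=(d, d)), k=-1)  # DAG weights in topological order
    data = np.zeros((n, d))
    noise = rng.normal(size=(n, d))
    for i in range(d):
        data[:, i] = data @ W[i] + noise[:, i]
    perm = rng.permutation(d)                   # shuffle labels to hide the order
    return data[:, perm]

# Skeleton of the adversarial loop: the AD agent proposes a task, the OL
# agent proposes an ordering, and both would be updated from the reward.
for step in range(3):
    data = sample_linear_sem(d=5, n=200)            # AD agent's proposed task
    order = list(rng.permutation(data.shape[1]))    # OL agent's proposed ordering
    reward = bic_order_score(order, data)
    # ol_policy.update(data, order, reward)  (policy-gradient step, omitted)
    # ad_generator.update(data, -reward)     (adversary step, omitted)
    print(f"step {step}: ordering reward = {reward:.1f}")
```

In the paper's setting, the commented-out updates would be replaced by the OL agent's reinforcement learning step and the AD agent's objective of generating tasks that maximize the OL agent's failure, per the abstract's description.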
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4466