Causal Discovery over High-Dimensional Structured Hypothesis Spaces with Causal Graph Partitioning

Ashka Shah; Adela Frances DePavia; Nathaniel C Hudson; Ian Foster; Rick Stevens

Published: 12 Mar 2025, Last Modified: 12 Mar 2025. Accepted by TMLR. License: CC BY 4.0

Abstract: The aim in many sciences is to understand the mechanisms that underlie the observed distribution of variables, starting from a set of initial hypotheses. Causal discovery allows us to infer mechanisms as sets of cause-and-effect relationships in a generalized way, without necessarily tailoring to a specific domain. Causal discovery algorithms search over a structured hypothesis space, defined by the set of Directed Acyclic Graphs (DAGs), to find the graph that best explains the data. For high-dimensional problems, however, this search becomes intractable, and scalable algorithms for causal discovery are needed to bridge the gap. In this paper, we define a novel causal graph partition that allows for divide-and-conquer causal discovery with theoretical guarantees under the Maximal Ancestral Graph (MAG) class. We leverage the idea of a superstructure, a set of learned or existing candidate hypotheses, to partition the search space. We prove under certain assumptions that learning with a causal graph partition always yields the Markov Equivalence Class of the true causal graph. We show that our algorithm achieves comparable accuracy and a faster time to solution for biologically-tuned synthetic networks and networks with up to $10^4$ variables. This makes our method applicable to gene regulatory network inference and other domains with high-dimensional structured hypothesis spaces.
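The idea of using a superstructure to split the search space can be illustrated with a small sketch. This is not the paper's causal graph partition definition or its algorithm; it only shows, under simplifying assumptions, why an overlapping partition may be preferable to a disjoint one: growing each block by its superstructure neighbors ensures that every candidate edge lies entirely inside at least one block, so no candidate relationship is lost when blocks are learned separately. The function names `expand_partition` and `covers_all_edges`, the toy chain graph, and the initial two-block split are all hypothetical and chosen only for illustration.

```python
def expand_partition(superstructure, blocks):
    """Grow each block by adding the superstructure neighbors of its nodes.

    superstructure: dict mapping each variable to the set of its neighbors
                    (an undirected graph of candidate relationships).
    blocks:         list of disjoint sets of variables.
    Returns a list of (possibly overlapping) sets.
    """
    expanded = []
    for block in blocks:
        grown = set(block)
        for node in block:
            grown |= superstructure[node]
        expanded.append(grown)
    return expanded


def covers_all_edges(superstructure, blocks):
    """Check that every superstructure edge lies inside at least one block."""
    edges = {
        frozenset((u, v))
        for u, neighbors in superstructure.items()
        for v in neighbors
    }
    return all(any(edge <= block for block in blocks) for edge in edges)


# Hypothetical superstructure over six variables: the chain a-b-c-d-e-f.
superstructure = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}

# Assumed disjoint split; the edge c-d crosses the cut and is in neither block.
disjoint = [{"a", "b", "c"}, {"d", "e", "f"}]
overlapping = expand_partition(superstructure, disjoint)

print(covers_all_edges(superstructure, disjoint))
print(covers_all_edges(superstructure, overlapping))
```

In a divide-and-conquer setting, a causal discovery routine would then be run on each block independently and the local results merged; both of those steps are outside the scope of this sketch.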

Submission Type: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission:

We made the following formatting fixes and definition clarifications raised by reviewers:
- Moved the definition table from the appendix to the main text (Table 1, page 3), including a definition of adjacency in a graph.
- Fixed footnote formatting so that footnotes go after punctuation.
- Clarified the definitions of $m_1$ and $m_2$ on page 7.
- Increased the size of Figures 3-5.
- Fixed the phrasing of the sentences mentioned by a reviewer under Figure 3, in the Figure 6 caption, and in Appendix G.

We included the following text to acknowledge the reviewers' concerns at a high level:
- A short paragraph at the end of Section 4 on page 6 clarifying that the guarantees are for learning the MEC rather than the DAG, and alluding to future extensions including interventional datasets.
- A short paragraph on the limitations of the current method (perfect superstructure, sample efficiency) in the conclusion/future work on pages 13-14.

Video: https://drive.google.com/file/d/1msiwnw0c75HButj6m66lNonSC9XxKd8O/view?usp=drive_link

Code: https://github.com/shahashka/causal_discovery_via_partitioning

Assigned Action Editor: ~Emmanuel_Bengio1

Submission Number: 3560
