Improving constraint-based discovery with robust propagation and reliable LLM priors

ICLR 2026 Conference Submission 19890 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: causal discovery, PC, domain knowledge, LLM, hallucination
Abstract: Learning causal structure from observational data is central to scientific modeling and decision-making. Constraint-based methods aim to recover the conditional independence (CI) relations of a causal directed acyclic graph (DAG). Classical approaches such as PC and its successors orient v-structures first and then propagate edge directions from these seeds, assuming perfect CI tests and an exhaustive search over separating subsets; these assumptions are often violated in practice, leading to cascading errors in the final graph. Recent work has explored using large language models (LLMs) as experts, prompting them with sets of nodes to obtain edge directions, which can augment edge orientation when these assumptions fail. However, such methods implicitly assume a perfect expert, which is unrealistic for hallucination-prone LLMs. We propose MosaCD, a causal discovery method that propagates edges from a high-confidence set of seeds derived from both CI tests and LLM annotations. To filter hallucinations, we introduce shuffled queries that exploit LLMs' positional bias, retaining only high-confidence seeds. We then apply a novel confidence-down propagation strategy that orients the most reliable edges first and can be integrated with any skeleton-based discovery method. Across multiple real-world graphs, MosaCD achieves higher accuracy in final graph construction than existing constraint-based methods, largely due to the improved reliability of the initial seeds and the robust propagation strategy.
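The abstract describes filtering LLM hallucinations with shuffled queries that exploit positional bias. As an illustration only, not the authors' implementation, the sketch below keeps an edge as a high-confidence seed only if the LLM gives the same directed answer when the two variables are presented in randomized orders; the function `query_llm_direction`, its return values, and the `n_repeats` parameter are assumptions for this sketch, not details from the paper.

```python
import random

def query_llm_direction(first: str, second: str) -> str:
    """Hypothetical LLM call: returns 'first->second', 'second->first', or 'none'."""
    raise NotImplementedError("Replace with a real LLM prompt and answer parser.")

def shuffled_query_filter(pairs, n_repeats: int = 3):
    """Keep only orientations that are consistent across order-shuffled queries."""
    seeds = []
    for a, b in pairs:
        answers = set()
        for _ in range(n_repeats):
            # Randomize the order in which the two variables are shown to the LLM.
            if random.random() < 0.5:
                first, second, flipped = a, b, False
            else:
                first, second, flipped = b, a, True
            ans = query_llm_direction(first, second)
            # Map the answer back to the canonical (a, b) ordering.
            if ans == "first->second":
                answers.add("b->a" if flipped else "a->b")
            elif ans == "second->first":
                answers.add("a->b" if flipped else "b->a")
            else:
                answers.add("none")
        # A single consistent directed answer across all shuffles counts as a
        # high-confidence seed; order-dependent (flip-flopping) answers are dropped.
        if answers == {"a->b"}:
            seeds.append((a, b))
        elif answers == {"b->a"}:
            seeds.append((b, a))
    return seeds
```

Under this sketch, seeds that survive the shuffled queries could then be handed to an orientation-propagation step that processes the most reliable edges first, as the abstract's confidence-down strategy describes.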
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 19890