Keywords: causal discovery, PC, domain knowledge, LLM, hallucination
Abstract: Learning causal structure from observational data is central to scientific modeling
and decision-making. Constraint-based methods aim to recover conditional
independence (CI) relations in a causal directed acyclic graph (DAG). Classical
approaches such as PC and subsequent methods orient v-structures first and then
propagate edge directions from these seeds, assuming perfect CI tests and exhaustive search of separating subsets—assumptions often violated in practice, leading
to cascading errors in the final graph. Recent work has explored using large language
models (LLMs) as experts, prompting them with sets of nodes to elicit edge directions,
which could augment edge orientation when these assumptions are not met. However,
such methods implicitly assume perfect experts or predictable error rates, which
is unrealistic for hallucination-prone and unstable LLMs. We propose MosaCD,
a causal discovery method that propagates edges from a high-confidence set of
seeds derived from both CI tests and LLM annotations. To filter hallucinations, we
introduce shuffled queries that exploit LLMs’ positional bias, retaining only high-confidence
seeds. We then apply a novel confidence-down propagation strategy
that orients the most reliable edges first, and can be integrated with any skeleton-based
discovery method. Across multiple real-world graphs, MosaCD achieves
higher accuracy in final graph construction than existing constraint-based methods, largely due to the improved reliability of initial seeds and robust propagation
strategies.
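
The shuffled-query idea in the abstract can be illustrated with a minimal Python sketch. It is not the MosaCD implementation: the function name `filter_orientation`, the `ask` callback standing in for an LLM call, and the unanimity threshold are all assumptions made for illustration. The sketch presents the two candidate directions of an edge in every possible order and keeps an orientation only if the answer does not change with the order, so an expert that simply picks whichever option comes first is filtered out.

```python
from collections import Counter
from itertools import cycle, islice, permutations
from typing import Callable, Optional, Sequence

Edge = tuple[str, str]  # (cause, effect)


def filter_orientation(
    x: str,
    y: str,
    ask: Callable[[Sequence[Edge]], Edge],  # hypothetical stand-in for an LLM query
    n_queries: int = 6,
    min_agreement: float = 1.0,  # assumed threshold: require unanimous answers
) -> Optional[Edge]:
    """Return an orientation only if it survives reordering of the options."""
    candidates = [(x, y), (y, x)]
    # Cycle deterministically through every presentation order of the options,
    # so each order is shown equally often when n_queries is even.
    orderings = cycle(permutations(candidates))
    votes: Counter = Counter()
    for ordering in islice(orderings, n_queries):
        votes[ask(list(ordering))] += 1
    winner, count = votes.most_common(1)[0]
    if winner in candidates and count / n_queries >= min_agreement:
        return winner
    return None  # answers depended on option order: discard as unreliable


def stable_expert(options: Sequence[Edge]) -> Edge:
    # Mock expert whose answer ignores the order of the options.
    return ("smoking", "cancer")


def position_biased_expert(options: Sequence[Edge]) -> Edge:
    # Mock expert that always picks whichever option is listed first.
    return options[0]


kept = filter_orientation("smoking", "cancer", stable_expert)
dropped = filter_orientation("smoking", "cancer", position_biased_expert)
```

Here `kept` holds the orientation from the order-insensitive expert, while `dropped` is `None` because the position-biased expert's votes split evenly across the two orderings. A real pipeline would replace the mock experts with actual LLM calls and would likely use a softer agreement threshold.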
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 19890