Keywords: causal discovery, PC, domain knowledge, LLM, hallucination
Abstract: Learning causal structure from observational data is central to scientific
modeling and decision-making. Constraint-based methods aim to recover the conditional
independence (CI) relations encoded in a causal directed acyclic graph (DAG). Classical
approaches such as PC and subsequent methods orient v-structures first and then
propagate edge directions from these seeds, assuming perfect CI tests and an exhaustive
search of separating subsets. These assumptions are often violated in practice, leading
to cascading errors in the final graph. Recent work has explored using large language
models (LLMs) as experts, prompting them with sets of nodes to obtain edge directions,
which could augment edge orientation when these assumptions are not met. However, such
methods implicitly assume perfect experts, which is unrealistic for hallucination-prone
LLMs. We propose MosaCD, a causal discovery method that propagates edges from a
high-confidence set of seeds derived from both CI tests and LLM annotations. To filter
hallucinations, we introduce shuffled queries that exploit LLMs’ positional bias,
retaining only high-confidence seeds. We then apply a novel confidence-down propagation
strategy that orients the most reliable edges first and can be integrated with any
skeleton-based discovery method. Across multiple real-world graphs, MosaCD achieves
higher accuracy in final graph construction than existing constraint-based methods,
largely due to the improved reliability of initial seeds and robust propagation
strategies.
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 19890
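
The shuffled-query idea in the abstract can be illustrated with a minimal sketch. This is not the MosaCD implementation: the oracle interface (`Oracle`), the function `consistent_orientation`, and the two toy oracles are all hypothetical names introduced here for illustration. The sketch also swaps random shuffling for an enumeration of both answer orderings, so the check is deterministic. It assumes that an answer which changes when the option order changes reflects positional bias rather than knowledge, and it discards such answers.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Optional, Sequence, Tuple

Edge = Tuple[str, str]  # (cause, effect)

# Hypothetical oracle interface (an assumption, not from the paper):
# given the answer options in presentation order, return the chosen index.
Oracle = Callable[[Sequence[Edge]], int]


def consistent_orientation(x: str, y: str, ask: Oracle) -> Optional[Edge]:
    """Keep an orientation only if the oracle picks it under every option order."""
    options = [(x, y), (y, x)]
    votes: Counter = Counter()
    # Present both orderings of the two candidate directions.
    for ordering in permutations(options):
        votes[ordering[ask(ordering)]] += 1
    # Unanimous answer: treat as a high-confidence seed. Otherwise reject.
    if len(votes) == 1:
        return next(iter(votes))
    return None


def position_biased(options: Sequence[Edge]) -> int:
    """Toy oracle that always picks the first listed option."""
    return 0


def informed(options: Sequence[Edge]) -> int:
    """Toy oracle that picks smoking -> cancer wherever it is listed."""
    return list(options).index(("smoking", "cancer"))


if __name__ == "__main__":
    print(consistent_orientation("smoking", "cancer", position_biased))
    print(consistent_orientation("smoking", "cancer", informed))
```

In this toy setup the position-biased oracle contradicts itself across orderings and yields no seed, while the informed oracle yields the same edge both times. A real pipeline would replace the toy oracles with LLM calls and would likely use more than two queries per pair.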