Abstract: Scientific discovery catalyzes human intellectual advances, driven by the cycle of hypothesis generation, experimental design, evaluation, and assumption refinement. Central to this process is causal inference: uncovering the mechanisms behind observed phenomena. While randomized experiments provide strong causal inferences, they are often infeasible due to ethical or practical constraints. Observational studies, the common alternative, are instead prone to confounding and mediating biases. Identifying such backdoor paths, though crucial, is expensive and relies heavily on scientists' domain knowledge to generate hypotheses. We introduce a novel benchmark of over 4,000 queries spanning varying difficulty levels, in which the objective is to complete a partial causal graph. We show that LLMs exhibit a strong ability to hypothesize the backdoor variables between a cause and its effect. Unlike simple memorization of fixed associations, our task requires the LLM to reason over the context of the entire graph.
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Causality, Scientific Discovery, LLMs, Reasoning
Contribution Types: Model analysis & interpretability
Languages Studied: English
Keywords: Causality, Scientific Discovery, LLMs, Reasoning
Submission Number: 5314