Beyond Statistical Patterns: Integrating Textual Domain Knowledge with Causal Discovery for Calibrated Uncertainty Estimation
Keywords: Causal Discovery, Uncertainty Quantification, Calibration, Large Language Models, Reliability-Weighted Ensemble, Textual Domain Knowledge, Statistical Methods, Tübingen Benchmark, Evidence Integration, Temperature Scaling
TL;DR: We integrate textual domain knowledge from LLMs with statistical causal discovery methods to produce well-calibrated causal predictions, achieving higher accuracy and 59% reduction in calibration error on Tübingen benchmark pairs.
Abstract:
Causal discovery from observational data often prioritizes prediction accuracy while neglecting reliable uncertainty estimates, limiting practical decision-making. Large Language Models (LLMs) show strong causal reasoning capabilities from textual descriptions but rely primarily on pattern recognition rather than principled inference. We propose a reliability-weighted ensemble framework that systematically integrates textual domain knowledge with multiple statistical causal discovery methods to provide well-calibrated confidence estimates for causal relationships. Our method combines LLM-derived evidence with six statistical approaches through reliability weighting, log-odds aggregation, and temperature-scaled calibration. Experiments on 72 Tübingen benchmark pairs demonstrate substantial improvements: accuracy increases from 93.1% to 94.4%, calibration error (DECE) reduces by 59% (0.100→0.041), and high-confidence prediction coverage expands to 66% of pairs. This framework enables principled, uncertainty-aware causal inference, supporting reliable decision-making in scientific and high-stakes applications.
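The abstract's aggregation pipeline (per-method evidence combined via reliability weighting and log-odds aggregation, then temperature-scaled for calibration) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the probabilities, reliability weights, and temperature below are hypothetical placeholders; the paper presumably fits the weights and temperature on held-out data.

```python
import math

def logit(p: float) -> float:
    # Convert a probability to log-odds.
    return math.log(p / (1.0 - p))

def sigmoid(z: float) -> float:
    # Map log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_confidence(probs, weights, temperature=1.0):
    """Reliability-weighted log-odds aggregation with temperature scaling.

    probs:       per-method probabilities that X causes Y (e.g. LLM-derived
                 evidence plus statistical causal discovery methods).
    weights:     per-method reliability weights (hypothetical values here).
    temperature: calibration parameter (T > 1 softens overconfident scores).
    """
    z = sum(w * logit(p) for p, w in zip(probs, weights))
    return sigmoid(z / temperature)

# Hypothetical evidence: one LLM score plus six statistical methods.
probs = [0.90, 0.80, 0.60, 0.70, 0.55, 0.65, 0.75]
weights = [0.30, 0.15, 0.10, 0.15, 0.05, 0.10, 0.15]
conf = ensemble_confidence(probs, weights, temperature=1.2)
```

Aggregating in log-odds space keeps the combination symmetric around 0.5 and lets highly reliable methods dominate, while a single temperature parameter rescales the pooled evidence toward calibrated confidence.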
Supplementary Material: pdf
Submission Number: 164