Can LLM Event Prediction Be Reliable? Closing Gaps in Causal Quantification and Probabilistic Consistency
Keywords: Event Prediction, Large Language Models (LLMs), Causal Quantification, Probabilistic Consistency, Bayesian Inference
TL;DR: This paper addresses causal-quantification and probabilistic-consistency flaws in LLM-based event prediction via an LLM-PPL cognitive division of labor, improving accuracy, uncertainty quantification, and interpretability.
Abstract: Event prediction is one of the core challenges in artificial intelligence. Current Large Language Model (LLM) prediction methods face two key issues: (1) lack of causal quantification: while LLMs can identify key event factors and their causal relationships from text, they struggle to quantify factor states, weights, and interactions, limiting predictions to qualitative judgments; (2) probabilistic consistency failure: LLM-generated "probabilities" are products of language pattern matching rather than statistical reasoning; they often violate probability axioms, are sensitive to input phrasing, and lack mathematical reliability. To address these bottlenecks, we propose the Probabilistic-Aware Causal Reasoning Engine (PACRE), which leverages a "cognitive division of labor": LLMs extract causal knowledge from text and build structured representations, while probabilistic programming languages (PPLs) perform rigorous Bayesian inference over them. PACRE uses hierarchical Bayesian fusion to handle observational uncertainty and Bayesian model averaging (BMA) to mitigate spurious causal relationships hallucinated by the LLM. Experiments on multiple datasets show that PACRE achieves statistically significant improvements over existing LLM-based methods in predictive accuracy, uncertainty quantification, and interpretability. In particular, its complete posterior distributions and credible intervals address the unreliability of LLM-generated probabilities, delivering transparent, auditable support for decision-making.
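As a rough illustration of the LLM-PPL division of labor described in the abstract (a minimal sketch, not the authors' implementation), the snippet below assumes the LLM step has already produced a structured causal representation: a list of factors and their observed states for past cases. A PPL (here PyMC, chosen only as one possible backend) then performs Bayesian inference over factor weights and reports a posterior event probability with a credible interval. All factor names, data, and priors are hypothetical.

```python
# Minimal sketch of the LLM -> PPL division of labor (illustrative only; all
# factor names, data, and priors below are hypothetical, not from the paper).
import numpy as np
import pymc as pm

# Step 1 (LLM, mocked here): structured causal representation extracted from text.
factors = ["diplomatic_talks", "troop_movements", "economic_sanctions"]
X = np.array([[1, 0, 1],      # factor states for past cases (rows) x factors (cols)
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])
y = np.array([0, 1, 1, 0])    # whether the target event occurred in each case

# Step 2 (PPL): Bayesian inference over factor weights instead of asking the
# LLM to emit a probability directly.
with pm.Model() as causal_model:
    weights = pm.Normal("weights", mu=0.0, sigma=1.0, shape=len(factors))
    bias = pm.Normal("bias", mu=0.0, sigma=1.0)
    logits = bias + pm.math.dot(X, weights)
    pm.Bernoulli("event", logit_p=logits, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0, progressbar=False)

# Step 3: posterior predictive probability for a new factor configuration,
# reported with a credible interval rather than a single point estimate.
x_new = np.array([1, 1, 1])
w = idata.posterior["weights"].values.reshape(-1, len(factors))
b = idata.posterior["bias"].values.reshape(-1)
p_event = 1.0 / (1.0 + np.exp(-(b + w @ x_new)))
print(f"P(event) = {p_event.mean():.2f}, 94% interval = "
      f"[{np.percentile(p_event, 3):.2f}, {np.percentile(p_event, 97):.2f}]")
```

The Bayesian model averaging mentioned in the abstract would sit on top of such a model: several LLM-proposed causal structures would each yield a posterior, and their predictions would be weighted by how well each structure explains the data, down-weighting spurious structures.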
Primary Area: causal reasoning
Submission Number: 22653