Keywords: LLM forecasting, probabilistic reasoning, calibration, prediction markets, failure modes, model evaluation
Abstract: Large Language Models (LLMs) demonstrate partial forecasting competence across social, political, and economic events, yet their predictive ability varies sharply with domain structure and prompt framing. We investigate how forecasting performance differs across model families on real-world questions about events occurring after each model's training cutoff. We analyze how context, question type, and external knowledge affect accuracy and calibration, and how adding factual news context alters belief formation and failure modes. Our results show that forecasting ability is highly variable: it depends on both what we ask and how we ask it.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation; benchmarking; NLP datasets; evaluation methodologies; evaluation; metrics; reproducibility; automatic evaluation of datasets
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 8235