Scaling Open-Ended Reasoning to Predict the Future

ICLR 2026 Conference Submission 19734 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: forecasting, rl, dataset
TL;DR: We create a dataset of open-ended forecasting questions from news articles and show its benefits for future prediction when training language models with RL.
Abstract: While language models now show remarkable capabilities on fully specified exam-style problems, most real-world decisions involve reasoning under uncertainty. In this work, we train language models to make predictions on open-ended questions about the future. To scale up training data, we continually synthesize novel forecasting questions from global events reported in daily news, using a fully automated yet careful curation recipe. We train the Qwen3 thinking models on our dataset, OpenForesight. To prevent leakage of future information during training and evaluation, we use an offline news corpus, both for data generation and for retrieval in our forecasting system. Guided by a small validation set, we show the benefits of retrieval, a supervised finetuning phase, and an improved reward function for reinforcement learning (RL). Once we obtain our final forecasting system, we perform held-out testing from May to August 2025. Our specialized model, OpenForecaster-8B, matches much larger proprietary models, with our training improving the accuracy, calibration, and consistency of predictions. We find that the calibration improvements from forecasting training generalize across popular benchmarks. We will open-source our models, code, and data to make LLM-based forecasting research broadly accessible.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 19734