Scaling Open-Ended Reasoning to Predict the Future

Published: 28 Sept 2025 · Last Modified: 09 Oct 2025 · SEA @ NeurIPS 2025 Poster · CC BY 4.0
Keywords: forecasting, RL, dataset
TL;DR: We create a dataset of open-ended forecasting questions from news articles and show that training language models on it with RL improves their ability to predict future events.
Abstract: While language models now show remarkable capabilities on fully specified exam-style problems, most real-world decisions involve reasoning under uncertainty. In this work, we train language models to make predictions on open-ended questions about the future. To scale up training data, we use daily news to synthetically generate forecasting questions about global events that have already occurred. Using Reinforcement Learning (RL), we train an 8B-parameter model on our dataset, NewsCast, achieving predictions on par with OpenAI's much larger GPT-OSS 120B. RL training on NewsCast not only improves both forecasting accuracy and calibration but also reduces hallucinations, as measured by SimpleQA. Our findings demonstrate the usefulness of goal-oriented synthetic data generation pipelines for training language models to predict the future. We will open-source our models, code, and data to make LLM forecasting research broadly accessible.
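The abstract reports gains in forecasting accuracy and calibration. As a point of reference, the sketch below shows two standard metrics for probabilistic forecasts on binary resolved questions: the Brier score and a binned expected calibration error (ECE). This is a minimal illustration of how such metrics are commonly computed; the function names, the 10-bin ECE setup, and the example numbers are assumptions here, not the paper's exact evaluation protocol.

```python
# Illustrative sketch only: standard metrics for scoring probabilistic
# forecasts on binary questions. Not the paper's exact evaluation code.
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; 0.25 is the score of always predicting 0.5.
    """
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average of |mean confidence - empirical frequency| per bin.

    Uses equal-width probability bins; n_bins=10 is a common (assumed) choice.
    """
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)
        freq = sum(y for _, y in b) / len(b)
        ece += (len(b) / len(probs)) * abs(avg_conf - freq)
    return ece


# Example: three resolved questions with model probabilities for "yes".
probs = [0.9, 0.2, 0.7]
outcomes = [1, 0, 0]
print(f"Brier score: {brier_score(probs, outcomes):.3f}")              # 0.180
print(f"ECE:         {expected_calibration_error(probs, outcomes):.3f}")
```

Both metrics are proper in the sense that a well-calibrated forecaster cannot improve its expected score by misreporting its beliefs, which is why they are natural reward signals or evaluation targets for RL-trained forecasting models.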
Archival Option: The authors of this submission do *not* want it to appear in the archival proceedings.
Submission Number: 164