Outcome-based Reinforcement Learning to Predict the Future

TMLR Paper 5575 Authors

08 Aug 2025 (modified: 18 Aug 2025) · Under review for TMLR · CC BY 4.0
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has been an effective approach for improving Large Language Models' reasoning in domains such as coding and mathematics. Here, we apply RLVR methods to forecasting future real-world events – a challenging task for RL due to the very noisy (and delayed) outcomes involved. Using a novel dataset of recent questions from a prediction market, together with accompanying relevant news headlines, we show that a compact (14B) reasoning model can be trained to match or surpass the predictive accuracy of frontier models like o1, while greatly improving probabilistic calibration. The model's performance is also practically meaningful: in a Polymarket trading simulation, we estimate that its bets would have yielded a return on investment of over 10% across all questions in the test set. We detail and compare the approaches used in training our model, including augmenting our training data with synthetic prediction questions, guardrails for learning stability, and median prediction sampling at inference time.
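For readers unfamiliar with the median-sampling idea mentioned in the abstract, the following is a minimal sketch of how such inference-time aggregation could look; it is not the authors' exact procedure, and the function name and sample values are hypothetical.

```python
import statistics

def median_prediction(sampled_probs):
    """Aggregate several sampled forecasts for the same question by taking
    their median probability. `sampled_probs` is a list of floats in [0, 1]
    obtained, e.g., by decoding the model multiple times."""
    return statistics.median(sampled_probs)

# Hypothetical usage: five sampled forecasts for one prediction-market question.
samples = [0.62, 0.55, 0.70, 0.58, 0.64]
print(median_prediction(samples))  # -> 0.62
```

Taking the median rather than the mean makes the aggregated forecast robust to occasional outlier samples, which is one common motivation for this kind of aggregation.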
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=Wxn5NsX2Ww
Changes Since Last Submission: Fixed formatting
Assigned Action Editor: ~Jacek_Cyranka1
Submission Number: 5575