Alive and Predicting: A Live Evaluation of Multi-Step Forecasting Agents

Published: 11 Jun 2026, Last Modified: 25 Jun 2026
Venue: Forecast@ICML26 Poster
License: CC BY 4.0
Keywords: LLM agents, forecasting, prediction markets
TL;DR: A live, fully transparent multi-step LLM forecasting agent that beats a zero-shot baseline on Kalshi at every horizon, with complete reasoning traces revealing which pipeline stages and tools actually drive accuracy.
Abstract: Large language models are increasingly capable forecasters, yet most of this capability has been measured by retrospective backtests on already-resolved questions, and recent live benchmarks score only the agent's final probability. We present a live, multi-step forecasting agent that operates autonomously on a major prediction market, with every intermediate forecast, retrieved evidence item, and tool invocation recorded and published in real time. Scoring forecasts by their information coefficient against the subsequent market movement, the agent beats a zero-shot baseline at every horizon from one day to one month after the forecast. The margin is largest in the first two weeks, before the market converges toward the agent's earlier view. Because every reasoning step is recorded, we can identify which pipeline stages and tools contribute most to forecasting accuracy and surface design lessons for agent pipelines.
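The abstract does not define its information coefficient precisely. As a rough illustration only, the sketch below assumes one common reading: the Pearson correlation between the agent's edge (forecast minus market price at forecast time) and the market's subsequent price change at a given horizon. The function names, argument names, and the choice of Pearson rather than a rank correlation are all assumptions made for illustration, not the authors' method.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def information_coefficient(forecasts, prices_at_forecast, prices_at_horizon):
    """Hypothetical IC: correlate the forecast edge with the later market move.

    Assumption: edge = forecast probability - market price at forecast time,
    move = market price at the horizon - market price at forecast time.
    """
    if not (len(forecasts) == len(prices_at_forecast) == len(prices_at_horizon)):
        raise ValueError("all inputs must have the same length")
    edges = [f - p0 for f, p0 in zip(forecasts, prices_at_forecast)]
    moves = [p1 - p0 for p0, p1 in zip(prices_at_forecast, prices_at_horizon)]
    return pearson(edges, moves)
```

Under this reading, a positive value at a horizon means the market tended to move in the direction the forecast disagreed with it, and comparing the agent's value against a zero-shot baseline's value at each horizon gives the kind of per-horizon margin the abstract describes.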
Submission Number: 41