MacroBench: Measuring Frontier LLM Macroeconomic Forecasting Ability

Arjun Neervannan; Sujai Hiremath; Sumiran Singh Thakur; Guanghan Ning; Deniz Zorlu

Published: 11 Jun 2026, Last Modified: 11 Jun 2026. Venue: Forecast@ICML26 Poster. License: CC BY 4.0

Keywords: benchmark, forecasting, macroeconomic

Abstract: Macroeconomic forecasting is an important task, yet no existing benchmark both anonymizes the underlying data and validates that evaluations are uncontaminated, a critical gap given that historical price series are heavily represented in pretraining corpora. We introduce MacroBench, a benchmark that measures macroeconomic forecasting in LLMs via US 10-year Treasury yield forecasts across 35 years of macro regimes, with anonymized historical and temporal context and a contamination adjustment mechanism. Each task presents a 36-month window of twelve z-scored macro indicators, gives the model a fixed statistical toolset in Python, and elicits a distributional 10Y yield forecast at 1-, 3-, and 6-month horizons. After contamination adjustment, no frontier LLM significantly outperforms walk-forward AR(1) on anonymized Treasury yield forecasting except GPT-5.5, which narrowly beats it. More broadly, we establish a recipe for contamination-free benchmarks on any historically rich macro series, enabling rigorous evaluation in central-bank scenario analysis, monetary policy research, and debt markets.
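To make the baseline concrete, here is a minimal sketch of a walk-forward AR(1) forecaster that emits a distributional (mean and standard deviation) forecast at a chosen horizon. The abstract does not specify how the baseline is estimated, so the details here are assumptions made for illustration: OLS estimation, an expanding training window, a Gaussian predictive distribution, and the names `fit_ar1`, `ar1_forecast`, and `walk_forward`, which are hypothetical.

```python
import numpy as np

def fit_ar1(y):
    """OLS fit of y[t] = c + phi * y[t-1] + eps; returns (c, phi, sigma)."""
    y = np.asarray(y, dtype=float)
    x, target = y[:-1], y[1:]
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    dof = max(len(target) - 2, 1)  # guard against very short samples
    sigma = float(np.sqrt(resid @ resid / dof))
    return float(coef[0]), float(coef[1]), sigma

def ar1_forecast(y, horizon):
    """h-step-ahead Gaussian forecast (mean, std) from an AR(1) fit on y."""
    c, phi, sigma = fit_ar1(y)
    mean, var = float(y[-1]), 0.0
    for _ in range(horizon):
        mean = c + phi * mean            # iterate the conditional mean
        var = phi ** 2 * var + sigma ** 2  # accumulate forecast variance
    return mean, float(np.sqrt(var))

def walk_forward(y, horizon, min_train=36):
    """Refit at each origin t using only y[:t+1] (assumed expanding window).

    Returns a list of (origin index, forecast mean, forecast std, realized value).
    """
    results = []
    for t in range(min_train - 1, len(y) - horizon):
        mean, std = ar1_forecast(y[: t + 1], horizon)
        results.append((t, mean, std, float(y[t + horizon])))
    return results
```

Calling `walk_forward(series, horizon=h)` for `h` in 1, 3, and 6 on a monthly series would give one forecast per origin for each of the three horizons named in the abstract. Because each fit sees only data up to the forecast origin, no future observation leaks into the baseline.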

Submission Number: 103
