MacroBench: Measuring Frontier LLM Macroeconomic Forecasting Ability

Published: 11 Jun 2026, Last Modified: 11 Jun 2026 | Forecast@ICML26 Poster | CC BY 4.0
Keywords: benchmark, forecasting, macroeconomic
Abstract: Macroeconomic forecasting is an important task, yet no existing benchmark both anonymizes the underlying data and validates that its evaluations are uncontaminated. This is a critical gap, given that historical price series are heavily represented in pretraining corpora. We introduce MacroBench, a benchmark that measures the macroeconomic forecasting ability of LLMs via US 10-year Treasury yield forecasts across 35 years of macro regimes, with anonymized historical and temporal context and a contamination adjustment mechanism. Each task presents a 36-month window of twelve z-scored macro indicators, gives the model a fixed statistical toolset in Python, and elicits a distributional 10-year yield forecast at 1-, 3-, and 6-month horizons. After contamination adjustment, no frontier LLM significantly outperforms a walk-forward AR(1) baseline on anonymized Treasury yield forecasting, with the exception of GPT-5.5, which beats it narrowly. More broadly, we establish a recipe for contamination-free benchmarks on any historically rich macro series, enabling rigorous evaluation in central-bank scenario analysis, monetary policy research, and debt markets.
Submission Number: 103
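
The abstract names a walk-forward AR(1) baseline but does not say how it is estimated. The sketch below is one plausible reading, not the authors' implementation. It assumes an OLS fit on an expanding window, Gaussian forecast errors, and a minimum training length of 36 months; the function names and the `min_train` parameter are hypothetical.

```python
import numpy as np

def fit_ar1(y):
    """OLS fit of y[t] = c + phi * y[t-1] + eps. Returns (c, phi, sigma)."""
    x, target = y[:-1], y[1:]
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma = resid.std(ddof=2)  # two estimated parameters
    return coef[0], coef[1], sigma

def ar1_forecast(y, h):
    """h-step-ahead mean and standard deviation (Gaussian errors assumed)."""
    c, phi, sigma = fit_ar1(y)
    mean, var = y[-1], 0.0
    for k in range(h):
        mean = c + phi * mean             # iterate the AR(1) recursion
        var += sigma**2 * phi**(2 * k)    # accumulate forecast-error variance
    return mean, np.sqrt(var)

def walk_forward(y, h, min_train=36):
    """Refit on y[:t] at each origin t, so no future data reaches the fit.

    Returns a list of (target_index, mean, sd, actual) tuples.
    min_train=36 is an assumption made for this sketch.
    """
    results = []
    for t in range(min_train, len(y) - h + 1):
        mean, sd = ar1_forecast(y[:t], h)
        target = t + h - 1
        results.append((target, mean, sd, y[target]))
    return results
```

Calling `walk_forward(yields, h)` for `h` in 1, 3, and 6 would give one distributional baseline forecast per origin and horizon, which could then be scored against LLM forecasts with any proper scoring rule.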