Forecasting Emerges from Auto-Regressive Pretraining: Latent Predictive Structure in Language Models

Published: 11 Jun 2026, Last Modified: 11 Jun 2026, Venue: Forecast@ICML26 Oral, Readers: Everyone, License: CC BY 4.0
Keywords: transfer learning, multi-modal, time series
TL;DR: LLMs appear to learn forecasting during next-token pretraining: even frozen models can generate usable time-series forecasts, and finetuning mainly adapts existing temporal structure instead of creating it.
Abstract: Predicting how a sequence will continue is a basic problem for intelligent systems. We show that large language models contain usable forecasting structure before any explicit time-series supervision. A single linear readout from frozen Qwen3-0.6B hidden states maps ordinary text sequences to numerical trajectories that resemble real time series, and those trajectories can be used for straightforward forecasts. The distribution over output tokens also gives coherent, non-crossing probabilistic forecasts in a single forward pass. After time-series specialization, pretrained models show aligned gradients and improve immediately, whereas randomly initialized models spend early training in a destructive-interference regime. These findings suggest that auto-regressive pretraining already shapes representations around temporal continuation, and that finetuning adapts that structure to numerical forecasting rather than creating it from scratch.
Submission Number: 118
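
Illustrative note on the abstract's claim of non-crossing probabilistic forecasts: the sketch below shows one way a single categorical distribution over ordered value bins can yield quantiles that cannot cross, because they are all read off the same cumulative distribution. The abstract does not describe the paper's tokenization or readout, so the bin values, probabilities, and the helper name `quantiles_from_bins` are hypothetical assumptions made for illustration only.

```python
from itertools import accumulate


def quantiles_from_bins(probs, bin_values, levels):
    """Read quantiles off one categorical distribution over ordered bins.

    Hypothetical helper: `probs` stands in for a model's output distribution
    over value tokens, `bin_values` for the numeric value of each token,
    sorted in increasing order.
    """
    if len(probs) != len(bin_values):
        raise ValueError("probs and bin_values must have the same length")
    total = sum(probs)
    # Normalized cumulative distribution; the last entry is exactly 1.0.
    cdf = [c / total for c in accumulate(probs)]
    result = []
    for q in levels:
        # Index of the first bin whose cumulative mass reaches level q.
        idx = next((i for i, c in enumerate(cdf) if c >= q), len(cdf) - 1)
        result.append(bin_values[idx])
    return result


# Made-up distribution over five ordered bins (not data from the paper).
probs = [0.05, 0.15, 0.40, 0.25, 0.15]
bin_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
levels = [0.1, 0.5, 0.9]

forecast = quantiles_from_bins(probs, bin_values, levels)
print(forecast)
```

Because the cumulative distribution is non-decreasing, a higher quantile level can never map to a lower bin, so the ordering holds by construction rather than through an extra penalty or post-hoc sorting step.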