Forecasting Emerges from Auto-Regressive Pretraining: Latent Predictive Structure in Language Models
Keywords: transfer learning, multi-modal, time series
TL;DR: LLMs appear to learn forecasting during next-token pretraining: even frozen models can generate usable time-series forecasts, and finetuning mainly adapts existing temporal structure instead of creating it.
Abstract: Predicting how a sequence will continue is a basic problem for intelligent systems. We show that large language models contain usable
forecasting structure before any explicit time-series supervision. A
single linear readout from frozen Qwen3-0.6B hidden states maps ordinary text
sequences to numerical trajectories that resemble real time series, and those
trajectories can be used for straightforward forecasts. The distribution over output tokens also gives coherent, non-crossing probabilistic forecasts in a single forward pass. During time-series
specialization, pretrained models show aligned gradients and improve
immediately, whereas randomly initialized models spend early training in a
destructive-interference regime. These findings suggest that auto-regressive
pretraining already shapes representations around temporal continuation, and
finetuning adapts that structure to numerical forecasting rather than
creating it from scratch.
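
As an illustration of why a single output distribution yields non-crossing quantiles, the following minimal sketch reads several quantile levels off the cumulative distribution of one categorical distribution. It assumes, purely for illustration, that output tokens correspond to sorted numeric bins; the names `bin_values`, `logits`, and `quantiles_from_logits` are hypothetical and do not describe the paper's actual token-to-value mapping.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def quantiles_from_logits(logits, bin_values, levels):
    """Read quantiles off the CDF of one categorical distribution.

    Assumes bin_values is sorted in ascending order (hypothetical
    numeric value attached to each output token).
    """
    probs = softmax(logits)
    out = []
    for q in levels:
        cum = 0.0
        chosen = bin_values[-1]  # fallback for floating-point round-off
        for p, v in zip(probs, bin_values):
            cum += p
            if cum >= q:
                chosen = v
                break
        out.append(chosen)
    return out

# Hypothetical bins and logits for one forecast step.
bin_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
logits = [0.1, 1.2, 2.0, 1.0, -0.5]
levels = [0.1, 0.5, 0.9]

forecast = quantiles_from_logits(logits, bin_values, levels)
print(forecast)
```

Because every level is read from the same monotone cumulative distribution, a higher level can never map to a smaller value, so the quantiles cannot cross regardless of the logits.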
Submission Number: 118