Are Large Language Models Really Reliable Zero-shot Time Series Forecasters? Failure Analysis via the Lens of Stationarity

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Models, Time Series, Zero-shot, Stationarity
Abstract: Large Language Models (LLMs) have recently been widely adopted as zero-shot time series forecasters. However, the reliability of such approaches remains under debate. To address this, we propose a novel, simple, and rigorous evaluation methodology for zero-shot LLM forecasters via the lens of stationarity. An unbiased and robust zero-shot forecaster should preserve stationarity, i.e., given an input series with distinct time-independent mean and variance, a capable forecaster should maintain these same data characteristics in the output. Our comprehensive experiments reveal that LLMs consistently fail to preserve stationarity, producing forecasts contaminated by pronounced hidden biases and trends that remain visible even after averaging over hundreds of iterations. Furthermore, the reasoning content of LLMs reveals that they tend to blindly guess simplistic numeric patterns from the last few time steps of the input series, without genuine understanding of the full series. Our findings underscore the need for caution when applying LLMs to zero-shot time series forecasting. Our code repository will be publicly available upon publication of the paper.
Primary Area: learning on time series and dynamical systems
Submission Number: 12330