Keywords: Time series forecasting, Foundation models, Zero-shot learning, Chronos, TimesFM, TimeGPT, MOMENT, GPT-OSS, Whisper, ViT.
TL;DR: TSFMs outperform cross-modal FMs in zero-shot time series forecasting, but GPT-OSS shows surprising competitiveness.
Abstract: Foundation models (FMs) have achieved major advances in language, vision, and speech. In parallel, time series foundation models (TSFMs) have been developed to address forecasting tasks. A key question is whether TSFMs truly generalize to unseen time series data, and whether they outperform general-purpose FMs from other domains in a zero-shot setting. We compare four TSFMs, namely Chronos, TimesFM, TimeGPT, and MOMENT, with cross-domain FMs for text (GPT-OSS), audio (Whisper), and vision (ViT). For a systematic comparison, we use simple task-agnostic adapters to convert sequences into forecasts, without fine-tuning or modifying the backbone models. All models are evaluated on nine diverse datasets that were unseen during training. Our results show that TSFMs perform best on most datasets, highlighting the benefit of temporal pretraining and time-aware design. Overall, the strong zero-shot performance of TSFMs suggests that they may represent a breakthrough for time series forecasting comparable to BERT for language. At the same time, large text-based models such as GPT-OSS remain surprisingly competitive, in some cases even surpassing TSFMs, underscoring the ability of general-purpose models to capture temporal patterns despite never being trained on time series. GitHub repository: https://github.com/anonymous4865/tsfms.
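The abstract does not specify how the task-agnostic adapters are constructed. As a hedged illustration only, the sketch below shows one common way such an adapter could look: a frozen sequence backbone wrapped by lightweight linear input/output projections that map a univariate context window to a forecast. The class name FrozenBackboneAdapter, the projections in_proj/out_proj, and the stand-in Transformer encoder are all hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class FrozenBackboneAdapter(nn.Module):
    """Hypothetical zero-shot adapter: a frozen backbone with linear
    input/output projections (a sketch, not the paper's exact design)."""

    def __init__(self, backbone: nn.Module, d_model: int, horizon: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # zero-shot: the backbone is never updated
        self.in_proj = nn.Linear(1, d_model)         # scalar series -> model width
        self.out_proj = nn.Linear(d_model, horizon)  # last hidden state -> forecast

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        # series: (batch, context_length)
        x = self.in_proj(series.unsqueeze(-1))  # (batch, T, d_model)
        h = self.backbone(x)                    # assumes (batch, T, d_model) in/out
        return self.out_proj(h[:, -1, :])       # (batch, horizon)

# Stand-in encoder in place of a real cross-domain FM backbone:
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
adapter = FrozenBackboneAdapter(encoder, d_model=64, horizon=24)
forecast = adapter(torch.randn(8, 96))  # context of 96 steps -> (8, 24) forecast
```

The design choice of freezing every backbone parameter and routing all task-specific capacity through the projections is what keeps the comparison zero-shot with respect to the backbone, matching the abstract's constraint of not fine-tuning or modifying the backbone models.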
Submission Number: 24