TiViT: Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers

Published: 25 May 2026, Last Modified: 29 May 2026, Venue: FMSD @ ICML 2026 Poster, License: CC BY 4.0
Keywords: time series classification, foundation models, vision transformers
Abstract: Adapting vision models for time series analysis is compelling, yet existing approaches all fall short of dedicated time series foundation models (TSFMs) in classification. We propose **Ti**me **Vi**sion **T**ransformer (**TiViT**), the first framework to successfully unlock the representational power of frozen Vision Transformers (ViTs) pretrained on large-scale image datasets for time series classification. Using the hidden representations of OpenCLIP models, TiViT achieves state-of-the-art performance on time series classification benchmarks without finetuning. We analyze the representations of TiViT and find that intermediate ViT layers with high intrinsic dimension are the most effective. We further assess the alignment between the TiViT and TSFM representation spaces, which reveals strong complementarity and yields additional gains through feature concatenation. Finally, we unfreeze the ViT backbone for continual pretraining on synthetic time series.
Submission Number: 178