Scale-Aware Pretraining of Time Series Foundation Models via Multi-Patch Token Alignment and Hybrid Masking

16 Sept 2025 (modified: 18 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Time series foundation model, Pre-training, Time series forecasting
Abstract: Pretraining time series foundation models across diverse datasets necessitates effective handling of varying sampling frequencies. A prevalent approach assigns dataset-specific patch sizes based on sampling rates and employs separate MLPs for token projection, which leads to fragmented representations across scales and hinders alignment and transferability. In contrast, some studies enforce a fixed patch size across datasets to ensure consistency, yet this uniformity neglects inherent temporal variations and often causes information loss. To address these challenges, we propose a scale-aware token alignment mechanism that treats the patch size used during input segmentation as an explicit notion of scale. By incorporating contrastive learning across scales, our approach aligns the representation spaces induced by different MLPs while preserving their distinct modeling capacities. On top of this aligned representation, we introduce a hybrid masking strategy that enables multi-scale temporal understanding at the token level. By combining random and contiguous masking, the model learns to recover both fine-grained patterns and long-range temporal structures during pretraining. Experiments on benchmark datasets show that our approach consistently improves forecasting performance, highlighting the benefits of scale-aware token alignment and multi-scale understanding in time series model pretraining.
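The abstract describes three mechanisms: per-scale MLPs that project patches of different sizes into tokens, a contrastive objective that aligns the resulting scale-specific representation spaces, and a hybrid masking scheme mixing random and contiguous masking. The sketch below is a minimal illustration of those ideas under assumed design choices, not the authors' implementation; all module names, patch sizes, and hyperparameters are hypothetical.

```python
# Minimal sketch (assumptions, not the paper's code): per-scale patch MLPs,
# an InfoNCE-style loss aligning scales, and hybrid random/contiguous masking.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleAwareTokenizer(nn.Module):
    """Segments a series with several patch sizes; one projection MLP per scale."""

    def __init__(self, patch_sizes=(16, 32, 64), d_model=128):
        super().__init__()
        self.projections = nn.ModuleDict({
            str(p): nn.Sequential(nn.Linear(p, d_model), nn.GELU(),
                                  nn.Linear(d_model, d_model))
            for p in patch_sizes
        })

    def forward(self, x, patch_size):
        # x: (batch, length); length assumed divisible by patch_size
        b, t = x.shape
        patches = x.view(b, t // patch_size, patch_size)
        return self.projections[str(patch_size)](patches)  # (batch, n_tokens, d_model)


def contrastive_scale_alignment(z_a, z_b, temperature=0.1):
    """InfoNCE between pooled token embeddings of the same series at two scales;
    positives are the pairs that share a batch index."""
    a = F.normalize(z_a.mean(dim=1), dim=-1)  # (batch, d_model)
    b = F.normalize(z_b.mean(dim=1), dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def hybrid_mask(batch, n_tokens, p_random=0.15, span_len=4, device="cpu"):
    """Union of random token masking and one contiguous masked span per sample."""
    mask = torch.rand(batch, n_tokens, device=device) < p_random
    starts = torch.randint(0, max(1, n_tokens - span_len), (batch,), device=device)
    for i in range(batch):
        mask[i, starts[i]:starts[i] + span_len] = True
    return mask  # True = masked token, to be reconstructed during pretraining


# Usage: tokenize the same series at two scales, align them, and build a mask.
tokenizer = ScaleAwareTokenizer()
x = torch.randn(8, 256)
z_fine, z_coarse = tokenizer(x, 16), tokenizer(x, 64)
loss_align = contrastive_scale_alignment(z_fine, z_coarse)
mask = hybrid_mask(batch=z_fine.size(0), n_tokens=z_fine.size(1))
```

In this reading, the alignment loss would be added to the masked-reconstruction objective so that the separate per-scale MLPs keep their own capacity while mapping into a shared token space; the exact weighting and backbone are not specified in the abstract.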
Supplementary Material: zip
Primary Area: learning on time series and dynamical systems
Submission Number: 6605