Efficient Compression of Time-Series Foundation Models via Consensus Subspace Distillation

16 Sept 2025 (modified: 29 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Time-Series Foundation Models, Time-Series Forecasting, Model Compression, Knowledge Distillation, Consensus Subspace Optimization, Uncertainty Injection
Abstract: Compressing universal time-series foundation models (TSFMs) significantly reduces computational and storage overhead, facilitating their widespread adoption. Among TSFM compression techniques, knowledge distillation stands out by transferring knowledge from a large teacher model to a compact student. However, existing distillation methods often overlook the inherent consensus representation spaces in TSFMs and the imbalance in hierarchical contributions, leading to inefficient knowledge transfer. To address this, we propose a novel approach that reformulates distillation as a consensus subspace optimization task, leveraging two observations: high-level embeddings converge autonomously across model scales, and hierarchical contributions follow a long-tail distribution. We solve the consensus subspace problem by identifying and extracting scale-invariant low-rank subspaces: on local data subsets, we perform singular value decomposition on embeddings from offline-selected consensus layers to derive consensus projection matrices, which are then used to fine-tune the student model, ensuring representation alignment and accelerating convergence. Additionally, we introduce a scalable uncertainty injection mechanism that improves generalization to unseen data by modeling subset biases as frequency-domain gaps that inflate the covariances. Extensive experiments demonstrate that our framework excels on multiple standard time-series datasets, with student models even surpassing teacher performance in time-series forecasting tasks. Compared to state-of-the-art methods, our approach achieves over 90% parameter reduction and a 100× distillation speedup while retaining comparable performance across various time-series tasks. Code and compressed model weights are available via an anonymous link: anonymous.4open.science/r/CSD-13C3.
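The core mechanism described above (SVD on consensus-layer embeddings to obtain a low-rank projection, then aligning student representations in that subspace) can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' implementation: the function names, the rank, and the MSE alignment loss are all assumptions for the sketch.

```python
import numpy as np

def consensus_projection(embeddings: np.ndarray, rank: int) -> np.ndarray:
    """Extract a low-rank consensus projection matrix from teacher
    embeddings of shape (n_samples, d) on a local data subset, keeping
    the top-`rank` right singular vectors as the subspace basis.
    (Illustrative; the paper's exact construction may differ.)"""
    # Center so the subspace captures variance rather than the mean offset.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt span the principal directions of the embedding space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank]  # (rank, d) projection matrix

def alignment_loss(student_emb: np.ndarray,
                   teacher_emb: np.ndarray,
                   proj: np.ndarray) -> float:
    """MSE between student and teacher embeddings after projecting both
    into the shared consensus subspace (a hypothetical choice of loss)."""
    s = student_emb @ proj.T
    t = teacher_emb @ proj.T
    return float(np.mean((s - t) ** 2))

# Toy usage: a noisy "student" copy of synthetic teacher embeddings.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(256, 64))
proj = consensus_projection(teacher, rank=8)
student = teacher + 0.1 * rng.normal(size=teacher.shape)
loss = alignment_loss(student, teacher, proj)
```

During distillation, `loss` would serve as a regularizer while fine-tuning the student, pulling its representations toward the teacher's scale-invariant subspace rather than matching full high-dimensional embeddings.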
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 6527