The Few Govern the Many: Unveiling Few-Layer Dominance for Time Series Models

ICLR 2026 Conference Submission 24832 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Time Series Forecasting, Large Model, Time Series Foundation Models
Abstract: Time series (TS) forecasting plays a vital role in practice but remains a highly challenging task. The strong performance of large-scale models across many domains has driven the development of large-scale TS models, offering an effective pathway for forecasting tasks. Yet performance degradation has been observed in large-scale TS models, a puzzling phenomenon demonstrating that bigger is not always better. We trained two categories of large-scale TS models, LLM4TS and TSFMs, across four scales, examining how architecture, model size, data volume and distribution, and training strategies influence performance. Because representations in large-scale TS models have not been studied in depth, we examined their evolution from both inter-layer and intra-layer perspectives. Our analysis reveals that only a small subset of layers plays a critical role in learning, while the majority contribute minimally, a phenomenon we term few-layer dominance. Building on this insight, we propose a method to identify critical layers, allowing models to achieve comparable performance while improving inference efficiency. Validation on existing large-scale TS models confirms the universality of few-layer dominance and the reliability of the critical-layer identification method.
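The abstract does not spell out how critical layers are identified, so the following is only a minimal illustrative sketch of one common approach to layer-importance scoring: measuring how much each transformer block changes its input representation, then keeping the most-transforming layers. The names `model.layers`, `layer_change_scores`, and `keep_ratio` are assumptions for illustration, not identifiers from the paper.

```python
# Hypothetical sketch (not the paper's method): score each transformer block
# by how much it changes its input hidden states; blocks whose output is
# nearly identical to their input contribute minimally and may be skipped.
# Assumes a PyTorch model whose blocks live in `model.layers` and map hidden
# states of shape (batch, seq_len, d_model) to the same shape.
import torch
import torch.nn.functional as F

@torch.no_grad()
def layer_change_scores(model, hidden):
    """Return 1 - mean cosine similarity between each block's input and output."""
    scores = []
    for block in model.layers:
        out = block(hidden)
        # Average cosine similarity over batch and sequence positions.
        sim = F.cosine_similarity(hidden, out, dim=-1).mean().item()
        scores.append(1.0 - sim)  # high score = layer transforms its input a lot
        hidden = out
    return scores

def select_critical_layers(scores, keep_ratio=0.25):
    """Keep the indices of the top keep_ratio fraction of layers by score."""
    k = max(1, int(len(scores) * keep_ratio))
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
```

Under this assumption, pruning the low-scoring layers trades a small accuracy loss for fewer forward-pass blocks, which is one plausible way a model could match full-depth performance while improving inference efficiency.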
Primary Area: learning on time series and dynamical systems
Submission Number: 24832