Keywords: time series, time series foundation model
TL;DR: We propose a zero-shot Time Series Learner via Hierarchical Interleaved Block Attention
Abstract: The rapid advancement of time series foundation models (TSFMs) has been propelled by migrating architectures from language modeling. While existing TSFMs demonstrate impressive performance, their direct adoption of cross-domain architectures constrains the effective capture of the multi-scale temporal dependencies inherent to time series data. This limitation becomes particularly pronounced during zero-shot transfer across datasets with divergent underlying patterns and sampling strategies. To address these challenges, we propose Hierarchical Interleaved Block Attention (HIBA), which employs hierarchical inter- and intra-block sparse attention to effectively capture multi-scale dependencies. Intra-block attention facilitates local information exchange, while inter-block attention operates across blocks to capture global temporal pattern interaction and dynamic evolution. Leveraging the HIBA architecture, we introduce Xihe, a scalable TSFM family spanning from an ultra-efficient 9.5M-parameter configuration to a high-capacity 1.5B-parameter variant. Evaluated on the comprehensive GIFT-Eval benchmark, our most compact Xihe-tiny model (9.5M) surpasses the majority of contemporary TSFMs, demonstrating remarkable parameter efficiency. More impressively, Xihe-max (1.5B) establishes new state-of-the-art zero-shot performance, surpassing previous best results by a substantial margin. This consistent excellence across the entire parameter spectrum provides compelling evidence for the exceptional generalization capability and architectural advantages of HIBA.
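The abstract describes the HIBA mechanism only at a high level (intra-block attention for local exchange, inter-block attention across blocks for global interaction). Below is a minimal PyTorch sketch of one plausible reading of that description; the module layout, parameter names (`block_size`, `d_model`), and the strided inter-block pattern are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of interleaved intra-/inter-block sparse attention.
# Assumptions: sequence length divisible by block_size; pre-norm residual layout.
import torch
import torch.nn as nn


class InterleavedBlockAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, block_size: int = 16):
        super().__init__()
        self.block_size = block_size
        self.intra_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.inter_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        nb, bs = t // self.block_size, self.block_size

        # Intra-block attention: each token attends only within its own block
        # (local information exchange).
        h = x.reshape(b * nb, bs, d)
        q = self.norm1(h)
        h = h + self.intra_attn(q, q, q)[0]

        # Inter-block attention: tokens at the same intra-block offset attend
        # across blocks, providing a sparse path for global temporal interaction.
        h = h.reshape(b, nb, bs, d).transpose(1, 2).reshape(b * bs, nb, d)
        q = self.norm2(h)
        h = h + self.inter_attn(q, q, q)[0]

        # Restore the original (batch, seq_len, d_model) layout.
        return h.reshape(b, bs, nb, d).transpose(1, 2).reshape(b, t, d)


if __name__ == "__main__":
    layer = InterleavedBlockAttention()
    out = layer(torch.randn(2, 128, 64))  # 128 timesteps -> 8 blocks of 16
    print(out.shape)  # torch.Size([2, 128, 64])
```

Both attention stages are sparse relative to full attention: each token attends to block_size tokens in the intra step and num_blocks tokens in the inter step, rather than the full sequence.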
Primary Area: learning on time series and dynamical systems
Submission Number: 16977