Keywords: Time Series, Dynamic Chunking, LLM
TL;DR: Dynamic Chunking Network for Time Series Forecasting
Abstract: Transformer-based large sequence models have recently been extended from language to time series to capture long-range dependencies and heterogeneous dynamics. However, unlike language, time series lack a natural dictionary for principled tokenization: existing large sequence models often resort to fixed-length tokens or patches for computational efficiency. This design can obscure regime changes, expend attention on low-information tokens, and restrict the effective context length. We address this limitation with Boundary-aware tokenization, which initiates new tokens only at predicted regime changes in the time series, analogous to how spaces delimit words in language. At its core, the model integrates an unsupervised boundary detector that forms variable-length chunks, an intra-chunk fusion module that derives chunk-level token embeddings, and a smoothing module that stabilizes training, before passing the resulting tokens to Transformer-based modules. We further add a gating refinement that fuses fixed- and variable-length representations before the forecasting decoder, enabling the model to adaptively select between the two tokenizations during pre-training according to the data. This design directly addresses event-driven regime changes while remaining robust in stationary regimes. Across diverse benchmarks, our method reduces forecasting error by 10.5\% on average, and the learned chunks align with true regime boundaries. We also show that the model adaptively reverts to fixed-length tokenization on stationary time series.
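To make the described architecture concrete, the sketch below shows one way the pipeline could be wired together: a per-step boundary detector splits each series into variable-length chunks, a simple intra-chunk fusion summarizes each chunk into a token, and a sigmoid gate mixes the fixed-patch and variable-chunk representations before a forecasting head. This is a minimal illustration under assumed names and hyperparameters (BoundaryAwareForecaster, patch_len, threshold, the summary-statistic chunk fusion, the hard 0.5 boundary cutoff), not the authors' implementation; in particular, the paper's smoothing module and learned intra-chunk fusion are replaced here by simpler stand-ins.

```python
# Minimal sketch of boundary-aware dynamic chunking with gated fusion of
# fixed- and variable-length token streams. All names and design details
# below are illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn


class BoundaryAwareForecaster(nn.Module):
    def __init__(self, patch_len=16, d_model=64, horizon=96, threshold=0.5):
        super().__init__()
        self.patch_len, self.horizon, self.threshold = patch_len, horizon, threshold
        # Boundary detector: scores a boundary probability at every time step.
        self.boundary_net = nn.Sequential(nn.Linear(1, d_model), nn.GELU(), nn.Linear(d_model, 1))
        # Intra-chunk fusion (simplified): embed each chunk's mean, std, and length.
        self.chunk_proj = nn.Linear(3, d_model)
        self.patch_proj = nn.Linear(patch_len, d_model)
        enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)
        # Gating refinement: mixes fixed-patch and variable-chunk summaries.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, 1), nn.Sigmoid())
        self.head = nn.Linear(d_model, horizon)

    def chunk_tokens(self, x):
        # x: (seq_len,) one univariate series; start a new chunk at each predicted boundary.
        probs = torch.sigmoid(self.boundary_net(x.unsqueeze(-1))).squeeze(-1)
        cuts = (probs > self.threshold).nonzero().squeeze(-1).tolist()
        edges = [0] + [c for c in cuts if 0 < c < len(x)] + [len(x)]
        feats = []
        for s, e in zip(edges[:-1], edges[1:]):
            seg = x[s:e]
            feats.append(torch.stack([seg.mean(), seg.std(unbiased=False),
                                      torch.tensor(float(e - s))]))
        return self.chunk_proj(torch.stack(feats))  # (num_chunks, d_model)

    def forward(self, x):
        # x: (batch, seq_len) univariate series; seq_len must be divisible by patch_len.
        b, t = x.shape
        # Fixed-length stream: regular patching as in standard patch-based models.
        patches = x.reshape(b, t // self.patch_len, self.patch_len)
        h_fixed = self.encoder(self.patch_proj(patches)).mean(dim=1)       # (b, d_model)
        # Variable-length stream: boundary-aware chunks (looped per series for clarity).
        h_var = torch.stack([
            self.encoder(self.chunk_tokens(xi).unsqueeze(0)).mean(dim=1).squeeze(0)
            for xi in x])                                                   # (b, d_model)
        g = self.gate(torch.cat([h_fixed, h_var], dim=-1))                  # (b, 1)
        return self.head(g * h_var + (1 - g) * h_fixed)                     # (b, horizon)


# Usage: forecast a 96-step horizon from a 336-step context.
model = BoundaryAwareForecaster(patch_len=16, horizon=96)
x = torch.randn(8, 336)
print(model(x).shape)  # torch.Size([8, 96])
```

In this sketch the gate naturally recovers the fixed-length behavior described for stationary series: when the boundary detector fires rarely and the chunk stream is uninformative, the gate can be driven toward the fixed-patch summary. The hard thresholding here is non-differentiable; an actual trainable system would need a relaxation such as the paper's unsupervised detector and smoothing module.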
Primary Area: learning on time series and dynamical systems
Submission Number: 17237