Keywords: Token Merging, Time Series Foundation Models, State-Space Models, Transformers
TL;DR: We introduce local merging, a domain-specific token merging algorithm, to boost the efficiency of time series foundation models by up to 54 times while maintaining prediction quality.
Abstract: Transformer architectures and state-space models have shown promising results in time series analysis. However, processing very long sequences imposes significant computational requirements. Token merging, which replaces multiple tokens with a single one computed as their linear combination, has been shown to considerably improve the throughput of vision transformer architectures while maintaining accuracy. In this work, we perform the first investigation of token merging in time series analysis. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size, effectively scaling token merging to long sequences; b) Local merging is the first causal merging scheme, enabling token merging in transformer decoders. Our comprehensive empirical evaluation demonstrates that token merging offers substantial computational benefits with minimal impact on accuracy across various models and datasets. On the recently proposed Chronos foundation model, we achieve accelerations of up to 5400% with only minor accuracy degradation.
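To illustrate the core idea, the following is a minimal sketch of merging tokens within a local neighborhood by averaging, i.e., a simple linear combination. The function name, the fixed window size, and the uniform weights are illustrative assumptions, not the paper's actual algorithm (which selects merge candidates by similarity within each neighborhood):

```python
import numpy as np

def local_merge(tokens, window=2):
    """Illustrative sketch: merge each non-overlapping window of
    `window` consecutive tokens into one token by averaging.
    Operating only on local, left-to-right neighborhoods preserves
    token order, which is what makes a causal merging scheme possible."""
    n, d = tokens.shape
    trim = n - (n % window)  # number of tokens that fill complete windows
    merged = tokens[:trim].reshape(-1, window, d).mean(axis=1)
    if trim < n:
        # keep any leftover tail tokens unmerged
        merged = np.concatenate([merged, tokens[trim:]], axis=0)
    return merged

seq = np.arange(12, dtype=float).reshape(6, 2)  # 6 tokens of dimension 2
out = local_merge(seq, window=2)                # 6 tokens -> 3 tokens
```

Because each token is only compared against a fixed-size neighborhood rather than all other tokens, the cost of choosing merge partners scales linearly in sequence length instead of quadratically.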
Submission Number: 7