TL;DR: We introduce local merging, a domain-specific token merging algorithm, to boost the efficiency of time series foundation models by up to 5400% while maintaining prediction quality.
Abstract: Despite recent advances in subquadratic attention mechanisms and state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we perform the first investigation of token merging in time series analysis, covering both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, which yields two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear depending on the neighborhood size, allowing it to scale effectively to long sequences; b) Local merging is the first causal merging scheme, enabling token merging in transformer decoders. Further, we identify spectral properties of the input data that reliably predict the potential benefits of local merging without requiring evaluation on downstream tasks. Our comprehensive empirical evaluation demonstrates that local merging offers substantial efficiency gains with minimal impact on accuracy, achieving up to 5400% acceleration on the recently proposed Chronos foundation model.
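The abstract describes local merging only at a high level. The sketch below illustrates the core idea in PyTorch; it is not the authors' implementation. It assumes the simplest possible neighborhood, where each token is compared only to its immediate predecessor and merged pairs are averaged, whereas the paper's method exposes the neighborhood size as a tunable parameter. The function name `local_merge` and its arguments are hypothetical.

```python
import torch
import torch.nn.functional as F


def local_merge(x: torch.Tensor, r: int) -> torch.Tensor:
    """Shorten a token sequence by merging r pairs of adjacent tokens.

    Each token is compared only to its immediate left neighbour (a local,
    causal neighbourhood), so the similarity computation is linear in the
    sequence length. The r most similar non-overlapping pairs are averaged.

    x: (batch, seq_len, dim) token embeddings.
    Returns a tensor of shape (batch, seq_len - r, dim).
    """
    b, n, _ = x.shape
    assert 0 <= r < n

    # Cosine similarity between each token and its left neighbour: (b, n - 1)
    sim = F.cosine_similarity(x[:, 1:], x[:, :-1], dim=-1)

    merged = []
    for bi in range(b):
        # Greedily pick the r most similar, non-overlapping adjacent pairs.
        order = torch.argsort(sim[bi], descending=True)
        taken, pairs = set(), []
        for idx in order.tolist():
            if idx in taken or idx + 1 in taken:
                continue
            pairs.append(idx)
            taken.update((idx, idx + 1))
            if len(pairs) == r:
                break
        assert len(pairs) == r, "r is too large for non-overlapping adjacent merges"

        # Average each selected pair into its left token, then drop the right one.
        tokens = x[bi].clone()
        for p in pairs:
            tokens[p] = 0.5 * (tokens[p] + tokens[p + 1])
        drop = {p + 1 for p in pairs}
        keep = [tokens[i] for i in range(n) if i not in drop]
        merged.append(torch.stack(keep))

    return torch.stack(merged)


# Example: reduce a sequence of 512 tokens to 384 tokens.
x = torch.randn(2, 512, 64)
y = local_merge(x, r=128)  # shape: (2, 384, 64)
```

Because every merge combines a token only with an earlier one, no information flows from future positions, which is why a local scheme of this kind can remain causal and thus be used inside transformer decoders.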
Lay Summary: Transformers are good at handling time-based sequences, but they can be slow and require a lot of computing power when working with very long sequences. In computer vision (image processing), a technique called token merging has helped speed up transformers by combining several similar data chunks (tokens) into one. We extend this idea to time-based sequences for the first time. We also introduce a new method called local merging, which only merges tokens that are close together in the sequence rather than arbitrary pairs of tokens. This makes the method more efficient for long sequences and allows it to be applied in decoder models. We test our method thoroughly and find that it makes models up to 5400% faster with little to no loss in accuracy.
Primary Area: Deep Learning->Sequential Models, Time series
Keywords: Time Series, Token Merging, Transformer, State-Space Model
Submission Number: 4725