Keywords: Time series forecasting, Transformer models, Attention mechanisms
Abstract: By leveraging attention mechanisms, transformer-based models have demonstrated superior performance in multivariate time series forecasting. However, existing transformer-based methods tend to overlook the high similarity among adjacent time steps and the correlations among the series of a multivariate input. This often results in block-wise attention patterns that hinder efficient global information capture, limiting the model’s representation capacity and degrading prediction accuracy. In this work, we mathematically characterize and theoretically validate this limitation, showing how it undermines the stability of learned representations and restricts effective feature extraction. To alleviate this issue, we propose a lightweight and model-agnostic framework named Sparsity Enhanced Attention in the Frequency Domain (SEAT). By projecting time series data into the frequency domain, SEAT reconstructs the attention matrix to mitigate block-wise patterns, thereby enhancing the model’s ability to capture global temporal dependencies. As a plug-and-play module, SEAT can be integrated into existing transformer-based architectures without altering their core structure. Extensive experiments on standard benchmarks demonstrate that SEAT consistently improves predictive performance while preserving computational efficiency.
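The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the general idea it describes: computing attention over a frequency-domain projection of the input so that the module can wrap an existing attention block in a plug-and-play fashion. The class name `FrequencySparseAttention`, the real/imaginary feature packing, and the head count are illustrative assumptions, not the authors' actual SEAT module.

```python
import torch
import torch.nn as nn


class FrequencySparseAttention(nn.Module):
    """Hypothetical SEAT-style plug-in: standard multi-head attention applied
    to a frequency-domain projection of the time series, then inverted back."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        # Attention operates on [real | imag] features, hence width 2 * d_model.
        # The attention block itself is unchanged, matching the plug-and-play claim.
        self.attn = nn.MultiheadAttention(2 * d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        # Project each channel into the frequency domain along the time axis.
        spec = torch.fft.rfft(x, dim=1)                    # (B, seq_len//2 + 1, d_model), complex
        # Pack the complex spectrum into real-valued features for attention.
        feats = torch.cat([spec.real, spec.imag], dim=-1)  # (B, F, 2 * d_model)
        out, _ = self.attn(feats, feats, feats)
        # Unpack and map the attended frequency features back to the time domain.
        real, imag = out.chunk(2, dim=-1)
        spec_out = torch.complex(real, imag)
        return torch.fft.irfft(spec_out, n=seq_len, dim=1)  # (B, seq_len, d_model)


if __name__ == "__main__":
    # Toy usage: 8 series windows of length 96 with 64-dimensional embeddings.
    x = torch.randn(8, 96, 64)
    module = FrequencySparseAttention(d_model=64, n_heads=4)
    print(module(x).shape)  # torch.Size([8, 96, 64])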
Primary Area: learning on time series and dynamical systems
Submission Number: 15439