From Two to One: Harmonizing Attention and Feature Debiasing for Multivariate Time Series Forecasting

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Time Series Forecasting, Frequency Debiasing
TL;DR: We propose FADformer, a frequency-aware debiasing framework, which harmonizes the low- and high-frequency components of attention and feature maps to capture fine-grained patterns for accurate forecasting.
Abstract: Multivariate time series forecasting (MTSF) models based on Transformers have shown remarkable success in various applications, such as energy management, weather forecasting, and traffic monitoring. However, due to the complex and intertwined correlations among variates, Transformer-based methods often fail to precisely model the interactions among series, resulting in limited performance. In this paper, we theoretically investigate and establish the phenomenon of feature over-smoothing in Transformer-based forecasters. Motivated by this analysis, we propose \textbf{FADformer}, a frequency-aware debiasing framework that harmonizes the low- and high-frequency components of attention and feature maps to capture fine-grained patterns for accurate forecasting. Specifically, we design two plug-and-play modules based on the Fourier transform, where i) AttnDeb rescales high-frequency weights within attention modules to mitigate the low-pass limitation, and ii) FeatDeb injects an inductive feature bias into residual connections to amplify important high-frequency signals. Extensive experiments on challenging real-world datasets demonstrate that FADformer outperforms existing state-of-the-art methods in terms of both forecasting performance and generalization ability.
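For intuition only, a minimal PyTorch sketch of the frequency-rescaling idea behind AttnDeb and FeatDeb is given below. The rFFT-based band split, the cutoff ratio, the scaling factor, and the exact insertion points are illustrative assumptions, not the paper's actual implementation:

```python
import torch

def rescale_high_freq(x: torch.Tensor, alpha: float = 2.0,
                      cutoff_ratio: float = 0.5) -> torch.Tensor:
    """Amplify the high-frequency band of `x` along its last dimension.

    `alpha` and `cutoff_ratio` are hypothetical hyperparameters chosen
    for illustration, not values from the paper.
    """
    n = x.shape[-1]
    Xf = torch.fft.rfft(x, dim=-1)             # one-sided spectrum, length n//2 + 1
    cutoff = int(Xf.shape[-1] * cutoff_ratio)  # index separating low / high bins
    scale = torch.ones(Xf.shape[-1], device=x.device, dtype=Xf.real.dtype)
    scale[cutoff:] = alpha                     # boost high-frequency coefficients
    return torch.fft.irfft(Xf * scale, n=n, dim=-1)

# AttnDeb-style use: rescale attention weights to counter their low-pass bias.
attn = torch.softmax(torch.randn(8, 96, 96), dim=-1)
attn_debiased = rescale_high_freq(attn)

# FeatDeb-style use: add a high-frequency-boosted copy into the residual path.
feat = torch.randn(8, 96, 512)
residual = feat + rescale_high_freq(feat)
```

Note that rescaling after the softmax, as sketched here, no longer guarantees rows sum to one; whether and how the actual modules renormalize is not specified by the abstract.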
Supplementary Material: zip
Primary Area: learning on time series and dynamical systems
Submission Number: 11550