WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting

Jiecheng Lu; Xu Han; Yan Sun; Shihao Yang

WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting

Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang

Published: 01 May 2025, Last Modified: 29 Jul 2025ICML 2025 posterEveryoneRevisionsBibTeXCC BY 4.0

Abstract: We propose a Weighted Autoregressive Varying gatE (WAVE) attention mechanism equipped with both Autoregressive (AR) and Moving-average (MA) components. It can adapt to various attention mechanisms, enhancing and decoupling their ability to capture long-range and local temporal patterns in time series data. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines when appropriate tokenization and training methods are applied. Moreover, inspired by the ARMA model from statistics and recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. By using an indirect MA weight generation method, we incorporate the MA term while maintaining the time complexity and parameter size of the underlying efficient attention models. We further explore how indirect parameter generation can produce implicit MA weights that align with the modeling requirements for local temporal impacts. Experimental results show that WAVE attention that incorporates the ARMA structure consistently improves the performance of various AR attentions on TSF tasks, achieving state-of-the-art results.

Lay Summary: Predicting future events from past data is important in fields like finance and weather forecasting. We developed WAVE attention, a new deep learning method inspired by statistical ARMA models. WAVE effectively separates short-term (recent events) and long-term (historical trends) influences within time series data. By combining statistical insights with advanced Transformer models, WAVE significantly improves forecasting accuracy without extra complexity. Our approach provides clearer predictions by explicitly distinguishing these different temporal impacts, achieving better performance in accurate and interpretable time series forecasting.

Link To Code: https://github.com/LJC-FVNR/ARMA-Attention

Primary Area: Deep Learning->Sequential Models, Time series

Keywords: Attention, Transformer, Autoregressive Moving-average, Time Series Forecasting

Submission Number: 15065

Loading