STARformer: STructural Attention tRansformer for Long-term Time Series Forecasting

20 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Long-term time series forecasting, time-series transformer
TL;DR: Structural attention transformer
Abstract: In recent years, Transformers have been gaining attention in Natural Language Processing, Computer Vision, and time series analysis. Despite lacking a mechanism to exploit the characteristics of time series data, they have demonstrated potential in a variety of applications. These capability gaps, including a lack of decomposability and interpretability, often make them suboptimal for long-term forecasting. To address these issues, this paper introduces the STructural Attention tRansformer (STARformer), a transformer architecture optimized for time series forecasting. Following recent studies that report performance gains from replacing self-attention with traditional time series decomposition or Fourier-transform algorithms, we improve the transformer by replacing its self-attention. Our architecture obtains structural attention from a single-layer linear model and improves both efficiency and accuracy by substituting it for the self-attention of existing transformers. Structural attention is obtained by (i) decomposing the complex time series into simple components, such as trend and seasonality, using traditional time series decomposition methods; (ii) training a single linear layer to predict the future of each simple component; and (iii) extracting structural attention from the pre-trained single linear layer. STARformer, which replaces the transformer's self-attention with this structural attention block, outperforms existing baselines by non-trivial margins in experiments on 9 real-world datasets against 12 baselines.
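The sketch below is not the authors' code; it only illustrates, under stated assumptions, the pipeline the abstract describes: decompose a series into trend and seasonal components, fit a single linear layer that maps a past window to a future window for a component, and read the learned weight matrix off as a "structural attention" map. The moving-average decomposition, the window sizes, and the least-squares fit are all hypothetical choices for illustration.

```python
# Minimal illustrative sketch (assumptions, not the paper's implementation):
# decompose -> fit one linear layer per component -> interpret its weights
# as structural attention over past time steps.
import numpy as np

def decompose(series: np.ndarray, kernel: int = 25):
    """Split a 1-D series into a moving-average trend and a seasonal residual."""
    pad = kernel // 2
    padded = np.pad(series, (pad, kernel - 1 - pad), mode="edge")
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    return trend, series - trend

def fit_single_linear_layer(component: np.ndarray, lookback: int = 96, horizon: int = 24):
    """Fit one linear map W (lookback x horizon): past window -> future window."""
    X, Y = [], []
    for t in range(len(component) - lookback - horizon + 1):
        X.append(component[t : t + lookback])
        Y.append(component[t + lookback : t + lookback + horizon])
    X, Y = np.stack(X), np.stack(Y)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # ordinary least squares
    return W

# Hypothetical usage on a synthetic series with trend + daily seasonality.
t = np.arange(2000, dtype=float)
series = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(len(t))
trend, seasonal = decompose(series)
W_trend = fit_single_linear_layer(trend)

# One possible reading of "structural attention": the normalized magnitude of
# the learned weights, i.e. how strongly each past step influences each future step.
attention = np.abs(W_trend) / np.abs(W_trend).sum(axis=0, keepdims=True)
print(attention.shape)  # (96, 24)
```

In the paper's framing, a map like this (extracted from the pre-trained single-layer model) would then stand in for the self-attention block inside the transformer; how exactly it is injected is not specified in the abstract.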
Supplementary Material: pdf
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2672