Keywords: Time series forecasting, Transformer, Deep learning
TL;DR: We propose DeformableTST, a Transformer-based model that is less reliant on patching, broadening the applicability of Transformer-based models and achieving state-of-the-art performance on a wider range of time series forecasting tasks.
Abstract: With the proposal of the patching technique in time series forecasting, Transformer-based models have achieved compelling performance and gained great interest from the time series community. At the same time, however, we observe a new problem: recent Transformer-based models rely overly on patching to achieve ideal performance, which limits their applicability to forecasting tasks unsuitable for patching. In this paper, we intend to address this emerging issue. By examining the relationship between patching and full attention (the core mechanism in Transformer-based models), we find that the underlying reason is that full attention relies heavily on the guidance of patching to focus on the important time points and to learn non-trivial temporal representations. Based on this finding, we propose DeformableTST as an effective solution. Specifically, we propose deformable attention, a sparse attention mechanism that can focus on the important time points by itself, removing the need for patching. We also adopt a hierarchical structure to alleviate the efficiency issue caused by the removal of patching. Experimentally, DeformableTST achieves consistent state-of-the-art performance across a broader range of time series forecasting tasks, and in particular performs well on forecasting tasks unsuitable for patching, thereby reducing the reliance on patching and broadening the applicability of Transformer-based models. Code is available at this repository:
https://github.com/luodhhh/DeformableTST.
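As a rough illustration of the idea described above, the sketch below shows one way a 1-D deformable attention layer over raw time points could look: each query predicts a small set of sampling offsets along the time axis, gathers values at those fractional positions via linear interpolation, and attends only over that sparse sampled set. All module names, shapes, and hyperparameters here (e.g. `Deformable1DAttention`, `n_points`) are illustrative assumptions for exposition, not the authors' actual implementation; see the repository linked above for the real code.

```python
# Minimal sketch of a 1-D deformable attention layer for time series (assumed
# design, not the paper's implementation). Queries learn where to sample along
# the time axis instead of attending to every point, so no patching is needed.
import torch
import torch.nn as nn


class Deformable1DAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4, n_points: int = 8):
        super().__init__()
        self.n_heads = n_heads
        self.n_points = n_points
        self.q_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Each query predicts K sampling offsets and K attention weights per head.
        self.offset_proj = nn.Linear(d_model, n_heads * n_points)
        self.weight_proj = nn.Linear(d_model, n_heads * n_points)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -- raw time points, no patching required.
        B, L, D = x.shape
        H, K = self.n_heads, self.n_points
        q = self.q_proj(x)
        v = self.v_proj(x).view(B, L, H, D // H)

        # Reference position of every query, normalized to [0, 1].
        ref = torch.linspace(0, 1, L, device=x.device).view(1, L, 1, 1)
        # Predicted offsets shift each query's sampling locations along time.
        offsets = self.offset_proj(q).view(B, L, H, K).tanh() / L
        loc = (ref + offsets).clamp(0, 1) * (L - 1)   # (B, L, H, K) fractional indices

        # Gather values at fractional time indices with linear interpolation.
        lo = loc.floor().long().clamp(0, L - 1)
        hi = (lo + 1).clamp(0, L - 1)
        frac = (loc - lo.float()).unsqueeze(-1)       # (B, L, H, K, 1)
        v_flat = v.permute(0, 2, 1, 3)                # (B, H, L, D/H)

        def gather(idx):
            # idx: (B, L, H, K) -> sampled values (B, L, H, K, D/H)
            idx = idx.permute(0, 2, 1, 3).reshape(B, H, L * K, 1)
            out = torch.gather(v_flat, 2, idx.expand(-1, -1, -1, D // H))
            return out.view(B, H, L, K, D // H).permute(0, 2, 1, 3, 4)

        sampled = (1 - frac) * gather(lo) + frac * gather(hi)

        # Attention weights over only the K sampled points (sparse attention).
        attn = self.weight_proj(q).view(B, L, H, K).softmax(dim=-1).unsqueeze(-1)
        out = (attn * sampled).sum(dim=3).reshape(B, L, D)
        return self.out_proj(out)


if __name__ == "__main__":
    layer = Deformable1DAttention(d_model=64)
    series = torch.randn(2, 96, 64)       # batch of 2 series, 96 raw time points
    print(layer(series).shape)            # torch.Size([2, 96, 64])
```

Because each query attends to only `n_points` learned locations rather than all `seq_len` time points, the per-layer attention cost in this sketch grows linearly with sequence length, which is the kind of efficiency property a hierarchical, patch-free backbone would rely on.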
Primary Area: Other (please use sparingly, only use the keyword field for more details)
Submission Number: 5480