G-GLformer: Transformer with GRU Embedding and Global-Local Attention for Multivariate Time Series Forecasting

Published: 2025 · Last Modified: 25 Jan 2026 · ECML/PKDD (8) 2025 · CC BY-SA 4.0
Abstract: Time series forecasting plays a vital role in various fields. Owing to the ability of its self-attention mechanism to capture long-term dependencies, the Transformer has been widely used in time series modeling. However, the majority of contemporary Transformer-based models adopt variate tokenization, where the self-attention mechanism is used to extract correlations between variables, which weakens the extraction of temporal correlations. Furthermore, the self-attention mechanism extracts correlations only within the look-back window. Lacking a global perspective, the correlations it captures may be distorted by local noise. To tackle these issues, we propose an advanced Transformer architecture entitled G-GLformer, which introduces two novel modules, Bidirectional-Patch-GRU-Embedding (BPGE) and Global-Local-Attention (GLA), and integrates them into the Transformer to achieve more accurate forecasts. Specifically, the BPGE module is mainly used to model temporal relationships and enhance local semantics. The GLA module integrates the correlation coefficients computed over the training set with the data from the local look-back window. This endows the data in the look-back window with a global perspective, making it less susceptible to the influence of noise. Moreover, both modules can be used as plug-ins in other models. Extensive experiments on public datasets demonstrate superior performance over other state-of-the-art models.
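To make the global-local idea concrete, here is a minimal NumPy sketch of one possible reading of the GLA mechanism described above: local attention scores from the look-back window are blended with variable-wise correlation coefficients precomputed over the training set. The function name `global_local_attention`, the convex blend controlled by `alpha`, and the use of absolute Pearson correlation are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_local_attention(window, train_data, alpha=0.5):
    """Blend local attention over the look-back window with a global
    correlation matrix from the training set (illustrative sketch).

    window:     (N, L) array, N variable tokens over a window of length L
    train_data: (T, N) array, the full training series
    alpha:      blend weight between the global and local views (assumed)
    """
    N, L = window.shape
    # Local view: scaled dot-product attention between variable tokens,
    # computed only from the look-back window (no learned projections here)
    local = softmax(window @ window.T / np.sqrt(L))              # (N, N)
    # Global view: absolute Pearson correlation between variables,
    # precomputable once over the whole training set
    global_corr = np.abs(np.corrcoef(train_data, rowvar=False))  # (N, N)
    global_w = global_corr / global_corr.sum(axis=-1, keepdims=True)
    # Convex blend: each row still sums to 1, so this remains a valid
    # attention distribution less sensitive to noise in one window
    weights = alpha * global_w + (1 - alpha) * local
    return weights @ window                                      # (N, L)
```

Because the global correlation matrix depends only on the training set, it can be computed once and reused for every window, adding negligible cost at inference time.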