Understanding the Property of Long Term Memory for the LSTM with Attention Mechanism

Published: 01 Jan 2021, Last Modified: 05 Jun 2025CIKM 2021EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Recent trends of incorporating LSTM network with different attention mechanisms in time series forecasting have led researchers to consider the attention module as an essential component. While existing studies revealed the effectiveness of attention mechanism with some visualization experiments, the underlying rationale behind their outstanding performance on learning long-term dependencies remains hitherto obscure. In this paper, we aim to elaborate on this fundamental question by conducting a thorough investigation of the memory property for LSTM network with attention mechanism. We present a theoretical analysis of LSTM integrated with attention mechanism, and demonstrate that it is capable of generating an adaptive decay rate which dynamically controls the memory decay according to the obtained attention score. In particular, our theory shows that attention mechanism brings significantly slower decays than the exponential decay rate of a standard LSTM. Experimental results on four real-world time series datasets demonstrate the superiority of the attention mechanism for maintaining long-term memory when compared to the state-of-the-art methods, and further corroborate our theoretical analysis.
Loading