CAU: A Causality Attention Unit for Spatial-Temporal Sequence Forecast

Bo Qin, Fanqing Meng, Shijin Yuan, Bin Mu

Published: 2024, Last Modified: 13 Nov 2024, IEEE Trans. Multim. 2024, License: CC BY-SA 4.0

Abstract: Existing memory cells based on convolutional recurrent neural networks (ConvRNNs) mainly rely on gated structures and attention mechanisms to extract discontinuous latent associations for spatial-temporal sequence forecast (STSF) problems, which may lead to serious over-fitting and to spurious relationships with correlated noise. It is widely accepted that incorporating cause-effect relationships into modeling can alleviate these problems. In this paper, we propose a Causality Attention Unit (CAU) that assists ConvRNNs by supplying causal inference ability in a plug-and-play way. Specifically, CAU consists of an attention module followed by a causality module. The former is built from a spatial-channel attention layer, which preliminarily generates the correlated future from the correlations between historical memories and the current state. The latter borrows the idea of transfer entropy (${\bm{TE}}$) to detect latent cause-effect relationships and precisely correct the correlated future. A space-time exchange strategy for accelerating the calculation of ${\bm{TE}}$ in CAU is also designed. CAU can be easily combined with existing ConvRNN cells, and we construct a simple general model for predicting long-term spatial-temporal series, which consists of an encoder/decoder and stacked CAUs in parallel with stacked ConvRNN cells. After determining the optimal model structure, we carry out a series of experiments to evaluate model performance, including comparisons with other advanced models, training loss analysis, and multiple ablation and sensitivity studies. Experimental results show that our proposed model can effectively improve the performance of existing ConvRNNs to the state-of-the-art level on representative public datasets, including Moving MNIST, KTH, BAIR, and WeatherBench. The ablation and sensitivity studies verify the superiority of CAU. The learned causal maps precisely distinguish the pixel attributions and motion characteristics in sophisticated entangled scenarios.
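
For readers unfamiliar with the transfer entropy that the causality module borrows from, the sketch below may help. It is not the CAU implementation, which the abstract does not specify. It is only the textbook plug-in estimate for two discrete 1-D sequences with a history length of one step, TE(X→Y) = Σ p(y_{t+1}, y_t, x_t) · log2[ p(y_{t+1} | y_t, x_t) / p(y_{t+1} | y_t) ]. The function name `transfer_entropy` and the `lag` parameter are illustrative assumptions.

```python
from collections import Counter

import numpy as np


def transfer_entropy(source, target, lag=1):
    """Plug-in estimate of TE(source -> target) in bits.

    Assumes discrete-valued 1-D sequences of equal length and a history
    of a single step; this is an illustrative sketch, not the CAU method.
    """
    source = np.asarray(source)
    target = np.asarray(target)
    if source.ndim != 1 or source.shape != target.shape:
        raise ValueError("source and target must be 1-D and equally long")
    if lag < 1 or len(target) <= lag:
        raise ValueError("lag must be >= 1 and shorter than the sequences")

    y_next = target[lag:].tolist()   # y_{t+lag}
    y_past = target[:-lag].tolist()  # y_t
    x_past = source[:-lag].tolist()  # x_t
    n = len(y_next)

    # Empirical counts for the joint and marginal distributions.
    c_full = Counter(zip(y_next, y_past, x_past))
    c_next_past = Counter(zip(y_next, y_past))
    c_past_src = Counter(zip(y_past, x_past))
    c_past = Counter(y_past)

    te = 0.0
    for (yn, yp, xp), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_past_src[(yp, xp)]        # p(y_next | y_t, x_t)
        p_given_self = c_next_past[(yn, yp)] / c_past[yp]  # p(y_next | y_t)
        te += p_joint * np.log2(p_given_both / p_given_self)
    return float(te)
```

As a usage illustration, if `y` is a one-step delayed copy of a random binary sequence `x`, then `transfer_entropy(x, y)` should be close to 1 bit while `transfer_entropy(y, x)` should be close to 0. This asymmetry is what lets transfer entropy separate a directed cause-effect relationship from a mere correlation.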