Jump Self-attention: Capturing High-order Statistics in Transformers

Haoyi Zhou; Siyang Xiao; Shanghang Zhang; Jieqi Peng; Shuai Zhang; Jianxin Li

Published: 31 Oct 2022, Last Modified: 09 Jan 2023, NeurIPS 2022 Accept, Readers: Everyone

Keywords: Neural Network, Transformer, Self-attention

Abstract: The recent success of the Transformer has benefited many real-world applications, owing to its ability to build long-range dependencies through pairwise dot-products. However, the strong assumption that elements attend directly to each other limits performance on tasks with high-order dependencies, such as natural language understanding and image captioning. To address this, we are the first to define Jump Self-attention (JAT) for building Transformers. Inspired by how pieces move in English draughts, we introduce a spectral convolutional technique to compute JAT on the dot-product feature map. This technique allows JAT to propagate within each self-attention head and makes it interchangeable with canonical self-attention. We further develop higher-order variants under the multi-hop assumption to increase generality. Moreover, the proposed architecture is compatible with pre-trained models. Through extensive experiments, we empirically show that our methods significantly improve performance on ten different tasks.
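
The abstract gives no formulas, so the following is only a minimal sketch of the general multi-hop idea, not the paper's spectral convolutional formulation. It assumes that the attention matrix of one head can be read as a weighted graph over tokens, and that a "jump" can be approximated by letting a token reach another through one intermediate token (the matrix product `A @ A`). The function names `attention_weights` and `two_hop_weights` and the mixing parameter `alpha` are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(Q, K):
    # Canonical scaled dot-product attention weights for one head.
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1)

def two_hop_weights(A, alpha=0.5):
    # Illustrative assumption (not taken from the paper): blend direct
    # attention A with two-hop attention A @ A. Both are row-stochastic,
    # so the convex combination is row-stochastic as well.
    return alpha * A + (1.0 - alpha) * (A @ A)

rng = np.random.default_rng(0)
n, d = 5, 8                        # 5 tokens, model width 8
X = rng.normal(size=(n, d))        # token representations
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

A = attention_weights(X @ Wq, X @ Wk)   # direct (one-hop) weights
A_jump = two_hop_weights(A)             # weights with an added two-hop path
out = A_jump @ (X @ Wv)                 # drop-in replacement for A @ V
```

Because `A_jump` has the same shape and row normalization as `A`, this sketch keeps the property the abstract describes as being interchangeable with canonical self-attention; how the paper actually achieves this is not stated in the abstract.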

TL;DR: Jump Self-attention

Supplementary Material: zip
