Linearized Relative Positional Encoding

Zhen Qin; Weixuan Sun; Kaiyue Lu; Hui Deng; Dongxu Li; Xiaodong Han; Yuchao Dai; Lingpeng Kong; Yiran Zhong

Published: 20 Sept 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, encoding methods designed for vanilla transformers are not always directly applicable to linear transformers, because the latter require a decomposition of the query and key representations into separate kernel functions. Principles for designing encoding methods suitable for linear transformers nevertheless remain understudied. In this work, we unify a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms via unitary transformation. Our formulation leads to a principled framework that can be used to develop new relative positional encoding methods that preserve linear space-time complexity. Equipped with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encoding for various applications. Experiments show that, compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. It also highlights a general paradigm for designing a broader range of relative positional encoding methods that are applicable to linear transformers.
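To make the decomposition requirement in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes one simple unitary choice, a diagonal complex rotation diag(exp(i * pos * theta_j)), which is not necessarily the parameterization used in the paper or its repository. It also assumes that queries and keys are already kernel feature vectors and that values are scalars. The function names (rotate, inner, quadratic_scores, linear_causal_output) and the frequencies thetas are hypothetical. The sketch shows that transforming the query and key separately by their own positions yields a score that depends only on the offset t - s, so a causal output can be accumulated with a running state instead of an n x n score matrix.

```python
import cmath

def rotate(vec, pos, thetas):
    """Apply the unitary map diag(exp(i * pos * theta_j)) to a complex vector."""
    return [v * cmath.exp(1j * pos * th) for v, th in zip(vec, thetas)]

def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def quadratic_scores(qs, ks, thetas):
    """Reference: explicit n x n scores that use only the offset t - s."""
    n = len(qs)
    return [[inner(qs[s], rotate(ks[t], t - s, thetas)).real
             for t in range(n)] for s in range(n)]

def linear_causal_output(qs, ks, vs, thetas):
    """Unnormalized causal output o_s = sum_{t<=s} score(s, t) * v_t.

    Each key is rotated by its own position and folded into a running
    state, so no pairwise score matrix is ever formed.
    """
    state = [0j] * len(thetas)  # running sum of rotated keys times values
    outputs = []
    for pos, (q, k, v) in enumerate(zip(qs, ks, vs)):
        rk = rotate(k, pos, thetas)
        state = [acc + x * v for acc, x in zip(state, rk)]
        outputs.append(inner(rotate(q, pos, thetas), state).real)
    return outputs
```

The two routes agree because (W_s q)^H (W_t k) = q^H W_{t-s} k for this family of unitary maps. Normalization and vector-valued outputs are omitted to keep the sketch short.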

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: We have made the requested revisions to the paper, and all changes are marked in blue. Specifically, the following modifications have been implemented:
- The experiments in Tables 1 to 4 have been repeated five times, and the mean and standard deviation are reported.
- Experimental results for "NoPE" have been included in the ALM experiments.
- We have addressed the typos and issues pointed out by the reviewers.

Code: https://github.com/OpenNLPLab/Lrpe

Assigned Action Editor: ~Alessandro_Sordoni1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 996
