EIT: Enhanced Interactive Transformer

16 Jun 2023 (modified: 01 Dec 2023) · Submitted to EMNLP 2023
Submission Type: Regular Long Paper
Submission Track: Machine Translation
Keywords: Transformer; Multi-head self-attention; Multi-view learning
Abstract: Two principles, the \textit{complementary principle} and the \textit{consensus principle}, are widely acknowledged in the multi-view learning literature. However, the current design of multi-head self-attention, an instance of multi-view learning, prioritizes complementarity while ignoring consensus. To address this problem, we propose an enhanced multi-head self-attention (EMHA). First, to satisfy the \textit{complementary principle}, EMHA removes the one-to-one mapping constraint between queries and keys in multiple subspaces and allows each query to attend to multiple keys. On top of that, we develop a method to fully encourage consensus among heads by introducing two interaction models, namely Inner-Subspace Interaction and Cross-Subspace Interaction. Extensive experiments on a wide range of language tasks (e.g., machine translation, abstractive summarization, grammatical error correction, and language modeling) show its superiority, with a very modest increase in model size.
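To make the "remove the one-to-one mapping constraint between queries and keys" idea concrete, below is a minimal, hedged PyTorch sketch of one possible interpretation: each query in a given subspace attends over the keys of every subspace, rather than only its own. The module name, shapes, and the head-flattening trick are assumptions for illustration, not the authors' EMHA implementation, and the sketch omits the Inner-Subspace and Cross-Subspace Interaction models described in the abstract.

```python
# Hedged sketch (not the authors' code): letting every query attend to the
# keys/values of all attention subspaces, as a rough illustration of dropping
# the one-to-one query-key subspace mapping. All names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossSubspaceAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, _ = x.shape
        H, Dh = self.n_heads, self.d_head
        q = self.q_proj(x).view(B, T, H, Dh).transpose(1, 2)  # (B, H, T, Dh)
        k = self.k_proj(x).view(B, T, H, Dh).transpose(1, 2)  # (B, H, T, Dh)
        v = self.v_proj(x).view(B, T, H, Dh).transpose(1, 2)  # (B, H, T, Dh)

        # Standard multi-head attention: a query in head h only sees keys of
        # head h. Here the head axis of keys/values is flattened into the
        # sequence axis, so every query attends over all H*T keys.
        k_all = k.reshape(B, 1, H * T, Dh)  # broadcast across query heads
        v_all = v.reshape(B, 1, H * T, Dh)

        scores = torch.matmul(q, k_all.transpose(-2, -1)) / Dh ** 0.5  # (B, H, T, H*T)
        attn = F.softmax(scores, dim=-1)
        out = torch.matmul(attn, v_all)                                 # (B, H, T, Dh)
        out = out.transpose(1, 2).reshape(B, T, H * Dh)
        return self.out_proj(out)

if __name__ == "__main__":
    layer = CrossSubspaceAttention(d_model=512, n_heads=8)
    y = layer(torch.randn(2, 10, 512))
    print(y.shape)  # torch.Size([2, 10, 512])
```

Relative to standard multi-head attention, this variant only changes the set of keys each query can reach, so the parameter count is unchanged; the extra cost is the larger attention map (T x H*T per head), which is consistent with the abstract's claim of a very modest increase in model size.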
Submission Number: 2581