EIT: Enhanced Interactive Transformer

16 Jun 2023 (modified: 01 Dec 2023) · Submitted to EMNLP 2023
Submission Type: Regular Long Paper
Submission Track: Machine Translation
Keywords: Transformer; Multi-head self-attention; Multi-view learning
Abstract: Two principles, the \textit{complementary principle} and the \textit{consensus principle}, are widely acknowledged in the multi-view learning literature. However, the current design of multi-head self-attention, an instance of multi-view learning, prioritizes complementarity while ignoring consensus. To address this problem, we propose an enhanced multi-head self-attention (EMHA). First, to satisfy the \textit{complementary principle}, EMHA removes the one-to-one mapping constraint between queries and keys in multiple subspaces and allows each query to attend to multiple keys. On top of that, we develop a method to fully encourage consensus among heads by introducing two interaction models, namely Inner-Subspace Interaction and Cross-Subspace Interaction. Extensive experiments on a wide range of language tasks (e.g., machine translation, abstractive summarization, grammatical error correction, and language modeling) show its superiority, with a very modest increase in model size.
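To make the "remove the one-to-one mapping constraint between queries and keys" idea concrete, below is a minimal, hedged PyTorch sketch of one possible interpretation: each query in a given subspace attends over the keys of every subspace, rather than only its own. The module name, shapes, and the head-flattening trick are assumptions for illustration, not the authors' EMHA implementation, and the sketch omits the Inner-Subspace and Cross-Subspace Interaction models described in the abstract.

```python
# Hedged sketch (not the authors' code): letting every query attend to the
# keys/values of all attention subspaces, as a rough illustration of dropping
# the one-to-one query-key subspace mapping. All names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossSubspaceAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, _ = x.shape
        H, Dh = self.n_heads, self.d_head
        q = self.q_proj(x).view(B, T, H, Dh).transpose(1, 2)  # (B, H, T, Dh)
        k = self.k_proj(x).view(B, T, H, Dh).transpose(1, 2)  # (B, H, T, Dh)
        v = self.v_proj(x).view(B, T, H, Dh).transpose(1, 2)  # (B, H, T, Dh)

        # Standard multi-head attention: a query in head h only sees keys of
        # head h. Here the head axis of keys/values is flattened into the
        # sequence axis, so every query attends over all H*T keys.
        k_all = k.reshape(B, 1, H * T, Dh)  # broadcast across query heads
        v_all = v.reshape(B, 1, H * T, Dh)

        scores = torch.matmul(q, k_all.transpose(-2, -1)) / Dh ** 0.5  # (B, H, T, H*T)
        attn = F.softmax(scores, dim=-1)
        out = torch.matmul(attn, v_all)                                 # (B, H, T, Dh)
        out = out.transpose(1, 2).reshape(B, T, H * Dh)
        return self.out_proj(out)

if __name__ == "__main__":
    layer = CrossSubspaceAttention(d_model=512, n_heads=8)
    y = layer(torch.randn(2, 10, 512))
    print(y.shape)  # torch.Size([2, 10, 512])
```

Relative to standard multi-head attention, this variant only changes the set of keys each query can reach, so the parameter count is unchanged; the extra cost is the larger attention map (T x H*T per head), which is consistent with the abstract's claim of a very modest increase in model size.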
Submission Number: 2581