Abstract: This paper proposes echo-attention layers, an efficient method for improving the expressiveness of self-attention layers without incurring significant parameter or training-time costs. The key idea is to iteratively refine the attentional activations via stateful repeated computation, i.e., we compute the activations once and obtain $N$ refinements (echo-attentions) at relatively cheap cost. To this end, we introduce an update and state transition function that operates over these attentional activations. Via a set of extensive experiments, we show that the proposed Echoformer model demonstrates widespread benefits across 21 datasets spanning language modeling, machine translation, language understanding, and question answering.
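As a rough illustration of the idea only (the abstract does not specify the exact update and state transition functions), the sketch below shows one plausible reading: attention activations are computed once, then refined for $N$ "echo" steps by a cheap, gated transition. The module name, gating form, and hyperparameters here are assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn


class EchoAttention(nn.Module):
    """Hypothetical sketch of an echo-attention layer.

    The attention activations are computed once, then refined N times
    ("echoes") by a small state transition; the gated update below is an
    assumed form, not the paper's exact method.
    """

    def __init__(self, d_model: int, n_heads: int, n_echoes: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Hypothetical parameters for the transition over activations.
        self.transition = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)
        self.n_echoes = n_echoes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base attentional activations, computed a single time.
        state, _ = self.attn(x, x, x)
        # Iteratively refine the activations at low extra cost
        # (no further full attention passes are performed).
        for _ in range(self.n_echoes):
            update = torch.tanh(self.transition(state))
            g = torch.sigmoid(self.gate(torch.cat([state, update], dim=-1)))
            state = g * update + (1.0 - g) * state
        return state


# Example usage: a batch of 2 sequences of length 10 with model width 64.
layer = EchoAttention(d_model=64, n_heads=4, n_echoes=2)
out = layer(torch.randn(2, 10, 64))  # shape: (2, 10, 64)
```

Under this reading, the extra cost per echo is a pair of linear maps rather than a new attention pass, which is consistent with the claim of cheap refinements.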