An Attention Free Transformer

Anonymous

28 Sep 2020 (modified: 02 Oct 2020) · ICLR 2021 Conference Blind Submission · Readers: Everyone
  • Keywords: Transformers, attention, efficient
  • Abstract: We introduce the Attention Free Transformer (AFT), an efficient variant of Transformers \citep{transformer} that eliminates the need for spatial attention. AFT offers great simplicity compared with standard Transformers: the multi-head attention operation is replaced with a composition of element-wise multiplications/divisions and global/local pooling (see the illustrative sketch below). We provide several variants of AFT along with simple yet efficient implementations that are supported by mainstream deep learning libraries. We show that, surprisingly, AFT can be trained effectively on challenging benchmarks, matching or surpassing its standard Transformer counterparts.
  • Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
  • One-sentence Summary: We propose an efficient Transformer that eliminates attention.
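
The following is a minimal sketch of the kind of attention-free layer the abstract describes: element-wise multiplications/divisions combined with global pooling over the sequence in place of query-key dot-product attention. It assumes an AFT-simple-style formulation with no position biases; the function name, shapes, and the choice of sigmoid gating are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an attention-free token-mixing step in the spirit of the abstract:
# element-wise multiplications/divisions plus global pooling over the sequence,
# instead of query-key dot-product attention. Illustrative only.
import torch


def aft_simple(q, k, v):
    """q, k, v: tensors of shape (batch, seq_len, dim)."""
    # Normalize keys across positions; the softmax supplies the "division"
    # (normalization) mentioned in the abstract.
    weights = torch.softmax(k, dim=1)                 # (batch, seq_len, dim)
    # Global pooling: a single per-dimension context vector for the sequence.
    pooled = (weights * v).sum(dim=1, keepdim=True)   # (batch, 1, dim)
    # Element-wise gating of the pooled context by the queries.
    return torch.sigmoid(q) * pooled                  # (batch, seq_len, dim)


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)
    out = aft_simple(x, x, x)
    print(out.shape)  # torch.Size([2, 16, 64])
```

Local-pooling or position-biased variants mentioned in the abstract would restrict or reweight the sum over positions rather than pooling the whole sequence; those details are left to the paper itself.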