Linear Transformers Are Secretly Fast Weight Programmers

2021 (modified: 18 Aug 2021) | ICML 2021 | Readers: Everyone
Abstract: We show the formal equivalence of linearised self-attention mechanisms and fast weight controllers from the early ’90s, where a slow neural net learns by gradient descent to program the fast weight...