2021 (modified: 18 Aug 2021), ICML 2021, Readers: Everyone
Abstract: We show the formal equivalence of linearised self-attention mechanisms and fast weight controllers from the early ’90s, where a slow neural net learns by gradient descent to program the fast weight...
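
The equivalence stated in the abstract can be checked numerically on a toy example. The sketch below is illustrative only and is not taken from the paper: it assumes causal, unnormalised linearised attention, a hypothetical feature map `phi` (ELU + 1), and random inputs standing in for the keys, values, and queries that a slow network would produce. All function names are made up for this example.

```python
import math
import random


def phi(x):
    # Assumed positive feature map, ELU(x) + 1; chosen only for illustration.
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_attention(keys, values, queries):
    # Attention view: y_t = sum over i <= t of v_i * (phi(k_i) . phi(q_t)).
    # The normalising denominator is omitted (an assumption of this sketch).
    outputs = []
    for t, q in enumerate(queries):
        fq = phi(q)
        y = [0.0] * len(values[0])
        for i in range(t + 1):
            weight = dot(phi(keys[i]), fq)
            y = [yj + weight * vj for yj, vj in zip(y, values[i])]
        outputs.append(y)
    return outputs


def fast_weight_controller(keys, values, queries):
    # Fast weight view: W_t = W_{t-1} + outer(v_t, phi(k_t)); y_t = W_t phi(q_t).
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        fk, fq = phi(k), phi(q)
        for r in range(d_v):
            for c in range(d_k):
                W[r][c] += v[r] * fk[c]  # additive outer-product write
        outputs.append([dot(W[r], fq) for r in range(d_v)])
    return outputs


def demo(seed=0, steps=5, d_k=3, d_v=2):
    # Random stand-ins for what the slow net would generate at each step.
    rng = random.Random(seed)
    keys = [[rng.uniform(-1, 1) for _ in range(d_k)] for _ in range(steps)]
    queries = [[rng.uniform(-1, 1) for _ in range(d_k)] for _ in range(steps)]
    values = [[rng.uniform(-1, 1) for _ in range(d_v)] for _ in range(steps)]
    att = linear_attention(keys, values, queries)
    fw = fast_weight_controller(keys, values, queries)
    # Largest elementwise gap between the two views.
    return max(abs(a - b) for ya, yb in zip(att, fw) for a, b in zip(ya, yb))


if __name__ == "__main__":
    print("max abs difference:", demo())
```

Under these assumptions the two functions agree up to floating-point rounding: the attention view recomputes a weighted sum over all past steps, while the fast weight view keeps a single matrix that is updated by one outer product per step.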