Looking at the Performer from a Hopfield Point of View
Keywords: deep learning, Hopfield networks, associative memory, attention, transformer
Abstract: The recent paper Rethinking Attention with Performers constructs a new, efficient attention mechanism in an elegant way. It strongly reduces the computational cost for long sequences while keeping the intriguing properties of the original attention mechanism. As a result, Performers have a complexity that is only linear in the input length, in contrast to the quadratic complexity of standard Transformers. This is a major breakthrough in the effort to improve Transformer models. In this blog post, we look at the Performer from a Hopfield Network point of view and relate aspects of the Performer architecture to findings in the field of associative memories and Hopfield Networks. We shed light on the Performer from three directions: (i) Performers resemble classical Hopfield Networks, (ii) sparseness increases memory capacity, and (iii) Performer normalization relates to the activation function of continuous Hopfield Networks.
ICLR Paper: https://arxiv.org/abs/2009.14794, https://openreview.net/forum?id=Ua6zuk0WRH