Looking at the Performer from a Hopfield Point of View

Anonymous

Published: 28 Mar 2022, Last Modified: 05 May 2023
Venue: BT@ICLR2022
Readers: Everyone
Keywords: deep learning, hopfield networks, associative memory, attention, transformer
Abstract: The recent paper Rethinking Attention with Performers constructs a new efficient attention mechanism in an elegant way. It strongly reduces the computational cost for long sequences while keeping the intriguing properties of the original attention mechanism. As a result, Performers have a complexity that is only linear in the input length, in contrast to the quadratic complexity of standard Transformers. This is a major step forward in the effort to improve Transformer models. In this blog post, we look at the Performer from a Hopfield Network point of view and relate aspects of the Performer architecture to findings in the field of associative memories and Hopfield Networks. The post sheds light on the Performer from three different directions: (i) Performers resemble classical Hopfield Networks, (ii) sparseness increases memory capacity, and (iii) the Performer normalization relates to the activation function of continuous Hopfield Networks.
ICLR Paper: https://arxiv.org/abs/2009.14794, https://openreview.net/forum?id=Ua6zuk0WRH
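The abstract's central claim, that Performers reduce attention from quadratic to linear complexity in the sequence length, rests on replacing the softmax kernel with a random-feature approximation and reassociating the matrix products. The NumPy sketch below illustrates this contrast; it is not the official FAVOR+ implementation, and the feature map, scaling, and number of random features are simplifying assumptions.

```python
import numpy as np

# Minimal sketch: standard softmax attention builds an L x L weight matrix,
# while a kernelized, Performer-style variant reassociates the products so
# the cost stays linear in L. Feature map and num_features are illustrative.

def softmax_attention(Q, K, V):
    """O(L^2): explicitly forms the L x L attention matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def positive_random_features(X, W):
    """Positive random features whose inner products approximate exp(x . y)."""
    return np.exp(X @ W.T - 0.5 * (X ** 2).sum(-1, keepdims=True)) / np.sqrt(W.shape[0])

def linear_attention(Q, K, V, num_features=256, seed=0):
    """O(L): never materializes the L x L attention matrix."""
    d = Q.shape[-1]
    W = np.random.default_rng(seed).normal(size=(num_features, d))
    Qp = positive_random_features(Q / d ** 0.25, W)   # (L, m)
    Kp = positive_random_features(K / d ** 0.25, W)   # (L, m)
    KV = Kp.T @ V                                     # (m, d), linear in L
    norm = Qp @ Kp.sum(axis=0)                        # (L,)
    return (Qp @ KV) / norm[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    L, d = 512, 32
    Q, K, V = (0.5 * rng.normal(size=(L, d)) for _ in range(3))
    exact = softmax_attention(Q, K, V)
    approx = linear_attention(Q, K, V)
    print("mean abs. difference:", np.abs(exact - approx).mean())
```

The key step is computing Kp.T @ V before multiplying by Qp, so the (L, L) matrix of pairwise scores never appears; only (L, m) and (m, d) intermediates are formed, which is what gives the linear dependence on the input length L.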