Attention as Implicit Structural Inference

Published: 19 Jun 2023, Last Modified: 28 Jul 2023
Venue: 1st SPIGM @ ICML Poster
Keywords: Attention, Structural Inference, Variational Inference, Predictive Coding, Graphical Models
Abstract: Attention mechanisms play a crucial role in cognitive systems by allowing them to flexibly allocate cognitive resources. Transformers, in particular, have become a dominant architecture in machine learning, with attention as their central innovation. However, the underlying intuition and formalism of attention in Transformers are based on the ideas of keys and queries from database management systems. In this work, we pursue a structural inference perspective, building upon, and bringing together, previous theoretical descriptions of attention such as Gaussian mixture models, alignment mechanisms, and Hopfield networks. Specifically, we demonstrate that attention can be viewed as inference over an implicitly defined set of possible adjacency structures in a graphical model, revealing the generality of such a mechanism. This perspective unifies different attentional architectures in machine learning and suggests potential modifications and generalizations of attention. We hope that, by providing a new lens on attention architectures, our work can guide the development of new and improved attentional mechanisms.
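The Gaussian-mixture reading mentioned in the abstract can be made concrete with a short numerical check. The sketch below is our own illustration, not code from the paper: it assumes unit-norm keys and a mixture variance of sqrt(d), under which scaled dot-product attention weights coincide with the posterior responsibilities of an isotropic Gaussian mixture whose component means are the keys. Those responsibilities can in turn be read as a posterior over which key-to-query edge is present, in the spirit of the structural-inference view.

```python
# Minimal sketch (not the paper's code): with equal-norm keys and mixture
# variance sigma^2 = sqrt(d), softmax attention weights equal the posterior
# responsibilities of an isotropic Gaussian mixture over the keys.
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 8                                    # embedding dim, number of keys

q = rng.normal(size=d)                          # a single query
k = rng.normal(size=(n, d))
k /= np.linalg.norm(k, axis=1, keepdims=True)   # equal-norm keys (assumption)

sigma2 = np.sqrt(d)                             # variance playing the 1/sqrt(d) scale role

# Standard scaled dot-product attention weights for this query.
logits = k @ q / np.sqrt(d)
attn = np.exp(logits - logits.max())
attn /= attn.sum()

# GMM posterior responsibilities with uniform mixing weights:
# p(j | q) ∝ exp(-||q - k_j||^2 / (2 sigma^2)).
# The ||q||^2 and ||k_j||^2 terms are constant across components here,
# so the softmax over scaled dot products is recovered exactly.
log_resp = -np.sum((q - k) ** 2, axis=1) / (2 * sigma2)
resp = np.exp(log_resp - log_resp.max())
resp /= resp.sum()

print(np.allclose(attn, resp))                  # True: the two distributions match
```

Under these assumptions the two computations agree up to floating-point error; with unequal key norms, the mixture view adds a per-key bias term absent from plain dot-product attention.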
Submission Number: 85