Associative Transformer Is A Sparse Representation Learner

Published: 27 Oct 2023, Last Modified: 26 Nov 2023, AMHN23 Oral
Keywords: associative memory, working memory, global workspace theory, attention mechanism
TL;DR: We propose the Associative Transformer (AiT), building on recent neuroscience studies of the Global Workspace Theory and associative memory.
Abstract: Moving beyond the monolithic pairwise attention mechanism of conventional Transformer models, there is growing interest in leveraging sparse interactions that align more closely with biological principles. Approaches such as the Set Transformer and the Perceiver employ cross-attention consolidated with a latent space that forms an attention bottleneck with limited capacity. Building upon recent neuroscience studies of the Global Workspace Theory and associative memory, we propose the Associative Transformer (AiT). AiT induces low-rank explicit memory that serves both as priors to guide bottleneck attention in a shared workspace and as attractors within the associative memory of a Hopfield network. We show that AiT is a sparse representation learner, learning distinct priors through the bottleneck that are complexity-invariant to input quantities and dimensions. AiT demonstrates its superiority over methods such as the Set Transformer, the Vision Transformer, and Coordination on various vision tasks.
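To make the two ideas named in the abstract concrete, the sketch below illustrates, in generic PyTorch, (1) a cross-attention bottleneck in which a small set of learned priors attends to the input tokens, and (2) a single modern-Hopfield-style retrieval step in which those workspace slots act as attractors. This is a conceptual sketch under assumed sizes and module names (e.g. `BottleneckWithHopfieldRetrieval`, `num_priors`, `beta`), not the authors' implementation.

```python
# Conceptual sketch only: bottleneck cross-attention over learned priors,
# followed by one Hopfield-style retrieval step. All names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BottleneckWithHopfieldRetrieval(nn.Module):
    def __init__(self, dim=64, num_priors=8, beta=8.0):
        super().__init__()
        # Learned explicit memory: a small, fixed number of priors shared across inputs,
        # so the bottleneck capacity does not grow with input length or dimension.
        self.priors = nn.Parameter(torch.randn(num_priors, dim) * 0.02)
        self.beta = beta  # inverse temperature of the Hopfield update (assumed value)
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        # x: (batch, tokens, dim) input token representations
        b = x.shape[0]
        priors = self.priors.unsqueeze(0).expand(b, -1, -1)

        # (1) Bottleneck cross-attention: priors are queries, tokens are keys/values,
        # so information is funneled through num_priors slots (a limited-capacity workspace).
        q, k, v = self.to_q(priors), self.to_k(x), self.to_v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        workspace = attn @ v  # (batch, num_priors, dim)

        # (2) One step of modern-Hopfield retrieval: each token is pulled toward the
        # stored patterns, with the workspace slots acting as attractors.
        scores = F.softmax(self.beta * x @ workspace.transpose(-2, -1), dim=-1)
        retrieved = scores @ workspace  # (batch, tokens, dim)
        return retrieved


# Usage: 16 tokens of width 64 are routed through 8 prior slots and retrieved back.
tokens = torch.randn(2, 16, 64)
out = BottleneckWithHopfieldRetrieval()(tokens)
print(out.shape)  # torch.Size([2, 16, 64])
```

The key design point the abstract emphasizes is that the number of priors, not the number or dimensionality of input tokens, determines the bottleneck's capacity, which is why the learned representation is described as complexity-invariant to input quantities and dimensions.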
Submission Number: 43