Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We introduce an extension of the Transformer architecture with explicit relational computational mechanisms, integrating sensory and relational processing.
Abstract: Relational reasoning is a central component of generally intelligent systems, enabling robust and data-efficient inductive generalization. Recent empirical evidence shows that many existing neural architectures, including Transformers, struggle with tasks requiring relational reasoning. In this work, we distinguish between two types of information: *sensory* information about the properties of individual objects, and *relational* information about the relationships between objects. While neural attention provides a powerful mechanism for controlling the flow of sensory information between objects, the Transformer lacks an explicit computational mechanism for routing and processing relational information. To address this limitation, we propose an architectural extension of the Transformer framework that we call the *Dual Attention Transformer (DAT)*, featuring two distinct attention mechanisms: sensory attention for directing the flow of sensory information, and a novel relational attention mechanism for directing the flow of relational information. We empirically evaluate *DAT* on a diverse set of tasks ranging from synthetic relational benchmarks to complex real-world tasks such as language modeling and visual processing. Our results demonstrate that integrating explicit relational computational mechanisms into the Transformer architecture leads to significant performance gains in terms of data efficiency and parameter efficiency.
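The distinction the abstract draws can be made concrete with a small sketch. Below, standard (sensory) attention routes each object's *value* vector to other positions, while a relational head instead routes a *relation* vector whose entries are inner products of the two objects under learned projection maps. This is a minimal NumPy illustration of the idea, not the paper's implementation: the function names, the einsum-based relation tensor, and the omission of DAT's symbol vectors and multi-head structure are all simplifying assumptions here.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sensory_attention(X, Wq, Wk, Wv):
    """Standard attention: routes sensory features (value vectors)
    of attended objects back to each query position."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V  # (n, d_v): mix of other objects' own features

def relational_attention(X, Wq, Wk, Wqr, Wkr):
    """Sketch of a relational head (hypothetical formulation): the
    attended payload is a relation vector r[i, j] describing the pair
    (i, j), not object j's features.  Each component of r[i, j] is an
    inner product of x_i and x_j under a learned pair of maps."""
    Q, K = X @ Wq, X @ Wk
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    # R[i, j, l] = <Wqr[l]-projection of x_i, Wkr[l]-projection of x_j>
    R = np.einsum('ie,lef,jg,lgf->ijl', X, Wqr, X, Wkr)
    # route relation vectors instead of value vectors
    return np.einsum('ij,ijl->il', A, R)  # (n, n_rel)

rng = np.random.default_rng(0)
n, d, d_k, n_rel, d_p = 5, 8, 8, 4, 8  # toy sizes
X = rng.standard_normal((n, d))  # n objects with d sensory features
sens = sensory_attention(X, rng.standard_normal((d, d_k)),
                         rng.standard_normal((d, d_k)),
                         rng.standard_normal((d, d)))
rel = relational_attention(X, rng.standard_normal((d, d_k)),
                           rng.standard_normal((d, d_k)),
                           rng.standard_normal((n_rel, d, d_p)),
                           rng.standard_normal((n_rel, d, d_p)))
print(sens.shape, rel.shape)  # (5, 8) (5, 4)
```

Note the output shapes: the sensory head returns a vector in the objects' feature space, while the relational head returns one entry per learned relation, which is what makes the two information streams separable in the first place.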
Lay Summary: Current artificial intelligence models are often good at recognizing objects (sensory information) but can struggle with reasoning about how objects relate or interact with each other (relational information). But relational reasoning is central to human intelligence, underpinning the ability to make analogies, comparisons, abstractions, and generalizations. This work develops a new kind of AI model, called the Dual Attention Transformer (DAT), that has enhanced relational reasoning abilities over the standard Transformer architecture. By processing both sensory and relational information, the model becomes more efficient and better at tasks requiring complex reasoning, from understanding language to interpreting visual scenes. This advancement could lead to AI systems that learn more like humans and can tackle more sophisticated problems, while requiring less data than current systems.
Link To Code: http://github.com/awni00/dual-attention
Primary Area: Deep Learning->Attention Mechanisms
Keywords: relational reasoning, attention, transformers, inductive biases, sensory information, architecture
Submission Number: 13994