MAREO: Memory- and Attention- based visual REasOning

Mohit Vaishnav, Thomas Serre

2022 (modified: 10 Nov 2022), CoRR 2022, Readers: Everyone

Abstract: Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Attention and memory are two systems known to play a critical role in our ability to selectively maintain and manipulate behaviorally relevant visual information to solve some of the most challenging visual reasoning tasks. Here, we present a novel architecture for visual reasoning inspired by the cognitive-science literature: the Memory- and Attention-based (visual) REasOning (MAREO) architecture. MAREO instantiates an active-vision theory, which posits that the brain solves complex visual reasoning problems compositionally by learning to combine previously learned elementary visual operations into more complex visual routines. MAREO learns to solve visual reasoning tasks via sequences of attention shifts that route and maintain task-relevant visual information in a memory bank via a multi-head transformer module. Visual routines are then deployed by a dedicated reasoning module trained to judge various relations between objects in the scenes. Experiments on tasks containing complex visual relations (the SVRT challenge) and on the same-different differentiation, relation match to sample, Raven's, and Identity rules tasks from the ART challenge demonstrate MAREO's ability to learn visual routines in a robust and sample-efficient manner. We also show zero-shot generalization to unseen tasks and the compositional nature of the architecture.
