Learning to count visual objects by combining "what" and "where" in recurrent memory

Published: 20 Oct 2022, Last Modified: 05 May 2023
Venue: Gaze Meets ML 2022 Poster
Readers: Everyone
Keywords: numerosity, counting, gaze, attention, RNN, dorsal stream
TL;DR: A recurrent glimpsing neural network model of visual counting integrates gaze contents and gaze location to generalize to several out-of-distribution test sets.
Abstract: Counting the number of objects in a visual scene is easy for humans but challenging for modern deep neural networks. Here we explore what makes this problem hard and study the neural computations that allow transfer of counting ability to new objects and contexts. Previous work has implicated posterior parietal cortex (PPC) in numerosity perception and in visual scene understanding more broadly. It has been proposed that action-related saccadic signals computed in PPC provide object-invariant information about the number and arrangement of scene elements, and may contribute to relational reasoning in visual displays. We built a glimpsing recurrent neural network that combines gaze contents ("what") and gaze location ("where") to count the number of items in a visual array. The network successfully learns to count and generalizes to several out-of-distribution test sets, including images with novel items. Through ablations and comparison to control models, we establish the contribution of brain-inspired computational principles to this generalization ability. This work provides a proof-of-principle demonstration that a neural network that combines "what" and "where" can learn a generalizable concept of numerosity, and it points to a promising approach for other visual reasoning tasks.
Submission Type: Full Paper
Travel Award - Academic Status: Post-doc
Travel Award - Institution And Country: University of Oxford, UK
Travel Award - Low To Lower-middle Income Countries: No, my institution does not qualify.