Memory-Based Sequential Attention

Published: 27 Oct 2023, Last Modified: 19 Nov 2023
Venue: Gaze Meets ML 2023 (Oral)
Submission Type: Full Paper
Keywords: sequential attention, transformers, interpretability, reinforcement learning
TL;DR: We introduce a neural model of sequential attention that scans over an image and combines information from a subset of image locations into memory, where it can be contextualized and used for classification.
Abstract: Computational models of sequential attention often use recurrent neural networks, which may lead to information loss over accumulated glimpses and an inability to dynamically reweight glimpses at each step. Addressing the former limitation should result in greater task performance, while addressing the latter should enable greater interpretability. In this work, we propose a biologically inspired model of sequential attention for image classification. Specifically, our algorithm contextualizes the history of observed locations within an image to inform future gaze points, akin to scanpaths in the biological visual system. We achieve this with a transformer-based memory module coupled with a reinforcement learning-based training algorithm, improving both task performance and model interpretability. In addition to empirically evaluating our approach on classical vision tasks, we demonstrate the robustness of our algorithm to different initial locations in the image and provide interpretations of the locations sampled along the trajectory.
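To make the described architecture concrete, below is a minimal, hypothetical sketch of the idea in the abstract: each glimpse is encoded with its location and appended to a memory, a transformer contextualizes the memory, and separate heads propose the next gaze location and classify the image. All module names, sizes, the glimpse cropping, and the mention of a REINFORCE-style reward are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MemorySequentialAttention(nn.Module):
    """Hypothetical sketch of a transformer-memory sequential-attention model."""

    def __init__(self, glimpse_size=8, d_model=128, n_classes=10, n_steps=6):
        super().__init__()
        self.glimpse_size = glimpse_size
        self.n_steps = n_steps
        # Encode a flattened glimpse patch plus its (x, y) location.
        self.glimpse_enc = nn.Linear(glimpse_size * glimpse_size + 2, d_model)
        # Transformer memory module: contextualizes all glimpses observed so far.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.memory = nn.TransformerEncoder(layer, num_layers=2)
        # Policy head proposes the next gaze point; classifier reads the memory.
        self.loc_head = nn.Linear(d_model, 2)
        self.cls_head = nn.Linear(d_model, n_classes)

    def extract_glimpse(self, images, loc):
        # loc in [-1, 1]^2; crop a square patch near it (grayscale images assumed).
        B, _, H, W = images.shape
        g = self.glimpse_size
        cx = ((loc[:, 0] + 1) / 2 * (W - g)).long()
        cy = ((loc[:, 1] + 1) / 2 * (H - g)).long()
        patches = [images[b, 0, cy[b]:cy[b] + g, cx[b]:cx[b] + g] for b in range(B)]
        return torch.stack(patches).flatten(1)

    def forward(self, images, init_loc=None):
        B = images.size(0)
        loc = torch.zeros(B, 2) if init_loc is None else init_loc
        tokens, locs = [], []
        for _ in range(self.n_steps):
            patch = self.extract_glimpse(images, loc)
            tokens.append(self.glimpse_enc(torch.cat([patch, loc], dim=1)))
            ctx = self.memory(torch.stack(tokens, dim=1))   # (B, t, d_model)
            loc = torch.tanh(self.loc_head(ctx[:, -1]))     # next gaze point
            locs.append(loc)
        logits = self.cls_head(ctx.mean(dim=1))             # classify from memory
        return logits, torch.stack(locs, dim=1)             # scanpath for inspection


# Usage sketch: a classification loss would train the encoder/memory/classifier,
# while a REINFORCE-style reward on the sampled locations would train the policy,
# since the crop indexing is non-differentiable.
model = MemorySequentialAttention()
logits, scanpath = model(torch.randn(4, 1, 28, 28))
print(logits.shape, scanpath.shape)  # torch.Size([4, 10]) torch.Size([4, 6, 2])
```

Returning the scanpath alongside the logits reflects the interpretability claim in the abstract: the sequence of sampled locations can be visualized directly.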
Supplementary Material: zip
Submission Number: 7