What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses
Keywords: eye movements, scanpaths, mechanistic modelling, benchmarking, cognitive, data-driven, behaviour
TL;DR: A systematic fixation-level comparison of a performance-optimized DNN scanpath model and a mechanistic cognitive model reveals behaviourally relevant mechanisms that can be added to the mechanistic model to substantially improve its performance.
Abstract: Understanding how humans move their eyes to gather visual information is a central question in neuroscience, cognitive science, and vision research. While recent deep learning (DL) models achieve state-of-the-art performance in predicting human scanpaths, their underlying decision processes remain opaque. At the opposite end of the modeling spectrum, cognitively inspired mechanistic models aim to explain scanpath behavior through interpretable cognitive mechanisms but lag far behind in predictive accuracy. In this work, we bridge this gap by using a high-performing deep model, DeepGaze III, to discover and test mechanisms that improve a leading mechanistic model, SceneWalk. By identifying individual fixations where DeepGaze III succeeds and SceneWalk fails, we isolate behaviorally meaningful discrepancies and use them to motivate targeted extensions of the mechanistic framework. These extensions, namely time-dependent temperature scaling, saccadic momentum, and an adaptive cardinal attention bias, are simple, interpretable additions that substantially boost predictive performance. With them, SceneWalk’s explained variance on the MIT1003 dataset doubles from 35% to 70%, setting a new state of the art in mechanistic scanpath prediction. Our findings show how performance-optimized neural networks can serve as tools for cognitive model discovery, offering a new path toward interpretable and high-performing models of visual behavior. Our code is available at https://github.com/bethgelab/what-moves-the-eyes
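The fixation-level comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository. It assumes that each model yields a normalized fixation density for every fixation. Each observed fixation is scored by its log-likelihood in bits, and fixations are then ranked by how far the deep model's score (DeepGaze III) exceeds the mechanistic model's (SceneWalk). All function names are hypothetical. The softmax-with-temperature form in `density_from_priority` is also an assumption: it illustrates what temperature scaling does to a priority map, but SceneWalk's actual mechanism may differ.

```python
import numpy as np


def density_from_priority(priority, temperature=1.0):
    """Turn a 2D priority map into a fixation density via a softmax.

    Hypothetical form: p(x) ∝ exp(priority(x) / temperature). Lower
    temperatures concentrate probability on the highest-priority pixels.
    A time-dependent schedule would pass a different temperature for
    each fixation index.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    z = np.asarray(priority, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def fixation_log_likelihoods(densities, fixations):
    """Log-likelihood (bits) of each observed fixation under its density.

    densities: sequence of 2D arrays, one per fixation, each summing to 1.
    fixations: sequence of (row, col) integer pixel coordinates.
    """
    return np.array(
        [np.log2(d[r, c]) for d, (r, c) in zip(densities, fixations)]
    )


def rank_discrepant_fixations(ll_deep, ll_mech, k=5):
    """Indices of the k fixations where the deep model beats the
    mechanistic model by the widest margin, with those margins (bits)."""
    ll_deep = np.asarray(ll_deep, dtype=float)
    ll_mech = np.asarray(ll_mech, dtype=float)
    if ll_deep.shape != ll_mech.shape:
        raise ValueError("both models must score the same fixations")
    diff = ll_deep - ll_mech  # > 0 where the deep model does better
    order = np.argsort(-diff, kind="stable")[:k]  # largest advantage first
    return order, diff[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fixations = [(1, 2), (3, 0), (2, 2)]
    # Toy stand-ins: random priority maps for the "deep" model and a
    # uniform map for the "mechanistic" model (not real model outputs).
    deep = [density_from_priority(rng.normal(size=(4, 4))) for _ in fixations]
    mech = [density_from_priority(np.zeros((4, 4))) for _ in fixations]
    idx, gain = rank_discrepant_fixations(
        fixation_log_likelihoods(deep, fixations),
        fixation_log_likelihoods(mech, fixations),
        k=2,
    )
    print("most discrepant fixations:", idx, "advantage (bits):", gain)
```

To use this on real data, you would replace the toy maps with each model's actual per-fixation densities, for example conditioned on the preceding scanpath. The output is then a ranked list of fixations to inspect by hand for behavioural patterns the mechanistic model misses.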
Primary Area: Neuroscience and cognitive science (e.g., neural coding, brain-computer interfaces)
Submission Number: 13360