EVA: Emergent Human-Like Visual Scanpaths in Hard Attention Models

ICLR 2026 Conference Submission 503 Authors

01 Sept 2025 (modified: 23 Dec 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Hard Attention, Scanpath Similarity, Reinforcement Learning, Eye Movements, Brain-Inspired Models
TL;DR: We propose EVA, a brain-inspired hard attention model that learns only from class labels yet produces emergent human-like scanpaths in image classification, achieving state-of-the-art accuracy and efficiency among hard attention models.
Abstract: Humans recognize images by actively sampling them through saccades and fixations. Hard attention models mimic this process but are typically judged only on accuracy. We introduce EVA, a brain-inspired hard-attention vision model designed to deliver strong classification performance while simultaneously producing human-aligned gaze patterns and interpretable internal dynamics. EVA operates with a small number of sequential glimpses, combining a human-inspired foveal-peripheral glimpse module, neuromodulator-based variance control, and a gating mechanism. On CIFAR-10, an image classification benchmark for which human gaze data is available, we show that EVA achieves a compelling trade-off between accuracy and scanpath similarity, with accuracy comparable to efficient CNNs and other hard attention baselines. Crucially, we demonstrate that EVA’s learned fixation policy aligns with human scanpaths across multiple metrics (NSS, AUC). Further, its internal recurrent states trace class-specific trajectories in PCA space, revealing structured, interpretable processing dynamics. Ablation studies show that while the CNN backbone drives performance, the gating and neuromodulator modules uniquely enable alignment and interpretability. These results suggest that combining brain-inspired structural modules can yield vision models that are not only efficient and accurate but also transparent and human-aligned, a step toward jointly advancing performance and interpretability.
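As a point of reference for the scanpath-similarity claims, NSS (Normalized Scanpath Saliency) scores a predicted fixation map by the mean of its z-normalized values at human fixation locations. The sketch below is a minimal NumPy illustration of that standard metric, not the authors' evaluation code; the array names `saliency` and `fixations` are placeholders.

```python
import numpy as np

def nss(saliency: np.ndarray, fixations: np.ndarray) -> float:
    """Normalized Scanpath Saliency: mean z-scored saliency at human fixations.

    saliency:  2D predicted fixation/saliency map (e.g., from a model's policy).
    fixations: binary map of the same shape, 1 at human fixation locations.
    """
    z = (saliency - saliency.mean()) / (saliency.std() + 1e-8)  # z-normalize the map
    return float(z[fixations.astype(bool)].mean())              # average at fixated pixels

# Toy usage: a map peaked exactly where the human fixated scores high.
sal = np.zeros((8, 8)); sal[3, 4] = 1.0
fix = np.zeros((8, 8)); fix[3, 4] = 1
print(nss(sal, fix))  # large positive NSS -> strong human alignment
```

An NSS of 0 corresponds to chance-level alignment; larger positive values indicate predicted fixations concentrated where humans actually looked.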
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 503