Abstract: Deep Neural Networks (DNNs) are susceptible to adversarial inputs, such as imperceptible noise and naturally occurring challenging samples. This vulnerability likely arises from their passive, one-shot processing approach. In contrast, neuroscience suggests that human vision robustly identifies salient object features by actively switching between multiple fixation points (saccades) and processing the surroundings at non-uniform resolution (foveation). This information flows through two pathways, the dorsal ("where") and ventral ("what") streams, which identify relevant portions of the input and discard irrelevant details. Building on this perspective, we outline a deep learning-based active dorsal-ventral vision system and adapt two prior methods, FALcon and GFNet, to this framework to evaluate their robustness. We conduct a comprehensive robustness analysis across three categories: adversarially crafted inputs evaluated under transfer-attack scenarios, natural adversarial images, and foreground-distorted images. By learning from focused, downsampled glimpses at multiple distinct fixation points, these active methods significantly enhance the robustness of passive networks, yielding a 2-21% accuracy gain against a state-of-the-art transferable black-box attack. On ImageNet-A, a benchmark of naturally occurring hard samples, we show how distinct predictions from multiple fixation points yield performance gains of 1.5-2x for both CNN- and Transformer-based networks. Lastly, we qualitatively demonstrate that an active vision system aligns more closely with human perception on structurally distorted images, leading to more stable and resilient predictions with fewer catastrophic mispredictions, whereas passive methods, which rely on single-shot learning and inference, often lack the necessary structural understanding.
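To make the multi-fixation inference idea from the abstract concrete, here is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the authors' FALcon or GFNet implementation: the `glimpse` helper, the fixation grid, the glimpse and output sizes, and the ResNet-50 backbone are all hypothetical stand-ins, and in an actual active system a learned dorsal ("where") module would propose the fixation points rather than a fixed grid.

```python
# Minimal sketch of multi-fixation glimpse inference (illustrative only):
# crop a downsampled glimpse around each fixation point, classify each
# glimpse with a passive backbone, and aggregate the per-glimpse logits.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

def glimpse(image, center, size=96, out=64):
    """Crop a size x size window centered at `center` = (row, col), clamped
    to the image bounds, and downsample it to out x out (crude foveation)."""
    _, _, H, W = image.shape
    r, c = center
    top = max(0, min(r - size // 2, H - size))
    left = max(0, min(c - size // 2, W - size))
    patch = image[:, :, top:top + size, left:left + size]
    return F.interpolate(patch, size=(out, out), mode="bilinear",
                         align_corners=False)

@torch.no_grad()
def active_predict(backbone, image, fixations):
    """Classify each glimpse independently, then average the class logits."""
    logits = torch.stack([backbone(glimpse(image, f)) for f in fixations])
    return logits.mean(dim=0).argmax(dim=-1)

backbone = resnet50(weights=None).eval()    # untrained stand-in "ventral" net
image = torch.rand(1, 3, 224, 224)          # random stand-in input
fixations = [(56, 56), (56, 168), (168, 56), (168, 168)]  # hypothetical grid
print(active_predict(backbone, image, fixations))
```

Averaging logits across fixations is just one simple aggregation rule; methods in this family may instead keep the most confident fixation's prediction or stop sampling glimpses early once a prediction is sufficiently certain.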
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Tim_Genewein1
Submission Number: 3255