Bridging Human Vision and Deep Perception with a Saccade-Fixation ROI Prior for Medical Image Segmentation


ICLR 2026 Conference Submission3791 Authors

10 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Machine Learning, Medical Image Segmentation, Saccade-Fixation Strategy

Abstract: Automatic medical image segmentation converts subjective visual interpretation into objective, pixel-level quantitative indicators with high precision and repeatability, providing essential morphological evidence for early disease detection and surgical planning. However, current segmentation networks almost universally follow an "equal-pixel" paradigm: every spatial location receives the same share of model capacity and computation regardless of its semantic saliency. Consequently, a large portion of computational resources is spent on lesion-free regions, which adds unnecessary GPU and memory overhead and increases the risk of overlooking tiny pathological areas. Human vision solves this problem through an active saccade-fixation strategy: it first performs a rapid, low-resolution saccade to localize suspicious regions, then applies high-resolution fixation only where necessary. Inspired by this mechanism, we propose SaccadeFixationNet (SF-Net), a medical image segmentation framework that integrates biologically motivated gaze behaviors into an end-to-end trainable U-shaped architecture. SF-Net consists of three components: a Saccade–Fixation Encoder (SFE) that combines global saccadic scanning with fixation-driven feature refinement, a Fixation Connectivity Module (FCM) that generates a Gaze ROI Map by modeling inter-fixation relations, and a Gaze-MoE Decoder (GMD) that adaptively routes fixation-relevant tokens to high-capacity experts while assigning peripheral regions to lightweight experts. This design enables ROI-guided selective computation that mimics how human vision allocates neural resources. Extensive experiments on four heterogeneous medical datasets demonstrate that our model substantially outperforms baselines.
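The ROI-guided routing idea in the abstract can be illustrated with a toy sketch. This is not the authors' Gaze-MoE Decoder: the hard threshold, the function names (`heavy_expert`, `light_expert`, `route_by_roi`), and the two stand-in experts are all assumptions made for illustration, and the actual model presumably uses learned experts and a learned routing rule.

```python
from typing import List, Sequence, Tuple

Token = List[float]


def heavy_expert(token: Token) -> Token:
    # Hypothetical stand-in for a high-capacity expert: affine map + ReLU.
    return [max(0.0, 2.0 * x + 1.0) for x in token]


def light_expert(token: Token) -> Token:
    # Hypothetical stand-in for a lightweight expert: cheap linear scaling.
    return [0.5 * x for x in token]


def route_by_roi(
    tokens: Sequence[Token],
    roi_scores: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[Token], List[str]]:
    """Send tokens with a high ROI score to the heavy expert, the rest to the light one.

    The hard threshold is an assumption of this sketch, not the paper's rule.
    """
    if len(tokens) != len(roi_scores):
        raise ValueError("tokens and roi_scores must have the same length")

    outputs: List[Token] = []
    assignments: List[str] = []
    for token, score in zip(tokens, roi_scores):
        if score >= threshold:
            outputs.append(heavy_expert(token))
            assignments.append("heavy")
        else:
            outputs.append(light_expert(token))
            assignments.append("light")
    return outputs, assignments


# Three toy tokens with per-token ROI scores (as if read from a gaze ROI map).
tokens = [[1.0, -2.0], [0.0, 4.0], [3.0, 3.0]]
roi_scores = [0.9, 0.1, 0.5]
outputs, assignments = route_by_roi(tokens, roi_scores)
print(assignments)
print(outputs)
```

In this sketch only tokens whose ROI score reaches the threshold pay for the more expensive transform, which is the sense in which computation becomes selective rather than "equal-pixel".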

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 3791
