Temporal Gaze Dynamics as Zero-Shot Prompts for Volumetric Medical Segmentation

Published: 23 Sept 2025, Last Modified: 01 Dec 2025. TS4H NeurIPS 2025 Poster. CC BY 4.0
Keywords: interactive segmentation, gaze-based interaction, foundation models, zero-shot learning, physiological time series, multimodal learning, volumetric medical imaging, human-in-the-loop AI, eye-tracking, medical AI
TL;DR: We use eye gaze as a zero-shot, time-series prompt to guide foundation models, making interactive 3D medical image segmentation significantly faster and more intuitive than manual methods.
Abstract: Guiding foundation models like SAM-2 for volumetric medical segmentation typically relies on inefficient manual prompts. We introduce a more efficient, multimodal approach using eye gaze—a continuous physiological time series—to steer the model's focus in a zero-shot manner. By fusing a user's temporal gaze stream with spatial image data, we enable dynamic, interactive 3D segmentation. Evaluating with SAM-2 and its medical variant, MedSAM-2, our gaze-based method proves significantly more time-efficient (e.g., 62 vs. 88 seconds per volume) than manual bounding boxes, with a modest accuracy trade-off. This work establishes a practical framework for incorporating human physiological signals into sequential, human-in-the-loop clinical tasks, paving the way for more intuitive AI interfaces.
Submission Number: 8