Keywords: Interactive Segmentation, Vision Transformers, Foundation Models, Medical Imaging.
Abstract: Interactive medical segmentation reduces annotation effort by refining predictions through user feedback. Vision Transformer (ViT)-based models, such as the Segment Anything Model (SAM), achieve state-of-the-art performance using user clicks and prior masks as prompts. However, existing methods treat interactions as independent events, leading to redundant corrections and limited refinement gains. We address this by introducing MAIS, a Memory-Attention mechanism for Interactive Segmentation that stores past user inputs and segmentation states, enabling temporal context integration. Our approach enhances ViT-based segmentation across diverse imaging modalities, achieving more efficient and accurate refinements.
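The following is a minimal sketch of the memory-attention idea the abstract describes: a module that banks embeddings of past user inputs and segmentation states, and lets the current refinement pass cross-attend to that history. All names, dimensions, and the memory-bank policy here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a memory-attention block for interactive
# segmentation; names and shapes are assumptions, not the authors' code.
import torch
import torch.nn as nn


class MemoryAttention(nn.Module):
    """Cross-attends current image/prompt tokens to a bank of stored
    embeddings from earlier interaction rounds."""

    def __init__(self, dim: int = 256, num_heads: int = 8, max_memory: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.max_memory = max_memory
        self.memory: list[torch.Tensor] = []  # one (B, T_i, dim) tensor per round

    def write(self, state: torch.Tensor) -> None:
        # Store the embedding of the latest clicks and predicted mask,
        # dropping the oldest round once the bank is full.
        self.memory.append(state.detach())
        if len(self.memory) > self.max_memory:
            self.memory.pop(0)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        if not self.memory:
            return tokens  # first round: no history to attend to
        mem = torch.cat(self.memory, dim=1)      # (B, sum(T_i), dim)
        attended, _ = self.attn(tokens, mem, mem)
        return self.norm(tokens + attended)       # residual connection


# Usage: after each correction, encode the new clicks/mask, write the state,
# and let the next refinement pass attend over all previous rounds.
layer = MemoryAttention()
refined = layer(torch.randn(1, 64, 256))   # round 1: passthrough, empty memory
layer.write(torch.randn(1, 8, 256))        # store round-1 interaction state
refined = layer(torch.randn(1, 64, 256))   # round 2: attends to stored history
```

Under these assumptions, each refinement conditions on all prior corrections rather than treating each click as an independent event, which is the redundancy the abstract targets.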
Primary Subject Area: Segmentation
Secondary Subject Area: Foundation Models
Paper Type: Methodological Development
Registration Requirement: Yes
Visa & Travel: Yes
Submission Number: 126