Abstract: Interactive segmentation tools are necessary to achieve the desired segmentation accuracy for complex target structures, such as vessels in medical images. However, existing interactive methods, including those pre-trained on large internet-scale datasets, offer limited mechanisms for users to provide prompts that effectively control segmentation outcomes. In particular, one-at-a-time point or text prompts are often insufficient for correcting errors in vascular segmentation masks. To address these limitations, we propose a novel interactive medical image segmentation method tailored to complex vascular structures. Our approach learns to interpret sequences of multimodal prompts that combine text and point inputs. By enabling dual-mode prompting, the method allows users to attach semantic meaning to point-based interactions. Furthermore, by learning from aggregated sequences of prompts, the method captures inter-prompt relationships, improving its understanding of and response to user input. Quantitative evaluations on six vascular datasets demonstrate that our method outperforms existing approaches. It also avoids critical failure cases and consistently produces improved segmentation masks across diverse imaging modalities and vascular anatomies.
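The abstract does not specify the architecture, so the following is a minimal, purely illustrative sketch of the core idea it describes: embedding a sequence of mixed point and text prompts into a shared space and aggregating them jointly so that inter-prompt relationships can be modeled. All class names, dimensions, and the choice of a transformer aggregator are assumptions made for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn


class MultimodalPromptEncoder(nn.Module):
    """Hypothetical encoder for a sequence of multimodal prompts.

    Point prompts are (x, y) coordinates with a foreground/background flag;
    text prompts are token ids for short semantic labels (e.g. "artery").
    A small transformer over the concatenated prompt sequence lets every
    prompt attend to every other, modeling inter-prompt relationships.
    Dimensions and layer counts are illustrative assumptions.
    """

    def __init__(self, embed_dim: int = 256, vocab_size: int = 1000,
                 num_layers: int = 2):
        super().__init__()
        # Project each point prompt (x, y, fg/bg flag) into the joint space.
        self.point_proj = nn.Linear(3, embed_dim)
        # Embed text-prompt tokens into the same space.
        self.text_embed = nn.Embedding(vocab_size, embed_dim)
        # Aggregate the full prompt sequence with self-attention.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8,
                                           batch_first=True)
        self.aggregator = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, points: torch.Tensor,
                text_ids: torch.Tensor) -> torch.Tensor:
        # points: (B, Np, 3), text_ids: (B, Nt)
        point_tokens = self.point_proj(points)        # (B, Np, D)
        text_tokens = self.text_embed(text_ids)       # (B, Nt, D)
        prompt_seq = torch.cat([point_tokens, text_tokens], dim=1)
        # Output: one contextualized embedding per prompt, (B, Np + Nt, D),
        # which a mask decoder could then condition on.
        return self.aggregator(prompt_seq)


# Usage sketch: two point prompts plus a three-token text prompt.
encoder = MultimodalPromptEncoder()
points = torch.tensor([[[0.25, 0.40, 1.0], [0.60, 0.55, 0.0]]])  # (1, 2, 3)
text_ids = torch.randint(0, 1000, (1, 3))                        # (1, 3)
prompt_embeddings = encoder(points, text_ids)                    # (1, 5, 256)
```

The key property this sketch illustrates is that point and text prompts share one embedding sequence, so a semantic label can contextualize a click rather than being processed one at a time in isolation.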
External IDs: dblp:conf/miccai/LimL25