Keywords: Interactive Segmentation, 3D Medical Image, SAM2
Abstract: The growing volume of complex 3D biomedical imaging data highlights the need for accurate and efficient analysis methods. Segmentation of such data is essential for diagnosis, anatomical analysis, disease monitoring, and treatment planning. However, existing segmentation algorithms often struggle with the variability of object structures and the diversity of imaging modalities. To address these challenges, we introduce iMedSTAM, a promptable foundation model for 3D image and video segmentation that progressively refines its predictions based on user interactions. iMedSTAM was developed by fine-tuning EfficientTAM on a large-scale dataset comprising over 300,000 3D image–mask pairs and 45,000 video–mask pairs covering five medical imaging modalities. In addition, we extend the EfficientTAM architecture with a bidirectional inference and memory mechanism that enables the processing of volumetric data. iMedSTAM significantly outperforms all previous models on the publicly available validation set in the coreset track and achieves state-of-the-art results in the all-data track, reaching an average final DSC and NSD of 0.791 and 0.857, respectively. For DSC_AUC and NSD_AUC, which measure the cumulative improvement from additional user interactions, iMedSTAM achieves scores of 3.067 and 3.323.
Submission Number: 7