Keywords: Interactive Segmentation, 3D Medical Image, SAM2
Abstract: The growing volume of complex 3D biomedical imaging data highlights the need for accurate and efficient analysis methods. Segmentation of such data is essential for diagnosis, anatomical analysis, disease monitoring, and treatment planning. However, existing segmentation algorithms often struggle with the variability of object structures and the diversity of imaging modalities. To address these challenges, we introduce iMedSTAM, a promptable foundation model for 3D image and video segmentation that progressively refines its predictions based on user interactions. iMedSTAM was developed by fine-tuning EfficientTAM on a large-scale dataset comprising over 300,000 3D image–mask pairs and 45,000 video–mask pairs covering five medical imaging modalities. In addition, we extend the EfficientTAM architecture with a bidirectional inference and memory mechanism that enables the processing of volumetric data. iMedSTAM significantly outperforms all previous models on the publicly available validation set in the coreset track and achieves state-of-the-art results in the all-data track, reaching an average final DSC and NSD of 0.791 and 0.857, respectively. For DSC_AUC and NSD_AUC, which measure the cumulative improvement from additional user interactions, iMedSTAM achieves scores of 3.067 and 3.323.
Submission Number: 7