RFMedSAM 2: Automatic Prompt Refinement for Enhanced Volumetric Medical Image Segmentation with SAM 2

19 Sept 2025 (modified: 14 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | Everyone | CC BY 4.0
Keywords: SAM 2, Automatic Prompt, Refinement, Medical Image Segmentation
Abstract: Segment Anything Model 2 (SAM 2) is a prompt-driven foundation model that extends SAM to both image and video domains, demonstrating superior zero-shot performance over its predecessor. While SAM 2 builds on SAM's success in medical image segmentation, it retains limitations such as binary mask outputs, lack of semantic label inference, and reliance on precise prompts for target object identification. Moreover, applying SAM and SAM 2 directly to medical image segmentation tasks often yields suboptimal results. In this paper, we investigate the upper performance limit of SAM 2 using custom fine-tuning adapters and ground-truth prompts, achieving a Dice Similarity Coefficient (DSC) of 92.30% on the BTCV dataset, surpassing the state-of-the-art nnUNet by 12%. To address prompt dependency, we explore multiple prompt generation strategies and introduce a UNet that autonomously predicts masks and bounding boxes, which are then used as input to SAM 2. Dual-stage refinements within SAM 2 further improve performance. Extensive experiments demonstrate that our method achieves state-of-the-art results on the AMOS2022 dataset, with a 1.4% Dice improvement over nnUNet, and outperforms nnUNet by 6.4% on the BTCV dataset.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 15545
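
Illustrative sketch (not the authors' code): the abstract reports results as Dice Similarity Coefficient (DSC) and describes turning predicted masks into bounding-box prompts for SAM 2. The minimal, dependency-free Python below shows how both quantities are commonly computed on a 2D binary mask. The function names `dice_coefficient` and `mask_to_box`, the list-of-lists mask representation, and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for illustration; in the paper the boxes are predicted by the UNet, and its exact procedure is not given in the abstract.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # hypothetical 2D binary mask: 1 = foreground, 0 = background


def dice_coefficient(pred: Mask, target: Mask) -> float:
    """DSC = 2|P ∩ T| / (|P| + |T|); defined here as 1.0 when both masks are empty."""
    inter = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += p & t          # pixel is foreground in both masks
            pred_sum += p
            target_sum += t
    denom = pred_sum + target_sum
    return 1.0 if denom == 0 else 2.0 * inter / denom


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tight box (x_min, y_min, x_max, y_max) around the foreground, or None if empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None  # no foreground, so there is no box prompt to derive
    return (min(xs), min(ys), max(xs), max(ys))


# Toy example: a coarse predicted mask and a slightly smaller ground truth.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
target = [[0, 1, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]

print(round(dice_coefficient(pred, target), 4))  # 2*3 / (4+3) = 0.8571
print(mask_to_box(pred))                         # (1, 0, 2, 1)
```

For a volumetric dataset such as BTCV or AMOS2022, the same overlap formula would typically be evaluated per organ label over the whole 3D volume rather than per slice; the 2D version here is only meant to make the metric and the mask-to-box step concrete.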