Enhancing a 3D Foundation Model with Gaussian Sampling for Interactive Biomedical Image Segmentation

05 Jun 2025 (modified: 09 Jun 2025), CVPR 2025 Workshop MedSegFM Submission, CC BY 4.0
Keywords: Interactive segmentation, 3D biomedical image analysis, Prompt sampling
TL;DR: Our method leverages Gaussian edge-center point sampling to improve interactive 3D medical image segmentation on the VISTA3D foundation model, demonstrating state-of-the-art performance on the CVPR 2025 challenge validation set.
Abstract: Interactive segmentation of 3D medical images seeks to produce accurate object masks with minimal user input, substantially reducing the burden of manual annotation. For the CVPR 2025 Foundation Models for Interactive 3D Biomedical Image Segmentation Challenge, we extend the VISTA3D foundation model, a state-of-the-art 3D segmentation network supporting both automatic and interactive modes, with several targeted improvements for robust interactive segmentation. First, we propose a Gaussian Edge-Center point sampling strategy, which combines Gaussian-weighted randomness with center/edge distance transforms to preferentially sample points at object centers and boundaries. This yields more realistic and effective foreground/background click simulations during training. Second, we integrate this sampler into a two-stage fine-tuning pipeline: conventional fine-tuning from the provided pre-trained weights, followed by prompt-focused fine-tuning using our improved sampling strategy. Third, to meet the challenge’s 90-second runtime limit, we optimize inference by dynamically adjusting the region of interest (ROI) size and resolution based on input voxel spacing, including adaptive downsampling and ROI cropping. We trained models for both tracks under identical protocols, using 4×A100 GPUs for the full dataset and 4×A800 GPUs for the 10% core dataset. On the validation set, our full-data model achieved a final Dice Similarity Coefficient (DSC Final) of 0.7194, while the core-data model achieved 0.6782. These results demonstrate that our enhanced approach effectively leverages the capabilities of foundation models for interactive 3D segmentation, delivering accurate results with efficient user interaction.
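The Gaussian Edge-Center sampling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything below is an assumption made for illustration: the function name `gaussian_edge_center_sample`, the `sigma` parameter, the normalization of the distance map, and the choice of a Euclidean distance transform (via `scipy.ndimage.distance_transform_edt`). It is not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def gaussian_edge_center_sample(mask, n_points=1, mode="center", sigma=0.2, rng=None):
    """Sample voxel coordinates inside a binary mask (hypothetical helper).

    mode="center" favors voxels far from the mask boundary;
    mode="edge" favors voxels close to the boundary.
    sigma (assumed parameter) controls how sharply the Gaussian
    weighting concentrates on the preferred depth.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return np.empty((0, mask.ndim), dtype=int)

    # Pad with background so the volume border also counts as a boundary,
    # then compute each foreground voxel's distance to the nearest background.
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    dist = distance_transform_edt(padded)
    dist = dist[tuple(slice(1, -1) for _ in range(mask.ndim))]

    coords = np.argwhere(mask)   # (N, ndim), C-order
    depth = dist[mask]           # same C-order as coords
    depth_norm = depth / depth.max()

    # Gaussian weight centered on the deepest (center) or shallowest (edge) value.
    target = 1.0 if mode == "center" else depth_norm.min()
    weights = np.exp(-0.5 * ((depth_norm - target) / sigma) ** 2) + 1e-12
    probs = weights / weights.sum()

    k = min(n_points, len(coords))
    idx = rng.choice(len(coords), size=k, replace=False, p=probs)
    return coords[idx]
```

Under these assumptions, foreground clicks would be drawn from the target mask, and background clicks could be drawn the same way from a mask of false-positive or surrounding background voxels.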
Submission Number: 9