Keywords: Multi-Modality Learning, Lightweight Architectures, Prompt-based Segmentation, 3D Medical Image Segmentation
TL;DR: Our method combines efficient MobileNet encoders with prompt generation from image intensities to enable fast, annotation-sparse segmentation across diverse 3D medical modalities.
Abstract: Interactive 3D medical image segmentation methods typically require manual bounding box prompts, limiting their applicability in automated workflows. In this work, we propose an intensity-based thresholding strategy that automatically generates bounding box prompts when explicit annotations are unavailable. Our method leverages statistical properties of medical images to identify regions of interest through adaptive thresholding, morphological operations, and connected component analysis. Experiments on the CVPR BiomedSegFM dataset demonstrate that this automated prompting strategy significantly improves segmentation performance in high-contrast modalities, achieving 0.73 DSC for CT and 0.74 DSC for PET, compared to 0.68 and 0.59 DSC, respectively, with standard prompts. However, the method struggles in low-contrast modalities such as ultrasound, where performance drops from 0.68 to 0.31 DSC due to speckle noise and ambiguous tissue boundaries. We also report preliminary experiments with lightweight MobileNet encoders as alternatives to Vision Transformers, finding that current lightweight architectures suffer substantial accuracy degradation on 3D medical segmentation tasks. Our results highlight both the promise and the limitations of automated prompt generation for multi-modality medical imaging. Code: https://github.com/lexorcvpr/lexor-cvpr-2025
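A minimal sketch of the prompt-generation pipeline the abstract describes (adaptive thresholding, morphological operations, connected component analysis), assuming a 3D NumPy volume. The thresholding rule (mean plus one standard deviation), the component-size floor, and all names (`generate_box_prompts`, `min_voxels`, `closing_iters`) are illustrative assumptions, not taken from the paper or its released code.

```python
import numpy as np
from scipy import ndimage


def generate_box_prompts(volume, min_voxels=100, closing_iters=2):
    """Propose bounding-box prompts for bright regions in a 3D volume."""
    # Adaptive threshold from intensity statistics; the paper's exact
    # rule is unspecified, so mean + 1 std is an assumption.
    mask = volume > (volume.mean() + volume.std())

    # Morphological closing bridges small gaps; hole filling smooths blobs.
    mask = ndimage.binary_closing(mask, iterations=closing_iters)
    mask = ndimage.binary_fill_holes(mask)

    # Connected component analysis: label blobs, drop tiny ones, and
    # return each surviving component's axis-aligned bounding box.
    labels, _ = ndimage.label(mask)
    boxes = []
    for lab, sl in enumerate(ndimage.find_objects(labels), start=1):
        if sl is not None and (labels[sl] == lab).sum() >= min_voxels:
            boxes.append(sl)  # (z, y, x) slice triple usable as a box prompt
    return boxes


# Usage example: synthetic volume containing one bright cube.
vol = np.zeros((64, 64, 64), dtype=np.float32)
vol[20:40, 20:40, 20:40] = 1.0
print(generate_box_prompts(vol))  # ~[(slice(20, 40), slice(20, 40), slice(20, 40))]
```

On a high-contrast CT or PET volume, foreground intensities separate cleanly from background, so a statistic-based threshold recovers tight boxes; on speckle-heavy ultrasound, the same rule produces fragmented masks, which is consistent with the performance drop reported above.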
Submission Number: 5