Five Models for Five Modalities: Open-Vocabulary Segmentation in Medical Imaging

07 Jun 2025 (modified: 09 Jun 2025) · CVPR 2025 Workshop MedSegFM Submission · Readers: Everyone · CC BY 4.0
Keywords: open vocabulary segmentation, CT, MRI
TL;DR: We adapt SAT for open-vocabulary segmentation using five modality-specific models with a shared text encoder. This improves generalization, with strong results on Ultrasound and Microscopy, while CT and MRI lag due to limited training.
Abstract: We present a multimodal approach to open-vocabulary segmentation in medical imaging by training five modality-specific models using a unified architecture based on the SAT model. Each model is tailored to a specific imaging modality (CT, MRI, Ultrasound, Microscopy, or PET) while maintaining architectural consistency to ensure comparability and generalizability. To address the challenge of limited data availability, particularly in modalities such as Ultrasound and Microscopy, we implement distinct sampling strategies designed to maximize anatomical and pathological diversity across training cases. We aim to evaluate the effectiveness of open-vocabulary segmentation across diverse medical imaging modalities using consistent text prompts and unified label representations. For CT, MRI, and Ultrasound, performance is reported using the Dice Similarity Coefficient (DSC) and Normalized Surface Dice (NSD), while for Microscopy and PET, we follow challenge-specific guidelines and report F1 scores. On the official validation set, the models achieved: CT (DSC: 0.3280, NSD: 0.3043), MRI (DSC: 0.2909, NSD: 0.3566), Ultrasound (DSC: 0.7656, NSD: 0.7485), Microscopy (F1: 0.3966), and PET (F1: 0.2906). These preliminary results demonstrate the viability of modality-specific training within an open-vocabulary framework and provide a foundation for further improvements.
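As a point of reference for the DSC numbers reported above, the following is a minimal sketch of the Dice Similarity Coefficient on binary masks (the function name, epsilon smoothing term, and toy masks are illustrative assumptions, not taken from the submission's evaluation code):

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-8):
    """Dice Similarity Coefficient (DSC) between two binary masks:
    2 * |pred ∩ gt| / (|pred| + |gt|)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    # eps guards against division by zero when both masks are empty
    return (2.0 * intersection) / (pred.sum() + gt.sum() + eps)

# Toy example: two overlapping 2-D masks (4 and 6 foreground pixels,
# 4 pixels shared), so DSC = 2*4 / (4 + 6) = 0.8.
a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:4] = True
print(round(dice_coefficient(a, b), 4))  # → 0.8
```

NSD and the challenge F1 scores additionally depend on surface-distance tolerances and instance matching, respectively, so they are not reducible to a one-liner like this.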
Submission Number: 13