BiomedParse-V: Scaling Foundation Models for Universal Text-guided Volumetric Biomedical Image Segmentation
Keywords: Biomedical Image, Image Segmentation, Multimodal
Abstract: Three-dimensional (3D) image segmentation plays a pivotal role in clinical diagnosis, therapy planning, and drug discovery by enabling the precise delineation of anatomical structures, pathological lesions, and cellular features in medical imaging modalities such as CT and MRI, as well as in biomedical microscopy. Despite its central importance, 3D segmentation remains a formidable technical challenge due to high computational requirements, the vast diversity of segmentation tasks across clinical and research domains, and the lack of interoperability among existing models, which are typically developed for specific modalities and tasks. To address these limitations, we introduce BiomedParse-V, a scalable and generalizable multimodal foundation model that leverages pretrained 2D foundation models to enable accurate, text-prompted 3D image segmentation. Our method features a novel Fractal Volumetric Encoding (FVE) scheme, which hierarchically compresses volumetric data into a compact, fractal-based 2.5D representation by exploiting self-similarity across slices. This design enables the effective use of powerful 2D foundation models while preserving essential 3D spatial context. We further propose an Independent Segmentation Discriminator (ISD) module that promotes robust and consistent object localization throughout the segmented volume, addressing the challenge of maintaining spatial coherence in text-guided segmentation. Extensive experiments across diverse biomedical imaging modalities demonstrate that BiomedParse-V consistently achieves state-of-the-art performance, significantly surpassing leading supervised 3D segmentation models. Our approach delivers a prompt-driven, computationally efficient, and broadly applicable solution for 3D biomedical image segmentation, advancing the accessibility and impact of segmentation technologies in real-world clinical and research environments.
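As a rough illustration of the 2.5D encoding idea sketched in the abstract, the minimal Python example below builds a depth hierarchy over slices and collapses each level into a 2D channel map that a pretrained 2D backbone could consume. It assumes a simple mean-pooling hierarchy as a stand-in for the fractal compression; the function name fractal_volumetric_encoding and its parameters are hypothetical and do not reflect the authors' actual FVE implementation.

import numpy as np


def fractal_volumetric_encoding(volume: np.ndarray, levels: int = 3) -> np.ndarray:
    """Hypothetical sketch of a hierarchical 2.5D encoding.

    Builds progressively coarser depth subsamplings of the volume
    (a crude proxy for exploiting inter-slice self-similarity) and
    collapses each level into one 2D map, stacked as channels.
    """
    channels = []
    for level in range(levels):
        stride = 2 ** level                 # coarser depth sampling at each level
        slab = volume[::stride]             # subsample slices along the depth axis
        channels.append(slab.mean(axis=0))  # collapse the slab to a single 2D map
    # (levels, H, W): a compact 2.5D input a 2D foundation model can consume.
    return np.stack(channels, axis=0)


if __name__ == "__main__":
    ct = np.random.rand(64, 256, 256).astype(np.float32)  # toy CT volume (D, H, W)
    encoded = fractal_volumetric_encoding(ct, levels=3)
    print(encoded.shape)  # (3, 256, 256)

The mean pooling here is purely illustrative; the FVE described in the paper would presumably replace it with a learned or self-similarity-based compression that better preserves 3D spatial context.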
Submission Number: 17