Keywords: Medical Image Segmentation, Foundation Model, VLM, SAM
Abstract: Text-prompted foundation models for medical image segmentation offer an intuitive
way to delineate anatomical structures from natural language queries, but
their predictions often lack spatial precision and degrade under domain shift.
In contrast, visual-prompted models achieve strong segmentation performance
across diverse modalities by leveraging spatial cues of precise bounding-box
(bbox) prompts to guide the segmentation of target lesions. However, obtaining
precise visual prompts in clinical practice is costly and challenging. We propose
PPBoost (Progressive Prompt-Boosting), a framework that bridges these limitations
by transforming weak text-derived signals into strong, spatially grounded
visual prompts, operating under a strict zero-shot regime with no image- or pixel-level
segmentation labels. PPBoost first uses a vision-language model (VLM) to produce
initial pseudo-bboxes conditioned on the textual object names and applies an
uncertainty-aware criterion to filter out unreliable predictions. The retained image-bbox
pairs are then used as pseudo-labels to train a detector, which produces
high-quality bboxes for the query images. At inference, PPBoost further refines
the generated bboxes by appropriately expanding them to tightly cover the target
anatomical structures. The enhanced, spatially grounded bbox prompts guide existing
segmentation models to generate final dense masks, effectively amplifying
weak text cues into strong spatial guidance. Across three datasets spanning diverse
modalities and anatomies, PPBoost consistently improves Dice and Normalized
Surface Distance over text- and visual-prompted baselines and, notably,
surpasses few-shot segmentation models without using labeled data. PPBoost also
generalizes across multiple typical visual segmentation model backbones. The anonymous
code implementation is available at: https://anonymous.4open.science/r/submission-code-E2BB/.
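The filtering and box-expansion steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the uncertainty criterion or the expansion rule, so everything here is an assumption: the `score` field, the `threshold` and `ratio` parameters, and the function names `filter_confident` and `expand_bbox` are hypothetical and are not taken from the PPBoost code.

```python
# Illustrative sketch only; not the PPBoost implementation.
# Assumption: each pseudo-bbox carries a scalar confidence "score" in [0, 1].

def filter_confident(predictions, threshold=0.5):
    """Keep pseudo-bboxes whose (assumed) confidence score meets the threshold."""
    return [p for p in predictions if p["score"] >= threshold]


def expand_bbox(box, image_size, ratio=0.1):
    """Grow a box on every side by `ratio` of its width/height, clamped to the image.

    box: (x_min, y_min, x_max, y_max) in pixels.
    image_size: (width, height) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    dx = (x_max - x_min) * ratio  # horizontal margin added on each side
    dy = (y_max - y_min) * ratio  # vertical margin added on each side
    return (
        max(0.0, x_min - dx),
        max(0.0, y_min - dy),
        min(float(width), x_max + dx),
        min(float(height), y_max + dy),
    )


if __name__ == "__main__":
    preds = [
        {"box": (10, 10, 30, 50), "score": 0.9},
        {"box": (5, 5, 8, 8), "score": 0.2},
    ]
    for pred in filter_confident(preds):
        print(expand_bbox(pred["box"], image_size=(100, 100), ratio=0.25))
```

In a pipeline of this shape, the expanded box would then be passed as the visual prompt to a bbox-prompted segmentation model; a fixed relative margin is only one possible choice of expansion rule.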
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 19298