PPBOOST: PROGRESSIVE PROMPT BOOSTING FOR TEXT-DRIVEN MEDICAL IMAGE SEGMENTATION

ICLR 2026 Conference Submission 19298 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Medical Image Segmentation, Foundation Model, VLM, SAM
Abstract: Text-prompted foundation models for medical image segmentation offer an intuitive way to delineate anatomical structures from natural language queries, but their predictions often lack spatial precision and degrade under domain shift. In contrast, visual-prompted models achieve strong segmentation performance across diverse modalities by using the spatial cues of precise bounding-box (bbox) prompts to guide the segmentation of target lesions. However, obtaining such precise visual prompts in clinical practice is costly and challenging. We propose PPBoost (Progressive Prompt-Boosting), a framework that bridges this gap by transforming weak text-derived signals into strong, spatially grounded visual prompts, operating under a strict zero-shot regime with no image- or pixel-level segmentation labels. PPBoost first uses a vision-language model to produce initial pseudo-bboxes conditioned on the textual object names and applies an uncertainty-aware criterion to filter out unreliable predictions. The retained image-bbox pairs are then used to train a pseudo-labeled detector, which produces high-quality bboxes for the query images. At inference, PPBoost further refines the generated bboxes by appropriately expanding them to tightly cover the target anatomical structures. The resulting spatially grounded bbox prompts guide existing segmentation models to generate the final dense masks, effectively amplifying weak text cues into strong spatial guidance. Across three datasets spanning diverse modalities and anatomies, PPBoost consistently improves Dice and Normalized Surface Distance over text- and visual-prompted baselines and, notably, surpasses few-shot segmentation models without using labeled data. PPBoost also generalizes to multiple typical visual segmentation model backbones. The anonymous code implementation is available at: https://anonymous.4open.science/r/submission-code-E2BB/.
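The two bbox-handling steps named in the abstract (filtering unreliable pseudo-bboxes, then expanding the retained boxes at inference) can be illustrated with a minimal sketch. The abstract does not specify the uncertainty criterion or the expansion rule, so everything here is an assumption for illustration: the function names `filter_by_confidence` and `expand_bbox`, the score threshold, and the fixed expansion ratio are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def filter_by_confidence(
    boxes: List[Box], scores: List[float], threshold: float = 0.5
) -> List[Box]:
    """Keep boxes whose score is at least `threshold`.

    Assumption: a plain score threshold stands in for the paper's
    uncertainty-aware criterion, which the abstract does not define.
    """
    return [box for box, score in zip(boxes, scores) if score >= threshold]


def expand_bbox(
    box: Box, image_width: int, image_height: int, ratio: float = 0.1
) -> Box:
    """Grow a box by `ratio` of its width and height on each side,
    then clamp it to the image bounds.

    Assumption: a fixed relative margin; the paper's expansion rule
    may differ.
    """
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * ratio
    dy = (y1 - y0) * ratio
    return (
        max(0.0, x0 - dx),
        max(0.0, y0 - dy),
        min(float(image_width), x1 + dx),
        min(float(image_height), y1 + dy),
    )


if __name__ == "__main__":
    pseudo_boxes = [(10.0, 10.0, 30.0, 50.0), (0.0, 0.0, 5.0, 5.0)]
    pseudo_scores = [0.9, 0.2]
    kept = filter_by_confidence(pseudo_boxes, pseudo_scores, threshold=0.5)
    prompts = [expand_bbox(b, image_width=100, image_height=100) for b in kept]
    print(prompts)
```

In this sketch the expanded boxes would then be passed as bbox prompts to a visual-prompted segmentation model; that step is omitted because it depends on the chosen backbone.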
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 19298