Calibrating Promptable Concept Segmentation via Paraphrase Consistency
Keywords: Uncertainty Quantification, Calibration, Promptable Segmentation, Vision-Language Models, Test-Time Inference, Paraphrase Consistency
TL;DR: A training-free method that filters false positives in promptable segmentation models by checking whether uncertain detections persist when the original text prompt is paraphrased.
Abstract: Promptable segmentation models such as SAM 3 excel at localizing objects from short natural-language prompts, but the confidence scores they emit alongside each mask are reliable only at the extremes: on SA-Co Gold, the mid-range $[0.2, 0.6)$ concentrates most false positives, leaving unresolved the question of which uncertain detections to trust. We propose \textbf{Equivalent-Prompt Confirmation} (EPC), a training-free test-time procedure that cross-validates uncertain detections against semantically equivalent prompt rephrasings. EPC runs the segmentation model on a small set of precomputed paraphrases of the original prompt and keeps a mid-confidence detection only if at least one paraphrase independently re-localizes the same region. Genuine detections, anchored to real visual entities, typically persist across paraphrases (e.g., ``apricot'' confirmed by ``apricot fruit''); spurious activations, tied to specific prompt phrasings, typically do not. A probabilistic view additionally casts the procedure as a Monte Carlo hypothesis test over a latent paraphrase distribution, connecting it to self-consistency methods in large language models. On the official SA-Co Gold benchmark, EPC improves calibration (expected calibration error $0.140 \to 0.114$), removes 35\% of false positives, and gains 8.1\,pp in precision, 7.2\,pp in average IoU, and 2.1\,pp in mask-level F1 over vanilla SAM~3, without gradient updates or model modification.
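The confirmation rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: masks are represented as sets of pixel coordinates, "re-localizes the same region" is interpreted as mask IoU at or above a hypothetical threshold of 0.5, detections scoring at or above 0.6 are kept unconditionally, and detections scoring below 0.2 are dropped. The names `Detection`, `mask_iou`, and `epc_filter` are hypothetical, and the paraphrase detections are assumed to have been produced by running the segmentation model beforehand.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple

Pixel = Tuple[int, int]


@dataclass(frozen=True)
class Detection:
    """One predicted mask with its confidence score (hypothetical container)."""
    mask: FrozenSet[Pixel]
    score: float


def mask_iou(a: FrozenSet[Pixel], b: FrozenSet[Pixel]) -> float:
    """Intersection over union of two pixel sets; 0.0 if both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def epc_filter(
    detections: Sequence[Detection],
    paraphrase_detections: Sequence[Sequence[Detection]],
    low: float = 0.2,         # lower edge of the mid-confidence band (from the abstract)
    high: float = 0.6,        # upper edge of the mid-confidence band (from the abstract)
    iou_thresh: float = 0.5,  # assumption: the matching criterion is not given in the abstract
) -> List[Detection]:
    """Keep a mid-confidence detection only if some paraphrase re-localizes it."""
    kept: List[Detection] = []
    for det in detections:
        if det.score >= high:
            # Assumption: high-confidence detections are trusted as-is.
            kept.append(det)
        elif det.score >= low:
            # Mid-confidence: require confirmation from at least one paraphrase.
            confirmed = any(
                mask_iou(det.mask, other.mask) >= iou_thresh
                for dets in paraphrase_detections
                for other in dets
            )
            if confirmed:
                kept.append(det)
        # Assumption: detections below `low` are discarded.
    return kept
```

In this sketch, `paraphrase_detections` holds one list of detections per paraphrase (for example, the outputs for "apricot fruit" when the original prompt is "apricot"). Whether paraphrase detections must themselves exceed a minimum score before they can confirm a region is left open here, since the abstract does not say.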
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 66