Mastering SAM Prompts: A Large-Scale Empirical Study in Segmentation Refinement

TMLR Paper 5353 Authors

10 Jul 2025 (modified: 21 Jul 2025) · Under review for TMLR · CC BY 4.0
Abstract: The Segment Anything Model (SAM) has emerged as a prevalent tool powering advances in vision tasks ranging from instance and panoptic segmentation to interactive segmentation. Leveraging powerful zero-shot capabilities enabled by visual prompts, such as masks placed on the image, SAM has been shown to significantly improve segmentation quality. Yet a poor prompt can worsen SAM's performance, risking consequences such as misdiagnoses, autonomous driving failures, or manufacturing defects. Despite these stakes, existing studies of visual SAM prompting remain limited: they cover only a small fraction of potential prompt configurations, adopt ad-hoc evaluation strategies, and offer little or no rigorous analysis of the statistical significance of prompt configurations. To address this gap, we undertake the first large-scale empirical study to comprehensively evaluate the impact of SAM prompt configurations on segmentation refinement. The study spans 2,688 prompt configurations (points, boxes, and masks with diverse augmentations) applied to the outputs of four initial segmentation models, for a total of 10,752 evaluations. From these results, we draw statistically significant insights and distill practical guidelines for prompt design. In particular, we recommend including a bounding box, which raises AP@50-95 by 0.320, and advise against using a coarse mask, which lowers AP@50-95 by 0.133, averaged across all four models. We show that our recommended prompt configuration enables SAM to outperform leading refinement methods on multiple benchmark datasets.
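As a concrete illustration of the headline guideline (not the paper's exact pipeline), the sketch below applies the recommended configuration with the official segment-anything API: a bounding box derived from the initial model's mask is passed as the prompt, and the coarse mask itself is deliberately left out of the mask_input channel. The checkpoint filename, image, and initial mask here are placeholders.

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load SAM; the checkpoint filename is a placeholder for an official SAM checkpoint.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Placeholder inputs: an RGB image and a coarse mask from an initial segmentation model.
image = np.zeros((512, 512, 3), dtype=np.uint8)
initial_mask = np.zeros((512, 512), dtype=bool)
initial_mask[100:300, 150:350] = True

predictor.set_image(image)  # expects an HxWx3 uint8 RGB array

# Following the study's guideline: prompt with a bounding box derived from the
# initial mask, and do NOT pass the coarse mask itself via `mask_input`.
ys, xs = np.nonzero(initial_mask)
box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])  # XYXY format

masks, scores, _ = predictor.predict(
    box=box,
    multimask_output=False,  # one refined mask for the boxed object
)
refined_mask = masks[0]  # HxW boolean array: SAM's refined segmentation
```

Deriving the box from the initial mask keeps refinement fully automatic: no human clicks are needed, and the discouraged coarse-mask prompt is never used.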
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Jianbo_Jiao2
Submission Number: 5353