Mastering SAM Prompts: A Large-Scale Empirical Study in Segmentation Refinement for Scientific Imaging
Abstract: The Segment Anything Model (SAM) has emerged as a prevalent tool empowering advances in vision tasks ranging from instance and panoptic segmentation to interactive segmentation. Leveraging powerful zero-shot capabilities enabled by visual prompts, such as masks placed on the image, SAM has been shown to significantly improve performance on these tasks. Yet a poor prompt can worsen SAM's performance, risking consequences such as misdiagnoses, autonomous driving failures, or manufacturing defects. However, recent studies on visual SAM prompting remain limited: they cover only a small fraction of potential prompt configurations, adopt ad-hoc evaluation strategies, and offer little or no rigorous analysis of the statistical significance of prompt configurations. To address this gap, we undertake the first large-scale empirical study comprehensively evaluating the impact of SAM prompt configurations on segmentation refinement. Our study spans 2,688 prompt configurations, comprising points, boxes, and masks with diverse augmentations, applied to four initial segmentation models for a total of 10,752 evaluations. From these results, we draw statistically significant insights along with practical guidelines for prompt design \textcolor{orange}{on scientific images}. In particular, we recommend including a bounding box, which raises AP@50-95 by 0.320, and advise against using a coarse mask, which lowers AP@50-95 by 0.133 across all four models \textcolor{orange}{on microscopy data sets}. We show that our recommended prompt configuration enables SAM to outperform leading refinement methods on multiple \textcolor{orange}{scientific} benchmark datasets.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: In this revised version, we have addressed the recommended changes from Reviewers MMZu, GYZY, and f9h5. A brief summary is provided below; a more in-depth point-by-point response for each individual reviewer can be found in the appendix. Revised sections are color-coded by reviewer, as outlined below.
- For reviewer MMZu (with corresponding changes color-coded blue), we have added an ablation study of the included components, conducted an evaluation on ECSSD (a non-microscopy dataset), included additional figures for improved readability, corrected visuals, and expanded citations.
- For reviewer GYZY (with corresponding changes color-coded violet), we have included an additional experiment evaluating the generalizability of our proposed prompt construction on SAM2, a newer architecture designed for video that retains image segmentation capabilities.
- For reviewer f9h5 (with corresponding changes color-coded orange), we have refined the language to better highlight that this work was conducted on scientific datasets (including a slight change to the title, abstract, and introduction), expanded the descriptions of the algorithms evaluated, and provided a more in-depth rationale for the experimental parameters chosen.
Assigned Action Editor: ~Jianbo_Jiao2
Submission Number: 5353