Abstract: In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation, creating the VRP-SAM model. In essence, VRP-SAM uses an annotated reference image to comprehend a specific object and then segments that object in the target image. Notably, the VRP encoder supports a variety of annotation formats for reference images, including point, box, scribble, and mask. VRP-SAM achieves a breakthrough within the SAM framework by extending its versatility and applicability while preserving SAM's inherent strengths, thus enhancing user-friendliness. To enhance the generalization ability of VRP-SAM, the VRP encoder adopts a meta-learning strategy. To validate the effectiveness of VRP-SAM, we conducted extensive empirical studies on the Pascal and COCO datasets. Remarkably, VRP-SAM achieved state-of-the-art performance in visual reference segmentation with minimal learnable parameters. Furthermore, VRP-SAM demonstrates strong generalization capabilities, allowing it to segment unseen objects and perform cross-domain segmentation. The source code and models will be available at https://github.com/syp2ysy/VRP-SAM