Abstract: Few-Shot Segmentation (FSS) aims to learn class-agnostic segmentation from only a few annotated classes so that arbitrary classes can be segmented, but this comes at the risk of overfitting. To address this, some methods exploit the well-learned knowledge of foundation models (e.g., SAM) to simplify the learning process. Recently, SAM 2 has extended SAM to video segmentation, and its class-agnostic matching ability is valuable for FSS. A straightforward idea is to encode support foreground (FG) features as memory, with which query FG features are matched and fused. Unfortunately, the FG objects in different frames of SAM 2's video data always share the same identity, whereas the support and query objects in FSS are different identities, i.e., the matching step is incompatible. Therefore, we design a Pseudo Prompt Generator to encode pseudo query memory, which matches query features in a compatible way. However, the pseudo memory can never be as accurate as real query memory: it is likely to contain incomplete query FG, as well as some unexpected query background (BG) features, leading to wrong segmentation. Hence, we further design Iterative Memory Refinement to fuse more query FG features into the memory, and devise Support-Calibrated Memory Attention to suppress the unexpected query BG features in memory. Extensive experiments on PASCAL-5$^i$ and COCO-20$^i$ validate the effectiveness of our design, e.g., the 1-shot mIoU can be 4.2\% better than the best baseline.
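To make the memory-matching idea in the abstract concrete, the sketch below illustrates the generic mechanism it builds on: foreground features are collected into a memory bank, and query features are fused with that memory via cross-attention. This is a minimal, hypothetical PyTorch sketch, not the paper's FSSAM implementation; the function names, shapes, and single-head attention are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): generic FG-memory matching.
import torch
import torch.nn.functional as F

def encode_fg_memory(feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Keep only foreground positions of a feature map as memory tokens.

    feats: (C, H, W) feature map; mask: (H, W) binary FG mask.
    Returns (N_fg, C) memory tokens.
    """
    c, h, w = feats.shape
    tokens = feats.reshape(c, h * w).t()           # (HW, C)
    return tokens[mask.reshape(-1) > 0.5]          # (N_fg, C)

def memory_attention(query_feats: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
    """Fuse memory tokens into query features via plain cross-attention.

    query_feats: (C, H, W); memory: (N, C). Returns fused (C, H, W).
    """
    c, h, w = query_feats.shape
    q = query_feats.reshape(c, h * w).t()                 # (HW, C) queries
    attn = F.softmax(q @ memory.t() / c ** 0.5, dim=-1)   # (HW, N) matching scores
    fused = attn @ memory                                 # (HW, C) retrieved memory
    return (q + fused).t().reshape(c, h, w)               # residual fusion

# Toy usage: match query features against a support-derived FG memory.
support_feats = torch.randn(256, 32, 32)
support_mask = (torch.rand(32, 32) > 0.7).float()
query_feats = torch.randn(256, 32, 32)
memory = encode_fg_memory(support_feats, support_mask)
fused_query = memory_attention(query_feats, memory)
print(fused_query.shape)  # torch.Size([256, 32, 32])
```

In this simplified setting the memory comes from the support image; the paper's point is that such memory is built for same-identity matching, which motivates its pseudo query memory and the subsequent refinement and calibration steps.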
Lay Summary: Teaching computers to segment objects in images with minimal examples, a task called Few-Shot Segmentation (FSS), is challenging because models often overfit to the limited training data. While advanced tools like SAM 2 (Segment Anything Model 2) excel at tracking and segmenting objects across video frames, they struggle when applied directly to FSS, where the example and the target image show different objects of the same class (e.g., segmenting rare animals after seeing just one example). To bridge this gap, we developed two innovations: (1) a Pseudo Prompt Generator that builds a pseudo “memory” of the target image itself, avoiding the mismatch between SAM 2's same-object video matching and FSS's different-object setting, and (2) an Iterative Memory Refinement scheme that continuously improves this memory of key object features while a support-calibrated attention filters out distracting background details. Testing on standard benchmarks showed our method achieves 4.2\% higher accuracy than existing techniques in single-example scenarios, paving the way for more reliable AI tools in medical imaging (e.g., detecting rare tumors) or robotics (e.g., adapting to unseen objects).
Link To Code: https://github.com/Sam1224/FSSAM
Primary Area: Applications->Computer Vision
Keywords: Few-Shot Segmentation, Segment Anything Model 2, Video, Intra-class Gap
Submission Number: 633