SHERPA: Fine-tuning Segment Anything Models with Task-relevant Guidance

05 Sept 2025 (modified: 29 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Guided by the Small Model, Task-relevant Attention Maps, Generalization Ability
TL;DR: We propose a fine-tuning approach where a small SAM guides a large SAM, effectively reducing the loss of generalization ability while enhancing fine-tuning performance.
Abstract: Segment Anything Models (SAMs) often struggle with certain specialized tasks. A common remedy is to fine-tune the model on task-specific labels, but this often leads to overfitting, introduces model bias, and significantly degrades generalization ability. To overcome these challenges, we propose SHERPA, a novel framework that leverages a smaller SAM to guide the fine-tuning of a larger SAM via task-relevant features. Specifically, we first use the Fisher Ratio Separation (FRS) module to separate highly task-relevant features while preserving the large SAM's ability to perform other general tasks. Then, the Guiding Feature Extraction (GFE) module extracts representative guiding features from the fine-tuned small SAMs. We use small SAMs tailored to specific tasks (including natural image segmentation, biomedical image segmentation, and video object segmentation) as guidance and then evaluate the SHERPA scheme for fine-tuning larger SAM-series models. Our experiments demonstrate that SHERPA improves the retention of generalization ability across these diverse tasks by up to 11.1%, and improves task-specific performance by up to 2.2%.
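The abstract names a Fisher Ratio Separation (FRS) module for isolating task-relevant features. The paper's exact formulation is not given here, but the classical Fisher discriminant ratio (between-class scatter over within-class scatter, computed per feature channel) is a standard way to rank channels by task relevance. The sketch below is a minimal, hypothetical illustration of that idea; the function names and the top-k selection scheme are assumptions, not the authors' implementation.

```python
import numpy as np

def fisher_ratio(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-channel Fisher ratio: between-class scatter / within-class scatter.

    features: (N, D) array of feature vectors, labels: (N,) class labels.
    A high ratio means the channel separates the task's classes well,
    i.e. it is highly task-relevant in the Fisher sense.
    """
    eps = 1e-8  # avoid division by zero for constant channels
    mu = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        between += len(fc) * (mu_c - mu) ** 2
        within += ((fc - mu_c) ** 2).sum(axis=0)
    return between / (within + eps)

def select_task_relevant(features: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k channels with the highest Fisher ratio (hypothetical
    stand-in for the separation step an FRS-style module might perform)."""
    return np.argsort(fisher_ratio(features, labels))[::-1][:k]
```

In a guidance scheme like the one described, the channels ranked highest by such a criterion could carry the distillation signal from the small SAM, while the remaining channels of the large SAM are left closer to their pretrained values to limit generalization loss.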
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2379