InfoSAM: Fine-Tuning the Segment Anything Model from An Information-Theoretic Perspective

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Spotlight Poster · License: CC BY-NC-SA 4.0
TL;DR: InfoSAM improves SAM's performance on specialized tasks by using an information-theoretic approach to distill and preserve domain-invariant knowledge during fine-tuning.
Abstract: The Segment Anything Model (SAM), a vision foundation model, exhibits impressive zero-shot capabilities on general tasks but struggles in specialized domains. Parameter-efficient fine-tuning (PEFT) is a promising approach to unleash the potential of SAM in novel scenarios. However, existing PEFT methods for SAM neglect the domain-invariant relations encoded in the pre-trained model. To bridge this gap, we propose InfoSAM, an information-theoretic approach that enhances SAM fine-tuning by distilling and preserving its pre-trained segmentation knowledge. Specifically, we formulate the knowledge transfer process as two novel mutual information-based objectives: (i) to compress the domain-invariant relation extracted from pre-trained SAM, excluding pseudo-invariant information as much as possible, and (ii) to maximize mutual information between the relational knowledge learned by the teacher (pre-trained SAM) and the student (fine-tuned model). The proposed InfoSAM establishes a robust distillation framework for PEFT of SAM. Extensive experiments across diverse benchmarks validate InfoSAM's effectiveness in improving the SAM family's performance on real-world tasks, demonstrating its adaptability and superiority in handling specialized scenarios. The code and models are available at https://muyaoyuan.github.io/InfoSAM_Page.
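For intuition, the two objectives in the abstract can be read as an information-bottleneck-style regularizer added to the fine-tuning loss. The sketch below is an illustrative formulation, not the paper's exact notation: $R_T$ denotes the relational knowledge extracted from the pre-trained teacher, $R_S$ the relation learned by the fine-tuned student, $Z$ the teacher's intermediate representation, and $\alpha, \beta$ are trade-off weights; all of these symbols are assumptions made here for exposition.

\[
\min_{\theta_S} \;\; \mathcal{L}_{\mathrm{seg}}(\theta_S)
\;+\; \alpha \, I(R_T; Z)      \quad\text{(i) compress the teacher relation, suppressing pseudo-invariant information}
\;-\; \beta \, I(R_T; R_S)     \quad\text{(ii) maximize teacher-student relational mutual information}
\]

Under this reading, term (i) plays the compression role and term (ii) the preservation role; only the PEFT parameters $\theta_S$ of the student are updated, while the pre-trained teacher stays frozen.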
Lay Summary: Pretrained AI models like the Segment Anything Model (SAM) promise to recognize anything in an image, but when moved to new domains, like medical or industrial settings, they often fall short. Why? Because these models weren't trained for those situations. Instead of rebuilding the model from the ground up, researchers often fine-tune just a few parts. This is efficient, but it risks making the model forget the domain-invariant visual patterns it learned from large-scale pretraining. That's where InfoSAM comes in. Think of it as a translator between the old model and the new task. Drawing on information theory, InfoSAM identifies and preserves the universal visual patterns that remain useful, while adapting only what truly needs to change. This approach makes SAM more effective in specialized settings, with minimal effort and greater accuracy. From natural-scene segmentation to healthcare, InfoSAM helps bring advanced pretrained AI tools into the real world, where reliable performance matters most.
Link To Code: https://muyaoyuan.github.io/InfoSAM_Page/
Primary Area: Applications->Computer Vision
Keywords: Segment Anything Model, Parameter-efficient fine-tuning, Mutual information, Knowledge distillation
Submission Number: 11749