SAVE: Segment Audio-Visual Easy way using the Segment Anything Model

Khanh-Binh Nguyen, Chae Jung Park

Published: 01 Oct 2025, Last Modified: 17 Nov 2025, Computer Vision and Image Understanding, CC BY-SA 4.0

Abstract: Highlights
• Efficient adaptation of the Segment Anything Model (SAM) for Audio-Visual Segmentation (AVS).
• High-performance AVS with reduced input resolution and improved inference speed.
• Boosting real-world AVS performance through synthetic data pretraining.

External IDs: doi:10.1016/j.cviu.2025.104460