Abstract: Highlights
• Efficient adaptation of the Segment Anything Model (SAM) for Audio-Visual Segmentation (AVS).
• High-performance AVS with reduced input resolution and improved inference speed.
• Boosting real-world AVS performance through synthetic data pretraining.
External IDs: doi:10.1016/j.cviu.2025.104460